From: skip at mojam.com (Skip Montanaro)
Date: Fri, 23 Apr 1999 22:41:19 GMT
Subject: Python too slow for real world
References: <372068E6.16A4A90@icrf.icnet.uk> <3720A21B.9C62DDB9@icrf.icnet.uk>
Message-ID: <3720F783.24F2E94B@mojam.com>
Content-Length: 2124
X-UID: 1223

Arne Mueller wrote:
> However the problem of reading/writing larges files line by
> line is the source of slowing down the whole process.
>
> def rw(input, output):
>     while 1:
>         line = input.readline()
>         if not line: break
>         output.write(line)
>
> f = open('very_large_file','r')
> rw(f, stdout)
>
> The file I read in contains 2053927 lines and it takes 382 sec to
> read/write it where perl does it in 15 sec.

I saw a mention of calling readlines() with a size hint to get the
benefits of large reads without requiring that you read the entire file
into memory at once.  Here's a concrete example.  I use this idiom
(a while loop over readlines() with a nested for loop processing each
line) all the time for processing large files that I don't need to have
in memory all at once.

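For readers who want to see the idiom with the nested for loop spelled
out, here is a minimal sketch in present-day Python 3 syntax.  The names
process_in_chunks and handle_line are hypothetical, chosen only for
illustration, and the in-memory sample stands in for a real file.

```python
import io

def process_in_chunks(infile, handle_line, sizehint=100000):
    """Call handle_line on every line, reading about sizehint bytes per call."""
    # readlines(sizehint) returns whole lines totalling roughly sizehint
    # bytes, and an empty list at end of file, which ends the while loop.
    lines = infile.readlines(sizehint)
    while lines:
        for line in lines:
            handle_line(line)
        lines = infile.readlines(sizehint)

# Usage: collect the lines of a small in-memory "file".
sample = io.StringIO("alpha\nbeta\ngamma\n")
seen = []
process_in_chunks(sample, seen.append, sizehint=8)
```

The small sizehint in the usage lines is only there to force more than
one readlines() call on a tiny input.
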
The input file, /tmp/words2, was generated from /usr/dict/words:

sed -e 's/\(.*\)/\1 \1 \1 \1 \1/' < /usr/dict/words > /tmp/words
cat /tmp/words /tmp/words /tmp/words /tmp/words /tmp/words > /tmp/words2

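On a system without sed, or without /usr/dict/words at that path, a
similar input file could be built along the following lines.  This is a
sketch in Python 3 syntax; build_test_file and its parameters are
hypothetical names, and the word-list path is whatever your system
provides.

```python
def build_test_file(words_path, out_path, per_line=5, copies=5):
    """Repeat each word per_line times on its line, then write copies copies."""
    # Mirrors the sed step: every input line becomes "w w w w w".
    with open(words_path) as src:
        expanded = [" ".join([w.rstrip("\n")] * per_line) + "\n" for w in src]
    # Mirrors the cat step: the expanded list is written out copies times.
    with open(out_path, "w") as dst:
        for _ in range(copies):
            dst.writelines(expanded)
    return len(expanded) * copies  # number of lines written
```

Calling build_test_file('/usr/dict/words', '/tmp/words2') would then
play the role of the two shell commands, assuming that word list exists.
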
At 10.2MB and 227k lines it's not as big as your input file, but it's
still big enough to measure differences.  The script below prints (on
the second of two runs, to make sure the file is cached in memory)

68.9596179724
7.96663999557

suggesting roughly an 8.7x speedup between your original function and my
readlines version.  It's still not going to be as fast as Perl, but it's
probably close enough that some other bottleneck will pop up now...

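The ratio and the implied throughput can be checked directly from the
figures quoted above.  A small sketch in Python 3 syntax; the variable
names are illustrative only, and the 10.2 figure is the file size as
stated in the message.

```python
readline_secs = 68.9596179724   # first time printed by the script
readlines_secs = 7.96663999557  # second time printed by the script
file_mb = 10.2                  # size of /tmp/words2 as quoted

speedup = readline_secs / readlines_secs
print("speedup: %.2fx" % speedup)
print("readline():  %.2f MB/s" % (file_mb / readline_secs))
print("readlines(): %.2f MB/s" % (file_mb / readlines_secs))
```
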
import sys, time

# Original version: one readline() call and one write() call per line.
def rw(input, output):
    while 1:
        line = input.readline()
        if not line: break
        output.write(line)

f = open('/tmp/words2','r')
devnull = open('/dev/null','w')

t = time.time()
rw(f, devnull)
print(time.time() - t)

# readlines version: the 100000 size hint pulls in about that many bytes
# of complete lines per call, so the file is never all in memory at once.
def rw2(input, output):
    lines = input.readlines(100000)
    while lines:
        output.writelines(lines)
        lines = input.readlines(100000)

f = open('/tmp/words2','r')

t = time.time()
rw2(f, devnull)
print(time.time() - t)

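To repeat the comparison on a current interpreter, something like the
following self-contained sketch may be useful.  It is written for
Python 3, generates its own temporary input rather than relying on
/tmp/words2, and adds a third variant that iterates over the file object
directly (a feature that postdates this message).  The function names
and the line_count default are assumptions made for illustration, and
the timings it prints will not match the 1999 figures.

```python
import os
import tempfile
import time

def copy_readline(infile, outfile):
    # One readline() and one write() per line, as in rw().
    while True:
        line = infile.readline()
        if not line:
            break
        outfile.write(line)

def copy_readlines(infile, outfile, sizehint=100000):
    # Batches of lines via readlines(sizehint), as in rw2().
    lines = infile.readlines(sizehint)
    while lines:
        outfile.writelines(lines)
        lines = infile.readlines(sizehint)

def copy_iter(infile, outfile):
    # Plain iteration over the file object.
    for line in infile:
        outfile.write(line)

def time_copy(copy, path):
    # Time one strategy copying the file at path to the null device.
    with open(path) as infile, open(os.devnull, "w") as sink:
        start = time.perf_counter()
        copy(infile, sink)
        return time.perf_counter() - start

def main(line_count=200000):
    # Build a throwaway input file, time each strategy, then clean up.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for i in range(line_count):
            tmp.write("word%d word%d word%d word%d word%d\n" % (i, i, i, i, i))
        path = tmp.name
    try:
        for copy in (copy_readline, copy_readlines, copy_iter):
            print("%-15s %.3f s" % (copy.__name__, time_copy(copy, path)))
    finally:
        os.remove(path)

if __name__ == "__main__":
    main()
```

As in the original measurement, running it twice and reading the second
set of numbers reduces the effect of a cold file cache.
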
Cheers,

--
Skip Montanaro | Mojam: "Uniting the World of Music"
http://www.mojam.com/
skip at mojam.com | Musi-Cal: http://www.musi-cal.com/
518-372-5583