From: skip at mojam.com (Skip Montanaro)
Date: Fri, 23 Apr 1999 22:41:19 GMT
Subject: Python too slow for real world
References: <372068E6.16A4A90@icrf.icnet.uk> <3720A21B.9C62DDB9@icrf.icnet.uk>
Message-ID: <3720F783.24F2E94B@mojam.com>
Content-Length: 2124
X-UID: 1223
Arne Mueller wrote:
> However the problem of reading/writing large files line by
> line is the source of slowing down the whole process.
>
> def rw(input, output):
>     while 1:
>         line = input.readline()
>         if not line: break
>         output.write(line)
>
> f = open('very_large_file','r')
> rw(f, stdout)
>
> The file I read in contains 2053927 lines and it takes 382 sec to
> read/write it where perl does it in 15 sec.
I saw a mention of using readlines with a buffer size to get the
benefits of large reads without requiring that you read the entire file
into memory at once. Here's a concrete example. I use this idiom
(while loop over readlines() and a nested for loop processing each line)
all the time for processing large files that I don't need to have in
memory all at once.
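
To make the idiom concrete, here is a minimal sketch in present-day Python. The names process_in_chunks and handle_line, and the default hint of 100000, are placeholders chosen for illustration; only readlines() with a size hint is the actual file method being relied on.

```python
import io

def process_in_chunks(infile, handle_line, sizehint=100000):
    """Call handle_line on every line, reading roughly sizehint bytes per batch."""
    lines = infile.readlines(sizehint)
    while lines:
        # The nested loop is where the per-line work happens.
        for line in lines:
            handle_line(line)
        lines = infile.readlines(sizehint)

# Usage: an in-memory file stands in for a large file on disk.
seen = []
process_in_chunks(io.StringIO("a\nb\nc\n"), seen.append, sizehint=2)
```

Only one batch of lines is held in memory at a time, so the hint trades memory use against the number of read calls.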
The input file, /tmp/words2, was generated from /usr/dict/words:

    sed -e 's/\(.*\)/\1 \1 \1 \1 \1/' < /usr/dict/words > /tmp/words
    cat /tmp/words /tmp/words /tmp/words /tmp/words /tmp/words > /tmp/words2

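
If sed isn't handy, roughly the same file can be built in Python. This is only a sketch under assumptions: build_test_file and its parameters are hypothetical names, and the demo uses a tiny temporary word list rather than /usr/dict/words.

```python
import os
import tempfile

def build_test_file(words_path, out_path, repeats=5, copies=5):
    """Repeat each word `repeats` times per line, then write `copies` copies."""
    with open(words_path) as src:
        expanded = [" ".join([line.rstrip("\n")] * repeats) + "\n" for line in src]
    with open(out_path, "w") as dst:
        for _ in range(copies):
            dst.writelines(expanded)
    return len(expanded) * copies  # number of lines written

# Demo with a two-word list in a temporary directory.
tmpdir = tempfile.mkdtemp()
words = os.path.join(tmpdir, "words")
with open(words, "w") as f:
    f.write("apple\nbanana\n")
out = os.path.join(tmpdir, "words2")
line_count = build_test_file(words, out)
```
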
At 10.2MB and 227k lines, it's not as big as your input file, but it's
still big enough to measure differences. The script below prints the
elapsed seconds for each version (on the second of two runs, to make
sure the file is cached in memory):

    68.9596179724
    7.96663999557

suggesting about an 8.7x speedup between your original function and my
readlines version. It's still not going to be as fast as Perl, but it's
probably close enough that some other bottleneck will pop up now...

import time

# Original version: one readline() call per line.
def rw(input, output):
    while 1:
        line = input.readline()
        if not line: break
        output.write(line)

f = open('/tmp/words2','r')
devnull = open('/dev/null','w')

t = time.time()
rw(f, devnull)
print(time.time() - t)

# Chunked version: readlines() with a size hint of about 100000 bytes.
def rw2(input, output):
    lines = input.readlines(100000)
    while lines:
        output.writelines(lines)
        lines = input.readlines(100000)

f = open('/tmp/words2','r')

t = time.time()
rw2(f, devnull)
print(time.time() - t)

Cheers,
--
Skip Montanaro | Mojam: "Uniting the World of Music"
http://www.mojam.com/
skip at mojam.com | Musi-Cal: http://www.musi-cal.com/
518-372-5583