From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 23 Apr 1999 14:26:41 -0400 (EDT)
Subject: Python too slow for real world
In-Reply-To: <3720A21B.9C62DDB9@icrf.icnet.uk>
References: <372068E6.16A4A90@icrf.icnet.uk> <3720A21B.9C62DDB9@icrf.icnet.uk>
Message-ID: <Pine.A32.3.90.990423141054.23052E-100000@elvis.med.Virginia.EDU>
Content-Length: 3075
X-UID: 165

On Fri, 23 Apr 1999, Arne Mueller wrote:

> Hi All,
>
> thanks very much for all the suggestions on how to speed things up and how
> to THINK about programming in Python. I got a lot of inspiration from
> your replies. However, reading/writing large files line by line is
> still what slows down the whole process.
>
> from sys import stdout
>
> def rw(input, output):
>     while 1:
>         line = input.readline()
>         if not line: break
>         output.write(line)
>
> f = open('very_large_file','r')
> rw(f, stdout)
>
> The file I read in contains 2053927 lines and it takes 382 sec to
> read/write it, whereas Perl does it in 15 sec. These simple read/write
> functions use the functions from the C standard library, don't they? So
> readline/write don't seem to be implemented very efficiently ... (?)
>
> I can't read in the whole file as a single block, it's too big; if
> readline/write is slow, the program will never get really fast :-(
>

My guess would be that a difference this big is due to the file
buffering mode.

See 'open' in the library reference docs:
<http://www.python.org/doc/lib/built-in-funcs.html>

| open (filename[, mode[, bufsize]])

[...]

| ... The optional bufsize argument
| specifies the file's desired buffer size: 0 means unbuffered, 1 means
| line buffered, any other positive value means use a buffer of
| (approximately) that size. A negative bufsize means to use the system
| default, which is usually line buffered for tty devices and fully
| buffered for other files. If omitted, the system default is used.[2.10]

Note that last sentence.
If you're really testing this by writing to standard output, and that is
a terminal, it may be using line-buffered I/O. (On a related note, I think
it was AIX that had a horribly misfeatured /dev/null implementation that
caused I/O tests dumped to /dev/null to be slower than if you used an
actual device!)

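For anyone trying this on a modern Python 3 rather than the Python of 1999, the same three modes survive as the `buffering` argument of the built-in `open()`, with two changes: 0 (unbuffered) is only accepted in binary mode, and 1 (line buffering) only applies in text mode. The sketch below is a minimal illustration under that assumption; `describe_buffering` and `stdout_is_line_buffered` are hypothetical helper names, not part of any library. It opens a scratch file each way and reports what `open()` hands back, plus whether standard output is currently line buffered (Python 3 does this when stdout is an interactive terminal).

```python
import os
import sys
import tempfile


def describe_buffering(path):
    """Open `path` with several `buffering` values and report what open() returns."""
    results = {}
    # buffering=0: unbuffered. Python 3 allows this only in binary mode,
    # and returns a raw FileIO object with no buffer in front of it.
    with open(path, 'rb', buffering=0) as f:
        results['unbuffered'] = type(f).__name__
    # buffering omitted (-1): the default, a BufferedReader for regular files.
    with open(path, 'rb') as f:
        results['default'] = type(f).__name__
    # buffering=1 in text mode: line buffered, flushed after each newline written.
    with open(path, 'a', buffering=1) as f:
        results['line_buffered'] = f.line_buffering
    # Any other positive value: a buffer of roughly that many bytes.
    with open(path, 'rb', buffering=64 * 1024) as f:
        results['sized'] = type(f).__name__
    # Unbuffered text I/O is rejected outright in Python 3.
    try:
        with open(path, 'r', buffering=0):
            results['unbuffered_text_allowed'] = True
    except ValueError:
        results['unbuffered_text_allowed'] = False
    return results


def stdout_is_line_buffered():
    """Report whether sys.stdout flushes on every newline (normally True on a tty)."""
    # A replaced stdout may lack the attribute; treat that as "not line buffered".
    return bool(getattr(sys.stdout, 'line_buffering', False))


if __name__ == '__main__':
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    try:
        print(describe_buffering(scratch))
        print('stdout is a tty:', sys.stdout.isatty())
        print('stdout line buffered:', stdout_is_line_buffered())
    finally:
        os.remove(scratch)
```

If `stdout_is_line_buffered()` comes back True while you are timing a copy to standard output, every line costs a flush, which is the situation described above.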
With the following wrapper around your 'rw' function, you can test
the effect of different buffer sizes or options.

from time import clock

def test1( filename, buf=None ):
    # Use the system default buffering unless an explicit bufsize is given
    if buf is None:
        inp = open( filename, 'r' )
    else:
        inp = open( filename, 'r', buf )
    out = open( 'junk', 'w' )
    # Time only the line-by-line copy done by rw()
    c0 = clock()
    rw( inp, out )
    c1 = clock()
    return c1 - c0

On the Mac, this makes about a 37x difference.
(I got tired of waiting for it to finish on 'big.file', so
I cut down the size.)

>>> iotest.makebigfile( 'not.so.big.file', 4001 )
>>> iotest.test1( 'not.so.big.file' )
1.18333333333
>>> iotest.test1( 'not.so.big.file', buf=1 )
1.88333333333
>>> iotest.test1( 'not.so.big.file', buf=0 )
68.3833333333

I surely HOPE this is your problem!

---| Steven D. Majewski (804-982-0831) <sdm7g at Virginia.EDU> |---
---| Department of Molecular Physiology and Biological Physics |---
---| University of Virginia Health Sciences Center |---
---| P.O. Box 10011 Charlottesville, VA 22906-0011 |---

Caldera Open Linux: "Powerful and easy to use!" -- Microsoft(*)
(*) <http://www.pathfinder.com/fortune/1999/03/01/mic.html>