From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 23 Apr 1999 14:26:41 -0400 (EDT)
Subject: Python too slow for real world
In-Reply-To: <3720A21B.9C62DDB9@icrf.icnet.uk>
References: <372068E6.16A4A90@icrf.icnet.uk> <3720A21B.9C62DDB9@icrf.icnet.uk>
Message-ID: <Pine.A32.3.90.990423141054.23052E-100000@elvis.med.Virginia.EDU>
Content-Length: 3075
X-UID: 165
On Fri, 23 Apr 1999, Arne Mueller wrote:
> Hi All,
>
> thanks very much for all the suggestions how to speed up things and how
> to THINK about programming in python. I got alot of inspiration from
> your replys. However the problem of reading/writing larges files line by
> line is the source of slowing down the whole process.
>
> def rw(input, output):
>     while 1:
>         line = input.readline()
>         if not line: break
>         output.write(line)
>
> f = open('very_large_file','r')
> rw(f, stdout)
>
> The file I read in contains 2053927 lines and it takes 382 sec to
> read/write it where perl does it in 15 sec. These simple read/write
> functions use the functions from the C standard library, don't they? So,
> readline/write don't seem to be implemented very efficently ... (?)
>
> I can't read in the whole file as a single block, it's too big, if
> readline/write is slow the program will never get realy fast :-(
>
My guess would be that a difference this big is due to the file
buffering mode.
See 'open' in the library reference docs:
<http://www.python.org/doc/lib/built-in-funcs.html>
| open (filename[, mode[, bufsize]])
[...]
| ... The optional bufsize argument
| specifies the file's desired buffer size: 0 means unbuffered, 1 means
| line buffered, any other positive value means use a buffer of
| (approximately) that size. A negative bufsize means to use the system
| default, which is usually line buffered for tty devices and fully
| buffered for other files. If omitted, the system default is used.[2.10]
Note that last sentence.
If you're really testing this by writing to the standard output, it may
be using line-buffered I/O. (On a related note, I think it was AIX that
had a horribly misfeatured /dev/null implementation that caused I/O
tests dumped to /dev/null to be slower than if you used an actual device!)
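One way to check which case applies is to ask the stream itself. The following is a minimal sketch for current Python 3, not part of the original test; the helper name buffering_info is made up for illustration. It relies on isatty() and on the line_buffering attribute that io.TextIOWrapper objects such as sys.stdout provide.

```python
import sys


def buffering_info(stream):
    """Return (is_tty, line_buffered) for a text stream.

    Hypothetical helper, named here only for illustration.
    """
    is_tty = stream.isatty()
    # line_buffering exists on io.TextIOWrapper; fall back to False elsewhere.
    line_buffered = bool(getattr(stream, "line_buffering", False))
    return is_tty, line_buffered


if __name__ == "__main__":
    # Report on stderr so the result is visible even when stdout is redirected.
    print(buffering_info(sys.stdout), file=sys.stderr)
```

Running the script once on a terminal and once with stdout redirected to a file should show whether the two cases differ on a given system.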
By adding the following wrapper around your 'rw' function, you
can test the effect of different buffer sizes or options:
from time import clock

def test1( filename, buf=None ):
    # buf=None leaves the buffer size to the system default
    if buf is None:
        inp = open( filename, 'r' )
    else:
        inp = open( filename, 'r', buf )
    out = open( 'junk', 'w' )
    # time only the copy loop itself
    c0 = clock()
    rw( inp, out )
    c1 = clock()
    return c1 - c0
On the Mac, this makes about a *37 difference.
( I got tired of waiting for it to finish on 'big.file' , so
I cut down the size. )
>>> iotest.makebigfile( 'not.so.big.file', 4001 )
>>> iotest.test1( 'not.so.big.file' )
1.18333333333
>>> iotest.test1( 'not.so.big.file', buf=1 )
1.88333333333
>>> iotest.test1( 'not.so.big.file', buf=0 )
68.3833333333
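For readers on current Python 3, here is a rough sketch of the same experiment, under a few assumptions. time.clock no longer exists, so time.perf_counter is used instead. Python 3 only accepts buffering=0 for binary files, so the files are opened in binary mode. make_big_file is a hypothetical stand-in for iotest.makebigfile, which is not shown above, and its line contents are invented. As in test1, only the input buffering is varied. Timings will differ by platform and are not reproduced here.

```python
import os
import tempfile
import time


def make_big_file(path, n_lines):
    # Hypothetical stand-in for iotest.makebigfile; the line text is made up.
    with open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of the test file\n" % i)


def rw(inp, out):
    # Same loop as in the quoted message: copy one line at a time.
    while True:
        line = inp.readline()
        if not line:
            break
        out.write(line)


def time_copy(src, dst, buffering=-1):
    # Binary mode, because Python 3 rejects buffering=0 for text files.
    # Only the input buffering is varied, mirroring test1 above.
    with open(src, "rb", buffering) as inp, open(dst, "wb") as out:
        t0 = time.perf_counter()
        rw(inp, out)
        return time.perf_counter() - t0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "not.so.big.file")
        dst = os.path.join(d, "junk")
        make_big_file(src, 4001)
        # -1 = system default, 65536 = explicit buffer size, 0 = unbuffered
        for buf in (-1, 65536, 0):
            print(buf, time_copy(src, dst, buf))
```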
I surely HOPE this is your problem!
---| Steven D. Majewski (804-982-0831) <sdm7g at Virginia.EDU> |---
---| Department of Molecular Physiology and Biological Physics |---
---| University of Virginia Health Sciences Center |---
---| P.O. Box 10011 Charlottesville, VA 22906-0011 |---
Caldera Open Linux: "Powerful and easy to use!" -- Microsoft(*)
(*) <http://www.pathfinder.com/fortune/1999/03/01/mic.html>