From: mlh at idt.ntnu.no (Magnus L. Hetland)
Date: 23 Apr 1999 15:55:53 +0200
Subject: Python too slow for real world
References: <372068E6.16A4A90@icrf.icnet.uk>
Message-ID: <y0jhfq7o2za.fsf@vier.idi.ntnu.no>
Content-Length: 2475
X-UID: 377

Arne Mueller <a.mueller at icrf.icnet.uk> writes:
> Hi All,
>
> first of all: Sorry for that slightly provoking subject ;-) ...
[...]
>
> The following python code does the job:
[...]
> f = open('my_very_big_data_file', 'r')  # datafile with ~300000 records
> read_write(f, stdout, {})  # for a simple test I don't exclude anything!
Well -- re is known to be slow. If you need speed, maybe you should
try not to use regular expressions; you could perhaps use something
from the string module (there are several options there), or maybe
even consider fixed-length fields for the identifiers, which should
speed things up a bit.
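For example, something along these lines -- the exact field layout
here is my assumption, just to illustrate the three approaches:

----------
import re, string

line = '>px00042 rest of the header'

# 1. Regular expression (convenient, but relatively slow):
id = re.match(r'>(\S+)', line).group(1)

# 2. The string module (no regex machinery involved):
id = string.split(line[1:])[0]

# 3. Fixed-length field (fastest -- just a slice):
id = line[1:8]
----------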
> It took 503.90 sec on a SGI Power Challenge (R10000 CPU). An appropriate
> perl script does the same job in 32 sec (same method, same loop
> structure)!
Hm. Perl probably has a more efficient implementation of Perl regexes
than Python, naturally enough...
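If you want to see how much of the time goes into the regex machinery
on the Python side, a rough timing sketch like this (the test data
and sizes are made up by me) gives an idea:

----------
import re, time

pat = re.compile(r'^>(\S+)')
lines = ['>px%05d some description' % i for i in xrange(10000)]

# Time the regex-based id extraction:
start = time.clock()
for line in lines:
    m = pat.match(line)
    if m:
        id = m.group(1)
print 'regex match: %.2f s' % (time.clock() - start)

# Time the fixed-length slice instead:
start = time.clock()
for line in lines:
    if line[0] == '>':
        id = line[1:8]
print 'plain slice: %.2f s' % (time.clock() - start)
----------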
> I'd really like to know why python is so slow (or perl is so fast?) and
> what I can do to improve the speed of that routine.
Well -- at least I have made one suggestion... Though it may not
explain it all...
>
> I don't want to switch back to perl - but honestly, is python the right
> language to process such huge amounts of data?
>
> If you want to generate a test set you could use the following lines to
> print 10000 datasets to stdout:
>
> for i in xrange(1, 10001):
>     print '>px%05d\nLSADQISTVQASFDKVKGDPVGILYAVFKADPSIMAKFTQFAGKDLESIKGTAPFETHAN\n\
> RIVGFFSKIIGELPNIEADVNTFVASHKPRGVTHDQLNNFRAGFVSYMKAHTDFAGAEAA\n\
> WGATLDTFFGMIFSKM\n' % i
>
> And if you don't believe me that perl does the job quicker you can try
> the perl code below:
[...]
OK. Using your test set, I tried the following program (it may not
work exactly like your script...). I have made the assumption that
all the ids have a constant length of 7.
----------
import fileinput

# Ids of the records to leave out of the output:
exclude = {'px00003': 1}

skip = 0
for line in fileinput.input():
    if line[0] == '>':
        # Header line: extract the fixed-length id and decide
        # whether to skip this whole record.
        id = line[1:8]
        if exclude.has_key(id):
            skip = 1
        else:
            skip = 0
    if not skip:
        print line,   # trailing comma: line already ends in '\n'
-----------
It took about 12 seconds.
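(I ran it along these lines -- the script name is of course whatever
you save it as:

    python noregex.py my_very_big_data_file > filtered

fileinput reads from the files named on the command line, or from
stdin if there are none, so it works directly on your data file.)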
>
> Please do convince me that being a python programmer does not mean
> being slow ;-)
>
At least I tried...
> Thanks very much for any help,
>
> Arne
--
          > Hi! I'm the signature virus 99!
Magnus    > Copy me into your signature and join the fun!
Lie
Hetland   http://arcadia.laiv.org <arcadia at laiv.org>