From: mlh at idt.ntnu.no (Magnus L. Hetland)
Date: 23 Apr 1999 15:55:53 +0200
Subject: Python too slow for real world
References: <372068E6.16A4A90@icrf.icnet.uk>
Message-ID: <y0jhfq7o2za.fsf@vier.idi.ntnu.no>
Content-Length: 2475
X-UID: 377

Arne Mueller <a.mueller at icrf.icnet.uk> writes:

> Hi All,
>
> first of all: Sorry for that slightly provoking subject ;-) ...
[...]
>
> The following python code does the job:
[...]
> f = open('my_very_big_data_file','r') # datafile with ~300000 records
> read_write(f, stdout, {}) # for a simple test I don't exclude
> anything!

Well -- re is known to be slow. If you have to be fast, maybe you
should try not to use regular expressions; you could perhaps use
something from the string module (several options there), or maybe even
consider fixed-length fields for the identifiers, which should speed
things up a bit.
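
To make the fixed-length suggestion concrete, here is a minimal sketch
(not Arne's actual script; the '>pxNNNNN' header format and the
seven-character id are assumptions taken from the quoted test-set
generator further down, and the syntax is modern Python):

```python
import re

line = '>px00042\n'  # a header line in the assumed format

# Regex approach: compile once, match per line.
header_re = re.compile(r'>(\w{7})')
rx_id = header_re.match(line).group(1)

# Fixed-length approach: the id always sits in columns 1..7,
# so a plain slice does the same job with no pattern matching at all.
sl_id = line[1:8]

assert rx_id == sl_id == 'px00042'
```

Skipping the pattern-matching machinery per line is where the saving
would come from; the slice only works if every id really is seven
characters wide.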

> It took 503.90 sec on a SGI Power Challenge (R10000 CPU). An appropriate
> perl script does the same job in 32 sec (same method, same loop
> structure)!

Hm. Perl probably has a more efficient implementation of Perl regexes
than Python, naturally enough...

> I'd really like to know why python is so slow (or perl is so fast?) and
> what I can do to improve the speed of that routine.

Well -- at least I have made one suggestion... Though it may not
explain it all...

>
> I don't want to switch back to perl - but honestly, is python the right
> language to process such a huge amount of data?
>
> If you want to generate a test set you could use the following lines to
> print 10000 datasets to stdout:
>
> for i in xrange(1, 10001):
>     print \
> '>px%05d\nLSADQISTVQASFDKVKGDPVGILYAVFKADPSIMAKFTQFAGKDLESIKGTAPFETHAN\n\
> RIVGFFSKIIGELPNIEADVNTFVASHKPRGVTHDQLNNFRAGFVSYMKAHTDFAGAEAA\n\
> WGATLDTFFGMIFSKM\n' % i
>
> And if you don't believe me that perl does the job quicker you can try
> the perl code below:
[...]
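
Unwrapped from the mail software's line breaks, the quoted generator
amounts to the following (a sketch in modern Python syntax, with the
count cut to three records and the output collected in a list rather
than printed, so it is easy to inspect):

```python
# Emit records in the quoted '>pxNNNNN\n<sequence>\n' format.
records = []
for i in range(1, 4):  # the original uses xrange(1, 10001)
    records.append(
        '>px%05d\n'
        'LSADQISTVQASFDKVKGDPVGILYAVFKADPSIMAKFTQFAGKDLESIKGTAPFETHAN\n'
        'RIVGFFSKIIGELPNIEADVNTFVASHKPRGVTHDQLNNFRAGFVSYMKAHTDFAGAEAA\n'
        'WGATLDTFFGMIFSKM\n' % i)

# Each record: one '>'-prefixed header line, three sequence lines.
assert len(records) == 3
```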

OK. Using your test set, I tried the following program (it may not work
exactly like your script...)

I have made the assumption that all the ids have a constant length of
7.

----------

import fileinput

exclude = {'px00003': 1}
skip = 0

for line in fileinput.input():
    if line[0] == '>':
        id = line[1:8]
        if exclude.has_key(id):
            skip = 1
        else:
            skip = 0
    if not skip:
        print line,

-----------

It took about 12 seconds.
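
The filtering logic above can be exercised without a 300000-record file
by feeding it a small in-memory list of lines instead of fileinput (a
sketch in modern Python syntax; the record contents are shortened
stand-ins for the quoted generator's output):

```python
# Three tiny records in the assumed '>pxNNNNN' header format.
lines = []
for i in range(1, 4):
    lines.append('>px%05d\n' % i)
    lines.append('LSADQISTVQASFDKVKGDPVGILYAVFKADPSIMAKF\n')

exclude = {'px00002': 1}  # drop the second record entirely
skip = 0
kept = []

for line in lines:
    if line[0] == '>':
        # Same fixed-width id extraction as in the script above.
        skip = 1 if line[1:8] in exclude else 0
    if not skip:
        kept.append(line)

# px00001 and px00003 survive with their sequence lines; px00002 is gone.
assert len(kept) == 4
```

Note the skip flag persists across a record's sequence lines, so the
exclusion removes the whole record, not just its header.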

>
> Please do convince me being a python programmer does not mean being slow
> ;-)
>

At least I tried...

> Thanks very much for any help,
>
> Arne

--
        > Hi! I'm the signature virus 99!
Magnus  > Copy me into your signature and join the fun!
Lie
Hetland http://arcadia.laiv.org <arcadia at laiv.org>