From: mlh at idt.ntnu.no (Magnus L. Hetland)
Date: 23 Apr 1999 16:13:12 +0200
Subject: Python too slow for real world
References: <372068E6.16A4A90@icrf.icnet.uk> <37207685.F29BE1AB@ingr.com>
Message-ID: <y0jemlbo26f.fsf@vier.idi.ntnu.no>
Content-Length: 1710
X-UID: 281

Joseph Robertson <jmrober1 at ingr.com> writes:
> Hi,
>
> For what you state here, you don't even really need to read the 'data' at
> all.
> Just read your descriptors, and store the offsets and len of the data in a
> dictionary (i.e. index it).
>
> line = f.readline()
> if line[:1] == '>':                  # descriptor line
>     id = line[1:].strip()           # get id
>     pos = f.tell()                  # get current position (the "seek method")
>     index[id] = pos                 # store id, pos in dict
> # for each id, we now have its byte position in the file
Well... You have to read all the lines to find all the descriptors,
don't you? Is there really any great speedup here?
Of course, you would get some speedup later, when using the same
structure again...
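
Spelled out, the indexing pass might look something like this (an
untested sketch; build_index and the 'data.idx' file are just made-up
names):

    import cPickle

    def build_index(f):
        # One full pass: every line is read once, but only the
        # descriptor lines contribute to the index
        index = {}
        while 1:
            line = f.readline()
            if not line:
                break
            if line[:1] == '>':
                index[line[1:].strip()] = f.tell()   # offset of the data
        return index

    # Build once, then reuse the structure later without rescanning:
    # cPickle.dump(build_index(open('data')), open('data.idx', 'wb'), 1)
    # index = cPickle.load(open('data.idx', 'rb'))
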
>
> Then have a filter method which keeps or discards the records by criteria.
>
[...]
If the number of excluded elements isn't very high, this method will
only add to the burden of processing, won't it?
(By seek -- do you mean os.lseek? Or is there another one... Just curious.)
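
With such an index, a filter only has to touch the records it actually
keeps; roughly (untested, keep() stands in for whatever the criteria
are, and the seek() here is the method of ordinary Python file objects,
not os.lseek):

    def fetch(f, index, id):
        # Jump straight to the data for one id instead of rescanning
        f.seek(index[id])
        lines = []
        while 1:
            line = f.readline()
            if not line or line[:1] == '>':   # next descriptor: record done
                break
            lines.append(line)
        return lines

    view = {}
    for id in index.keys():
        if keep(id):              # hypothetical filter criteria
            view[id] = fetch(f, index, id)
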
>
> This way you can create views on your data without actually trying to load it
> all. The tradeoff of course is memory for file-access time, but I found
> file access to be faster than doing all the work 'up front'.
Hm. Yes.
If the size of the records is constant (in bytes, that is, not just in
lines), then you could, of course, use seek to skip all the data while
processing as well...
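
Something like this, perhaps (untested; build_index_skipping,
data_lines and width are made-up names, and every line is assumed to be
width bytes, newline included):

    def build_index_skipping(f, data_lines, width):
        # Relative seek past the data instead of reading it line by line
        index = {}
        while 1:
            line = f.readline()                # descriptor line
            if not line:
                break
            index[line[1:].strip()] = f.tell()
            f.seek(data_lines * width, 1)      # skip this record's data
        return index
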
> Besides, my
> project reached the point where we ran out of memory often; some datasets are
> on 8+ CD-ROMs!
>
> Hope that was relevant, but maybe I misunderstood the question.
> Joe Robertson,
> jmrobert at ro.com
[...]
-- 
         > Hi! I'm the signature virus 99!
Magnus   > Copy me into your signature and join the fun!
Lie
Hetland    http://arcadia.laiv.org  <arcadia at laiv.org>