From: aahz at netcom.com (Aahz Maruch)
Date: Sat, 17 Apr 1999 14:35:48 GMT
Subject: re.sub() loops
References: <Pine.GSO.4.10.9904171556010.28418-100000@moses.sz-sb.de>
Message-ID: <aahzFAC8Jo.93H@netcom.com>
Content-Length: 1218
X-UID: 168

In article <Pine.GSO.4.10.9904171556010.28418-100000 at moses.sz-sb.de>,
Andreas Jung <ajung at sz-sb.de> wrote:
>
>I am trying to do some lame HTML processing with some
>HTML. The following lines tries to remove some
>unneccessary code from a HTML file. However python hangs
>in this call:
>
>data = re.sub('<TABLE.*?es.*?da.*?en.*?fi.*?sv.*?TABLE>','',data)

Does the <TABLE>...</TABLE> contain *all* the strings "es", "da", "en",
"fi", and "sv"? Or are the strings supposed to be "?es" and so on? In
any event, with six ".*?" patterns in there, the backtracking can make
the processing time blow up on input that doesn't match, even if it's
not actually hanging.
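
To make the backtracking concern concrete, here is a minimal sketch of
one way to avoid stacking wildcards. It rests on assumptions the
original post does not confirm: that tables are not nested, that a
table may span several lines, and that the goal is to drop every table
containing all five strings in that order. The names MARKERS,
contains_in_order and strip_marked_tables are made up for illustration.

```python
import re

# Assumption: the five strings must all appear, in this order, inside one table.
MARKERS = ("es", "da", "en", "fi", "sv")

# A single lazy wildcard instead of six: match one <TABLE ...> ... </TABLE> block.
# re.DOTALL lets the block span multiple lines (an assumption about the data).
TABLE_RE = re.compile(r"<TABLE\b.*?</TABLE>", re.DOTALL)


def contains_in_order(text, markers=MARKERS):
    """Return True if every marker occurs in text, each after the previous one."""
    pos = 0
    for marker in markers:
        pos = text.find(marker, pos)
        if pos < 0:
            return False
        pos += len(marker)
    return True


def strip_marked_tables(data):
    """Remove each table block that contains all markers in order."""
    def replace(match):
        block = match.group(0)
        # Plain string searches do the checking, so the regex never has to
        # backtrack across several wildcards.
        return "" if contains_in_order(block) else block

    return TABLE_RE.sub(replace, data)
```

With this, something like data = strip_marked_tables(data) would stand
in for the re.sub() call. Note that it is not an exact equivalent: the
original pattern can stretch across more than one table, while this
version judges each table on its own.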

I think that if you want assistance in constructing the correct regex,
you'll need to give us more info about the data and the goal you're
trying to accomplish. You might find it productive to pick up a copy of
the O'Reilly regex book -- I'd used regexes for years, but I didn't
really learn them until I started using that book.
--
--- Aahz (@netcom.com)

Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het

Sometimes, you're not just out of left field, you're coming in
all the way from outer space.