From: aahz at netcom.com (Aahz Maruch)
Date: Sat, 17 Apr 1999 14:35:48 GMT
Subject: re.sub() loops
References: <Pine.GSO.4.10.9904171556010.28418-100000@moses.sz-sb.de>
Message-ID: <aahzFAC8Jo.93H@netcom.com>
Content-Length: 1218
X-UID: 168

In article <Pine.GSO.4.10.9904171556010.28418-100000 at moses.sz-sb.de>,
Andreas Jung  <ajung at sz-sb.de> wrote:
>
>I am trying to do some lame processing on some HTML.
>The following line tries to remove some unnecessary
>code from an HTML file. However, Python hangs in this
>call:
>
>data = re.sub('<TABLE.*?es.*?da.*?en.*?fi.*?sv.*?TABLE>','',data)

Does the <TABLE>...</TABLE> contain *all* the strings "es", "da", "en",
"fi", and "sv"?  Or are the strings supposed to be "?es" and so on?  In
any event, with six ".*?" patterns in there, a near-miss forces the
engine to backtrack through every combination of positions for them, so
processing time can grow as a high power of the input length, even if
it's not truly hanging.
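
Here is a minimal sketch of one way to sidestep the stacked wildcards,
assuming the goal is to drop every table that contains all five strings
in that order, and assuming tables are not nested.  The names
strip_tables, MARKERS, and TABLE_RE are made up for illustration; only
the re module calls are real.  It uses a single lazy wildcard to find
each table, then checks the markers with plain string searches:

```python
import re

# Assumption: a table is removed only if it contains every marker, in order.
MARKERS = ("es", "da", "en", "fi", "sv")

# One lazy wildcard per table; DOTALL lets "." match across newlines.
TABLE_RE = re.compile(r"<TABLE.*?</TABLE>", re.DOTALL | re.IGNORECASE)


def strip_tables(data, markers=MARKERS):
    """Remove each <TABLE>...</TABLE> block containing all markers in order."""

    def replace(match):
        block = match.group(0)
        pos = 0
        for marker in markers:
            pos = block.find(marker, pos)
            if pos == -1:
                return block  # a marker is missing: keep this table
            pos += len(marker)
        return ""  # every marker found in order: drop the table

    return TABLE_RE.sub(replace, data)
```

Each table is scanned once by the regex and once by the find() calls, so
there is nothing for the engine to backtrack over.  Whether this matches
what you actually want depends on the data, which is why more detail
would help.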

I think that if you want assistance in constructing the correct regex,
you'll need to give us more info about the data and the goal you're
trying to accomplish.  You might find it productive to pick up a copy of
the O'Reilly regex book, "Mastering Regular Expressions" -- I'd used
regexes for years, but I didn't really learn them until I started using
that book.
--
                      --- Aahz (@netcom.com)

Hugs and backrubs -- I break Rule 6       <*>      http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het

Sometimes, you're not just out of left field, you're coming in
all the way from outer space.