From: Aahz Maruch
Date: Sat, 17 Apr 1999
Subject: re.sub() loops
Andreas Jung <ajung at> wrote:
>I am trying to do some lame HTML processing. The following
>line tries to remove some unnecessary code from an HTML
>file. However, Python hangs
>in this call:
>data = re.sub('<TABLE.*?es.*?da.*?en.*?fi.*?sv.*?TABLE>','',data)
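Does the <TABLE>...</TABLE> contain *all* the strings "es", "da", "en",
"fi", and "sv", in that order? Or are the strings supposed to be "?es"
and so on? In any event, with six ".*?" patterns in there, every attempt
that fails forces the engine to backtrack through an enormous number of
ways to split the text, so processing time blows up even if it's not
technically hanging.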
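A minimal sketch of one way around the backtracking, assuming the goal is to delete every TABLE block that mentions all five codes. Two more assumptions: order is ignored, which is looser than the original pattern, and tables are not nested. The names LANG_CODES, TABLE_RE and strip_language_tables are hypothetical. The regex uses a single lazy wildcard to find each table, and a plain Python check decides whether to drop it, so the engine never has to try placements for six separate wildcards.

```python
import re

# Hypothetical: assume the two-letter codes from the original pattern are
# literal substrings that must all appear somewhere inside one table.
LANG_CODES = ("es", "da", "en", "fi", "sv")

# One lazy wildcard per table: match from <TABLE to the nearest </TABLE>.
# DOTALL lets tables span lines (the original pattern could not);
# IGNORECASE accepts <table> as well. Nested tables are NOT handled.
TABLE_RE = re.compile(r"<TABLE\b.*?</TABLE>", re.IGNORECASE | re.DOTALL)


def _drop_if_all_codes(match):
    """Replacement callback for re.sub: decide per table, outside the regex."""
    block = match.group(0)
    if all(code in block for code in LANG_CODES):
        return ""      # every code present: remove the whole table
    return block       # otherwise leave the table as it was


def strip_language_tables(data):
    """Remove tables mentioning all LANG_CODES (hypothetical helper)."""
    return TABLE_RE.sub(_drop_if_all_codes, data)
```

Used in place of the original call, this would be `data = strip_language_tables(data)`. If the tables can nest, an HTML parser is probably a safer tool than any single regex.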
I think that if you want assistance in constructing the correct regex,
you'll need to give us more info about the data and the goal you're
trying to accomplish. You might find it productive to pick up a copy of
the O'Reilly regex book -- I'd used regexes for years, but I didn't
really learn them until I started using that book.
