
From: jkraai at (jkraai)
Date: Thu, 29 Apr 1999 05:07:01 GMT
Subject: HTML "sanitizer" in Python
References: <> <> <00e501be91c4$db944f20$>
Message-ID: <>
X-UID: 74
Um, a vote of confidence here for tidy.
I've rewritten tidy to do several different specialized things.
I am no C hacker, and have been told it's 'awful' code, but I
sure had no problems with it.
Mark Nottingham wrote:
> There's a better (albeit non-Python) way.
> Check out
> Tidy will do wonderful things in terms of making HTML compliant with the
> spec (closing tags, cleaning up the crud that Word makes, etc.) As a big
> bonus, it will remove all <FONT> tags, etc., and replace them with CSS1 style
> sheets. Wow.
> It's C, and is also available with a Windows GUI (HTML-Kit) that makes a
> pretty good HTML editor as well. On Unix, it's a command line utility, so
> you can use it (clumsily) from a Python program.
> I suppose an extension could also be written; will look into this (or if
> anyone does it, please tell me!)
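For the "clumsy" command-line route described above, here is a minimal modern Python 3 sketch. It assumes a `tidy` binary on the PATH that reads HTML from stdin, writes the cleaned markup to stdout, and sends diagnostics to stderr. The helper name `tidy_html` is hypothetical. The default flags `-q` (quiet) and `-c` (replace FONT/CENTER with CSS) are an assumption, so check them against `tidy -help` for your build. The sketch treats exit status 1 (warnings only) as success and 2 (errors) as failure, which follows Tidy's documented exit codes.

```python
import subprocess
from typing import Sequence

# HTML Tidy exit statuses: 0 = no problems, 1 = warnings only, 2 = errors.
TIDY_OK, TIDY_WARNINGS, TIDY_ERRORS = 0, 1, 2


def tidy_html(html: str, tidy_cmd: Sequence[str] = ("tidy", "-q", "-c")) -> str:
    """Pipe `html` through an external tidy process and return its stdout.

    `tidy_html` is a hypothetical helper; `tidy_cmd` defaults to an assumed
    tidy invocation and can be swapped for any command that filters stdin.
    """
    result = subprocess.run(
        list(tidy_cmd),
        input=html,            # document goes to tidy's stdin
        capture_output=True,   # collect cleaned markup (stdout) and diagnostics (stderr)
        text=True,             # work with str rather than bytes
    )
    # Warnings still yield usable output; anything else (errors, signals) fails.
    if result.returncode not in (TIDY_OK, TIDY_WARNINGS):
        raise RuntimeError(
            f"tidy failed with status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Usage would look like `cleaned = tidy_html(open("page.html").read())`, where `page.html` is a placeholder. Passing the command as a sequence keeps the call free of shell quoting, which removes most of the clumsiness of driving a command-line tool from Python.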