From: jkraai at polytopic.com (jkraai)
Date: Thu, 29 Apr 1999 05:07:01 GMT
Subject: HTML "sanitizer" in Python
References: <19990428152042.A708@better.net> <00e501be91c4$db944f20$0301a8c0@cbd.net.au>
Message-ID: <3727E8F5.7AC7EAB0@polytopic.com>
X-UID: 74

Um, a vote of confidence here for tidy.

I've rewritten tidy to do several different specialized things. I am no C hacker, and have been told it's 'awful' code, but I sure had no problems with it.

just-another-2c-in-the-bucket-ly-yours
--jim

Mark Nottingham wrote:
>
> There's a better (albeit non-Python) way.
>
> Check out http://www.w3.org/People/Raggett/tidy/
>
> Tidy will do wonderful things in terms of making HTML compliant with the
> spec (closing tags, cleaning up the crud that Word makes, etc.) As a big
> bonus, it will remove all tags, etc, and replace them with CSS1 style
> sheets. Wow.
>
> It's C, and is also available with a windows GUI (HTML-Kit) that makes a
> pretty good HTML editor as well. On Unix, it's a command line utility, so
> you can use it (clumsily) from a Python program.
>
> I suppose an extension could also be written; will look into this (or if
> anyone does it, please tell me!)
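
For anyone wanting to try the "use it (clumsily) from a Python program" route, here is a rough sketch that pipes a string through a command-line filter. It is not from either post. The helper name run_filter is made up for illustration, and the defaults assume a tidy binary on the PATH that reads HTML on stdin, accepts -q (quiet), and exits with 0 for clean input, 1 for warnings and 2 for errors. Check your own build before relying on any of that.

```python
import shutil
import subprocess


def run_filter(html, command=("tidy", "-q")):
    """Pipe `html` through an external filter and return what it prints.

    `command` defaults to an assumed tidy invocation; any program that
    reads stdin and writes stdout can be substituted.
    """
    # Fail early with a clear message if the program is not installed.
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]!r} not found on PATH")

    result = subprocess.run(
        list(command),
        input=html,           # sent to the program's stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str rather than bytes
    )

    # Assumed exit-code convention: 0 = ok, 1 = warnings, 2 or more = errors.
    if result.returncode >= 2:
        raise RuntimeError(f"{command[0]} failed: {result.stderr.strip()}")
    return result.stdout
```

Typical use would be something like cleaned = run_filter(dirty_html). Warnings (exit code 1) are deliberately not treated as failures here, since messy input is the whole point of running a cleaner over it.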