From: jkraai at polytopic.com (jkraai)
Date: Thu, 29 Apr 1999 05:07:01 GMT
Subject: HTML "sanitizer" in Python
References: <s72703fc.021@holnam.com> <19990428152042.A708@better.net> <00e501be91c4$db944f20$0301a8c0@cbd.net.au>
Message-ID: <3727E8F5.7AC7EAB0@polytopic.com>
X-UID: 74
Um, a vote of confidence here for tidy.
I've rewritten tidy to do several different specialized things.
I am no C hacker, and have been told it's 'awful' code, but I
sure had no problems with it.
just-another-2c-in-the-bucket-ly-yours
--jim
Mark Nottingham wrote:
>
> There's a better (albeit non-Python) way.
>
> Check out http://www.w3.org/People/Raggett/tidy/
>
> Tidy will do wonderful things in terms of making HTML compliant with the
> spec (closing tags, cleaning up the crud that Word makes, etc.). As a big
> bonus, it will remove all <FONT> tags, etc., and replace them with CSS1 style
> sheets. Wow.
>
> It's C, and is also available with a Windows GUI (HTML-Kit) that makes a
> pretty good HTML editor as well. On Unix, it's a command-line utility, so
> you can use it (clumsily) from a Python program.
>
> I suppose an extension could also be written; will look into this (or if
> anyone does it, please tell me!)
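
For the "use it (clumsily) from a Python program" route, something like
the rough sketch below might do. It assumes a tidy binary is on the PATH
and that it accepts -q and -utf8 and exits with 0, 1 or 2 for clean,
warnings and errors, as current HTML Tidy builds do; the tidy of this
thread may differ. The names run_filter, tidy_html and main are made up
for illustration, and subprocess is today's standard library, not what
was available when this was posted.

```python
import shutil
import subprocess
import sys


def run_filter(text, command):
    """Pipe text through an external command.

    Returns (stdout, stderr, exit_code), with both streams decoded as UTF-8.
    """
    proc = subprocess.run(
        command,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # a non-zero exit code is not necessarily fatal here
    )
    out = proc.stdout.decode("utf-8", errors="replace")
    err = proc.stderr.decode("utf-8", errors="replace")
    return out, err, proc.returncode


def tidy_html(html, tidy_path="tidy"):
    """Return html cleaned by tidy, or None if tidy is not installed."""
    exe = shutil.which(tidy_path)
    if exe is None:
        return None
    # Assumption: -q (quiet) and -utf8 behave as in current HTML Tidy.
    out, err, code = run_filter(html, [exe, "-q", "-utf8"])
    # Assumption: exit status 0 = clean, 1 = warnings, 2 = errors.
    if code >= 2:
        raise RuntimeError("tidy failed: " + err.strip())
    return out


def main():
    """Act as a stdin-to-stdout filter around tidy."""
    cleaned = tidy_html(sys.stdin.read())
    if cleaned is None:
        sys.exit("tidy not found on PATH")
    sys.stdout.write(cleaned)
```

Calling main() turns this into a plain filter. run_filter is kept
separate so the piping can be tried with any command that reads stdin,
without needing tidy installed.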