From: jkraai at polytopic.com (jkraai)
Date: Thu, 29 Apr 1999 05:07:01 GMT
Subject: HTML "sanitizer" in Python
References: <s72703fc.021@holnam.com> <19990428152042.A708@better.net> <00e501be91c4$db944f20$0301a8c0@cbd.net.au>
Message-ID: <3727E8F5.7AC7EAB0@polytopic.com>
X-UID: 74

Um, a vote of confidence here for tidy.

I've rewritten tidy to do several different specialized things.

I am no C hacker, and have been told it's 'awful' code, but I
sure had no problems with it.
,
just-another-2c-in-the-bucket-ly-yours

--jim

Mark Nottingham wrote:
>
> There's a better (albeit non-Python) way.
>
> Check out http://www.w3.org/People/Raggett/tidy/
>
> Tidy will do wonderful things in terms of making HTML compliant with the
> spec (closing tags, cleaning up the crud that Word makes, etc.) As a big
> bonus, it will remove all <FONT> tags, etc, and replace them with CSS1 style
> sheets. Wow.
>
> It's C, and is also available with a windows GUI (HTML-Kit) that makes a
> pretty good HTML editor as well. On Unix, it's a command line utility, so
> you can use it (clumsily) from a Python program.
>
> I suppose an extension could also be written; will look into this (or if
> anyone does it, please tell me!)
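
For what it's worth, the "clumsy" command-line route mentioned in the quote might look roughly like the sketch below. It is only an illustration, not anything from the thread: it assumes a modern Python 3 and a Tidy binary named "tidy" on the PATH, and the helper name run_html_filter is made up. The "-q" (quiet) and "-c" (replace FONT and similar tags with CSS) switches are standard Tidy options, but check them against your installed version.

```python
import subprocess


def run_html_filter(html, command=("tidy", "-q", "-c")):
    """Pipe an HTML string through an external command and return its stdout.

    Hypothetical helper: the default command assumes an HTML Tidy binary
    called "tidy" is installed and reachable on PATH.
    """
    result = subprocess.run(
        list(command),
        input=html,           # sent to the command's standard input
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # work with str rather than bytes
    )
    # Tidy exits with 0 when the input was clean, 1 when it only issued
    # warnings, and 2 on errors, so only codes above 1 count as failure.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip() or "filter command failed")
    return result.stdout
```

Calling run_html_filter("<font color=red>hello") should hand back Tidy's cleaned-up markup as a string. Because the command is a parameter, the same helper can wrap a differently named or differently configured Tidy build.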