MBOX-Line: From Pidgeot18 at verizon.net Mon Apr 28 21:24:15 2014
To: imap-protocol@u.washington.edu
From: Joshua Cranmer <Pidgeot18@verizon.net>
Date: Fri Jun 8 12:34:52 2018
Subject: [Imap-protocol] Email charset statistics
In-Reply-To: <52D5D84D.6070208@verizon.net>
References: <52D5D84D.6070208@verizon.net>
Message-ID: <535F296F.9090807@verizon.net>
On 1/14/2014 6:37 PM, Joshua Cranmer wrote:
> A recent concern of mine has been attempting to work out the grand
> messiness that is charsets in the context of reading and parsing email
> messages. I am not aware of any prior attempts to assess the practice
> of charsets in email, so I can only offer evidence from personal
> anecdote and culling of bug reports on open-source software, neither
> of which are a good source of information. I was wondering if anyone
> else on this list had access to a larger database of messages that
> they could check or have more specific generalities that are needed.
In an attempt to put some quantitative numbers on the situation here, I
ended up testing the largest body of RFC 822-style messages I could
think of that was publicly available: recently-posted Usenet messages.
While Usenet and email aren't the same thing, I'd generally expect
Usenet to be slightly worse at passing around 8-bit messages, so it's at
least a useful proxy for how bad the situation is for some things,
but not all (e.g., good luck drawing any conclusions about HTML email
charset questions). I've posted my findings on my blog at
<http://quetzalcoatal.blogspot.com/2014/03/understanding-email-charsets.html>,
complete with a list of some recommendations I've gleaned from the data set.
Now, off to make it my personal mission to kill x-mac-croatian. :-)
--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth