MBOX-Line: From jeff.mckay at comaxis.com Wed Nov 2 10:53:21 2011
To: imap-protocol@u.washington.edu
From: Jeff Mckay <jeff.mckay@comaxis.com>
Date: Fri Jun 8 12:34:47 2018
Subject: [Imap-protocol] Character encoding question
In-Reply-To: <20111101230240.Horde.I0QRQoF5lbhOsM7wX2WU_SA@bigworm.curecanti.org>
References: <4EB0C241.6060900@comaxis.com>
 <20111101230240.Horde.I0QRQoF5lbhOsM7wX2WU_SA@bigworm.curecanti.org>
Message-ID: <4EB18391.4040202@comaxis.com>

Thanks for your comments. I'm still a bit confused. Let me clarify what
I am seeing in these two examples. In the first, one of the characters
in question is "lower case o with acute", which is supposed to be xF3 in
ISO-8859-2 and xC3 xB3 in UTF-8. The IMAP server represents this as an
ampersand, followed by APM, followed by a dash (I am writing out the
description so it does not get interpreted incorrectly somewhere). If I
take the APM and run it through a base64 decoder, I get xF3. So far so
good.
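
A quick check of the first example, as a minimal sketch assuming Python 3 and only its standard library (the helper name `decode_modified_base64_run` is hypothetical). Modified base64 (RFC 3501, section 5.1.3) drops the `=` padding and uses `,` in place of `/`, so both are restored before calling `base64.b64decode`. Decoding `APM` actually yields two bytes, x00 xF3: that is U+00F3 (o with acute) in big-endian UTF-16, the same form as the second example, rather than a bare ISO-8859 byte.

```python
import base64


def decode_modified_base64_run(run: str) -> bytes:
    """Decode the text between '&' and '-' in an IMAP mailbox name.

    Hypothetical helper. Modified base64 (RFC 3501, section 5.1.3) uses
    ',' instead of '/' and omits '=' padding, so both are restored here.
    """
    b64 = run.replace(",", "/")
    b64 += "=" * (-len(b64) % 4)  # pad up to a multiple of 4 characters
    return base64.b64decode(b64)


raw = decode_modified_base64_run("APM")
print(raw.hex(" "))             # 00 f3
print(raw.decode("utf-16-be"))  # ó
```

The leading x00 is easy to overlook, but it is the high byte of a UTF-16 code unit, which is what ties the two examples together.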

In the second example, we have the letters Temp/New followed by a couple
of Chinese characters that I don't know the names of. The two Chinese
characters are represented in IMAP by an ampersand followed by bUuL1Q
and the closing dash. When I base64-decode this, I end up with
x6D x4B x8B xD5. This appears to be big-endian UTF-16. I have to
byte-reverse each 2-byte sequence, but then I can convert it to UTF-8
(my target) and see the Chinese characters. I could also take the
original data and stick a + in front of it (ending up with +bUuL1Q),
convert this from UTF-7 to UTF-8, and end up with valid characters.
This last part I really don't understand: if it is base64 encoded, how
is that valid UTF-7? Anyway, I don't seem to have an algorithm that
will work on both of these examples, and no way to detect which one I
should use. Obviously I am totally confused about what I am doing, but
any further insight would be appreciated.

Michael M Slusarz wrote:
> Quoting Jeff Mckay <jeff.mckay@comaxis.com>:
>
>> I am dealing with a Sun Java IMAP server that seems a little screwy
>> with regard to encoding certain non-English character strings.
>> Hopefully this is my problem, but I'm not sure what is going on.
>> Here are a couple of examples of folder names from this server:
|
|
>>
|
|
>> visible in client: test/A hegyek h?val bor?tott
|
|
>> encoded by imap: "test/A hegyek h&APM-val bor&AO0-tott\"
|
|
>>
|
|
>> visible in client: Temp/New??
|
|
>> encoded by imap: "Temp/New&bUuL1Q-"
|
>
> Both of those mailbox names look fine.
>
>> In the first case, it is necessary to take the &- part and base64
>> decode it, then treat the result as modified UTF7.
>
> I am assuming that you are intending to convert the IMAP server stored
> mailbox name to a displayable representation on the client side. If
> so, your description is incorrect. The mailbox name on the server
> **is** modified UTF-7. Once you base64 decode (and remove the & and -
> delimiters), the resulting mailbox string is now in the charset of the
> MUA (e.g. UTF-8).
>
>> In the second case, the base64 decode step is unnecessary, it is
>> already in UTF7 format.
>
> Mailbox names on the IMAP server are ALWAYS modified UTF-7. So not
> sure what you mean by "unnecessary".
>
>> So my question is, when do I do a base64 decode and when not?
>
> Generally, IMHO, it will be easiest to work with mailbox names in the
> native charset on the MUA side. So you only need to convert to/from
> modified UTF-7 when either sending or parsing an IMAP command.
>
> michael
>
> _______________________________________________
> Imap-protocol mailing list
> Imap-protocol@u.washington.edu
> http://mailman2.u.washington.edu/mailman/listinfo/imap-protocol
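
Converting only at the protocol boundary, as suggested above, also needs the opposite direction. One nuance: strictly, base64-decoding a shifted run yields UTF-16BE code units, which still have to be transcoded to the client's charset (UTF-8 here). A minimal encoder sketch under the same assumptions (Python 3 standard library; `encode_mailbox_name` is a hypothetical name), following RFC 3501 section 5.1.3: printable US-ASCII (0x20 to 0x7E) represents itself, `&` becomes `&-`, and each run of other characters is encoded as UTF-16BE, then base64 without padding, with `/` replaced by `,`.

```python
import base64


def encode_mailbox_name(name: str) -> str:
    """Convert a str mailbox name to IMAP modified UTF-7 (RFC 3501).

    Hypothetical helper, shown as a sketch of the encoding rules.
    """
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush() -> None:
        # Emit the pending run as '&' + modified base64 of UTF-16BE + '-'.
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable US-ASCII passes through
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_mailbox_name("Temp/New\u6d4b\u8bd5"))  # Temp/New&bUuL1Q-
```

Encoding consecutive non-ASCII characters as a single run is what produces `&bUuL1Q-` rather than two separate shifted sequences.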