
MBOX-Line: From jeff.mckay at comaxis.com  Wed Nov  2 10:53:21 2011
To: imap-protocol@u.washington.edu
From: Jeff Mckay <jeff.mckay@comaxis.com>
Date: Fri Jun  8 12:34:47 2018
Subject: [Imap-protocol] Character encoding question
In-Reply-To: <20111101230240.Horde.I0QRQoF5lbhOsM7wX2WU_SA@bigworm.curecanti.org>
References: <4EB0C241.6060900@comaxis.com>
	<20111101230240.Horde.I0QRQoF5lbhOsM7wX2WU_SA@bigworm.curecanti.org>
Message-ID: <4EB18391.4040202@comaxis.com>

Thanks for your comments.  I'm still a bit confused.  Let me clarify what I
am seeing in these two examples.  In the first, one of the characters in
question is "lower case o with acute", which is supposed to be xF3 in
ISO-8859-2 and xC3 xB3 in UTF-8.  The imap server represents this as an
ampersand followed by APM followed by a dash (I am writing out the
description so it does not get interpreted incorrectly somewhere).  If I
take the APM and run it through a base64 decoder, I get xF3.  So far so good.

In the second example, we have the letters Temp/New followed by a couple of
Chinese characters that I don't know the names of.  The two Chinese
characters are represented in imap by an ampersand followed by bUuL1Q and
the closing dash.  When I base64 decode this I end up with x6D x4B x8B xD5.
This appears to be big-endian UTF-16.  I have to byte-reverse each 2-byte
sequence, but then I can convert it to UTF-8 (my target) and see the Chinese
characters.  I could also take the original data and stick a + in front of
it (ending up with +bUuL1Q) and convert this from UTF-7 to UTF-8 and end up
with valid characters.  This last part I really don't understand - if it is
base64 encoded, how is that valid UTF-7?  Anyway, I don't seem to have an
algorithm that will work on both of these examples, and no way to detect
which one I should use.  Obviously I am totally confused about what I am
doing, but any further insight would be appreciated.
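
For reference, both examples appear to fit one procedure.  In modified UTF-7
as RFC 3501 section 5.1.3 describes it, everything between an ampersand and
the closing dash is base64 (with "," used in place of "/") over big-endian
UTF-16, and "&-" stands for a literal ampersand.  Under that reading APM
decodes to the two bytes x00 xF3, which is UTF-16 just like the Chinese case,
and the + trick works because ordinary UTF-7 uses + where IMAP uses &.  Below
is a minimal Python sketch under those assumptions.  The function name
decode_mailbox_name is made up for illustration, and the sketch does no
validation beyond raising ValueError when a closing dash is missing.

```python
import base64


def decode_mailbox_name(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (hypothetical helper)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            # Printable ASCII outside a shifted run stands for itself.
            out.append(ch)
            i += 1
            continue
        # Find the dash that closes the shifted run (ValueError if absent).
        end = name.index("-", i)
        chunk = name[i + 1:end]
        if chunk == "":
            # "&-" is the escape for a literal ampersand.
            out.append("&")
        else:
            # Modified base64 uses "," instead of "/" and omits "=" padding.
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)
            # The decoded bytes are big-endian UTF-16 code units.
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    print(decode_mailbox_name("test/A hegyek h&APM-val bor&AO0-tott"))
    print(decode_mailbox_name("Temp/New&bUuL1Q-"))
```

The same function handles both folder names, so no detection step is needed:
the decoder only has to look for the ampersand.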

Michael M Slusarz wrote:
> Quoting Jeff Mckay <jeff.mckay@comaxis.com>:
>
>> I am dealing with a Sun Java imap server that seems a little screwy in
>> regards to encoding certain non-English character strings - hopefully
>> this is my problem but I'm not sure what is going on.  Here are a couple
>> examples of folder names from this server:
>>
>> visible in client:  test/A hegyek hóval borított
>> encoded by imap: "test/A hegyek h&APM-val bor&AO0-tott\"
>>
>> visible in client: Temp/New??
>> encoded by imap: "Temp/New&bUuL1Q-"
>
> Both of those mailbox names look fine.
>
>> In the first case, it is necessary to take the &APM- part and base64
>> decode it, then treat the result as modified UTF-7.
>
> I am assuming that you are intending to convert the IMAP server stored
> mailbox name to a displayable representation on the client side.  If
> so, your description is incorrect.  The mailbox name on the server
> **is** modified UTF-7.  Once you base64 decode (and remove the & and -
> delimiters), the resulting mailbox string is now in the charset of the
> MUA (e.g. UTF-8).
>
>>  In the second case, the base64 decode
>> step is unnecessary, it is already in UTF7 format.
>
> Mailbox names on the IMAP server are ALWAYS modified UTF-7.  So not
> sure what you mean by "unnecessary".
>
>> So my question is, when do I do a base64 decode and when not?
>
> Generally, IMHO, it will be easiest to work with mailbox names in the
> native charset on the MUA side.  So you only need to convert to/from
> modified UTF-7 when either sending or parsing an IMAP command.
>
> michael
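
Going the other way, for building a mailbox name to send in an IMAP command,
the reverse conversion can be sketched as follows.  This again assumes the
RFC 3501 section 5.1.3 rules: printable ASCII passes through, "&" becomes
"&-", and each run of other characters is written as "&", then base64 of its
big-endian UTF-16 bytes with "/" replaced by "," and the "=" padding removed,
then "-".  The name encode_mailbox_name is hypothetical.

```python
import base64


def encode_mailbox_name(name: str) -> str:
    """Encode a mailbox name as IMAP modified UTF-7 (hypothetical helper)."""
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush() -> None:
        # Emit the pending run, if any, as one "&...-" shifted sequence.
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is escaped as "&-"
        elif " " <= ch <= "~":
            flush()
            out.append(ch)  # printable ASCII stands for itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


if __name__ == "__main__":
    print(encode_mailbox_name("Temp/New\u6d4b\u8bd5"))
```

Keeping names in the client's native charset and calling a helper like this
only at the protocol boundary matches the approach suggested in the quoted
reply.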
