MBOX-Line: From mrc at CAC.Washington.EDU Tue Apr 11 22:56:33 2006
To: imap-protocol@u.washington.edu
From: Mark Crispin <mrc@CAC.Washington.EDU>
Date: Fri Jun 8 12:34:37 2018
Subject: [Imap-protocol] LIST Clarification
In-Reply-To: <web-35045943@mail.stalker.com>
References: <443A7A2D.2070708@consilient.com> <web-35034698@mail.stalker.com>
 <Pine.OSX.4.64.0604101053530.2906@pangtzu.panda.com>
 <web-35035906@mail.stalker.com>
 <Pine.WNT.4.65.0604101441160.4904@Tomobiki-Cho.CAC.Washington.EDU>
 <443C0286.60200@att.com>
 <Pine.WNT.4.65.0604111228310.3332@Tomobiki-Cho.CAC.Washington.EDU>
 <syMWpRdWCviH+GmTLGxvWQ.md5@libertango.oryx.com>
 <Pine.WNT.4.65.0604111418130.3332@Tomobiki-Cho.CAC.Washington.EDU>
 <web-35044345@mail.stalker.com>
 <Pine.WNT.4.65.0604111814490.3332@Tomobiki-Cho.CAC.Washington.EDU>
 <web-35045943@mail.stalker.com>
Message-ID: <Pine.OSX.4.64.0604112206540.2906@pangtzu.panda.com>

On Tue, 11 Apr 2006, Vladimir A. Butenko wrote:
> There is a mailbox called "Mark Crispin". The system is case-insensitive.
> a LIST "Mark%"
> * LIST "Mark Crispin"
> a OK
> b LIST "" "%CRISPIN"
> * LIST "MARK CRISPIN"
> The latter is a questionable practice, but it definitely has more merits than
> returning unmodified "Mark Crispin" as our server (and, I guess, yours) is
> doing now. The "Mark Crispin" string would confuse a client that does not
> expect that string to match the "%CRISPIN" pattern.

Actually, I believe that a case-independent server should have a canonical
case form -- all-upper, all-lower, as-created, or whatever -- and that
form should always be returned by LIST regardless of the pattern.

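A minimal sketch of the "canonical case form" idea, assuming an as-created canonical form and Python's `str.casefold()` as the folding rule. The `MailboxStore` class and its methods are hypothetical, not part of any real IMAP server. Matching is done on folded names, but LIST always hands back the stored canonical spelling.

```python
import re


class MailboxStore:
    """Toy case-insensitive mailbox namespace (hypothetical, for illustration)."""

    def __init__(self, delimiter="/"):
        self.delimiter = delimiter
        self._names = {}  # folded key -> canonical (as-created) name

    @staticmethod
    def _fold(name):
        # Assumed folding rule; a real server would pick and document its own.
        return name.casefold()

    def create(self, name):
        key = self._fold(name)
        if key in self._names:
            raise ValueError("mailbox already exists: " + self._names[key])
        self._names[key] = name

    def _pattern_to_regex(self, pattern):
        # "*" matches anything; "%" matches anything except the delimiter.
        parts = []
        for ch in pattern:
            if ch == "*":
                parts.append(".*")
            elif ch == "%":
                parts.append("[^" + re.escape(self.delimiter) + "]*")
            else:
                parts.append(re.escape(ch))
        return re.compile("".join(parts), re.DOTALL)

    def list(self, pattern):
        # Match on folded names, but return the canonical form every time.
        rx = self._pattern_to_regex(self._fold(pattern))
        return sorted(c for k, c in self._names.items() if rx.fullmatch(k))
```

With a mailbox created as "Mark Crispin", both `list("Mark%")` and `list("%CRISPIN")` return the same as-created spelling, so the result never depends on how the pattern was cased.
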
> If there is no "fixed" name for INBOX, it's just a "case-insensitive" name,
> why would "Mark Crispin" be a "real" name for that mailbox of a
> case-insensitive system? So, if the server is free to return "Inbox" on LIST
> "Inbox", it's free to return "MARK CRISPIN" on "%CRISPIN".

I agree that the canonical case form of INBOX is "INBOX". That was always
the intention from Day 1.

> Delete the .mark file, create a .Mark file, try to read ".mark" - if you do
> that once, during start-up of your server, that would be enough.

Yes, but that's only if there is a single filesystem on the server.

> It's more
> difficult when you have to serve several millions of users, where the storage
> can be distributed to many NFS and CFS file systems - but believe me - I have
> never seen any real-life installation that is at least 100,000 users strong
> (let alone 5,000,000 users strong) that used different file systems for
> different users.

I don't know if that's safe any more. We can assume that UNIX-based IMAP
servers are probably not going to export Windows filesystems. However,
Mac OS X has really muddied the waters. My own Mac has a case-sensitive
and a case-insensitive filesystem.

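A rough sketch of the start-up probe described in the quoted text, written so that it can be run per directory rather than once per server, since different mailbox roots may sit on different filesystems. The function name `directory_is_case_sensitive` is hypothetical, and the probe assumes the server may create and delete a scratch file in the directory.

```python
import os
import tempfile


def directory_is_case_sensitive(directory):
    """Return True if `directory` appears to be on a case-sensitive filesystem."""
    # Create a uniquely named scratch file whose name contains lowercase
    # letters, then look for it under the upper-cased spelling of that name.
    fd, path = tempfile.mkstemp(prefix="caseprobe_", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        return not os.path.exists(os.path.join(head, tail.upper()))
    finally:
        os.remove(path)
```

A server could call this once for each distinct mailbox root and cache the answers, instead of assuming one result holds for every user.
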
I'm also not comfortable about assuming that what is commonplace and good
practice today would remain the same 5-10 years from now. I've had too
many painful lessons...

> But - there is no lowercase/uppercase problem with the Japanese charsets, so
> you may want to play with mailboxes in Roman, but non-Latin alphabets, -
> French, Spanish, German etc. Umlauts are tough boys to fight with in the
> upper-lowercase battles...

We can use Unicode titlecase. I think that an update to RFC 3501 should
probably specify that case-insensitivity in searching means "same
titlecase" as opposed to "same uppercase" or "same lowercase".

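One possible reading of "same titlecase", sketched under the assumption that each character is mapped independently to its Unicode titlecase form and that no normalization is applied. The helper names `titlecase_key` and `same_titlecase` are hypothetical. In Python, calling `str.title()` on a one-character string applies that character's titlecase mapping.

```python
def titlecase_key(name):
    """Map every character of `name` to its Unicode titlecase form."""
    # str.title() on a single character applies that character's titlecase
    # mapping; characters without case come back unchanged.
    return "".join(ch.title() for ch in name)


def same_titlecase(a, b):
    """Compare two names under the per-character titlecase mapping."""
    return titlecase_key(a) == titlecase_key(b)
```

Under this rule "Mark Crispin" and "MARK CRISPIN" compare equal, and so do the digraph characters U+01C4 and U+01C6, which share the titlecase form U+01C5.
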
> I'm afraid there is a misunderstanding here. How should I specify the Ящ%к
> pattern in the LIST command? "&hjhj-%&jkjk-" where "&hjhj-" is "Ящ", and
> "&jk-" is "к"?

Ah, good point. The only wildcards that work well with M-UTF7 names are
within the ASCII part of the name.

Fortunately, most clients only use wildcards for an entire hierarchy
level. But this is a good argument to move to UTF-8 names.

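The problem can be made concrete with a small sketch. It assumes a server that naively matches the encoded pattern against the encoded name, and it uses the Cyrillic name "Ящик" purely as an illustrative example. `mutf7_encode` follows the modified UTF-7 rules of RFC 3501 (UTF-16BE, base64 with "," in place of "/", no padding); both function names are hypothetical.

```python
import base64
import re


def mutf7_encode(name):
    """Encode a mailbox name in RFC 3501 modified UTF-7."""
    out, pending = [], []

    def flush():
        # Emit any buffered non-ASCII run as one "&...-" shifted sequence.
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def wildcard_match(pattern, name, delimiter="/"):
    """Match a LIST-style pattern: "*" is anything, "%" stops at the delimiter."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), name, re.DOTALL) is not None
```

The whole name encodes as a single base64 run, "&BC8ESQQ4BDo-", while the pattern encodes as "&BC8ESQ-%&BDo-". The two shifted fragments of the pattern are not a prefix and suffix of the encoded name, so `wildcard_match` succeeds on the decoded strings and fails on the encoded ones.
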
> Or should the
> client NOT assume anything and treat ANY response from the server as
> correctly matching the pattern the client has provided - matching according
> to the rules known to the server only, not the client?

This is effectively what IMAP says.

> If someone suggests to use the latter answer, then it's a call for trouble:
>
> then if a client makes a call
> A LIST %
> and gets
> * LIST ZZZ
>
> and then it makes a call
> A LIST ZZZ/%
> (as many clients do to deal with mailbox hierarchies)
> then the client HAS ABSOLUTELY NO RIGHT to expect that all returned names
> will start with "ZZZ/", and thus - can be displayed as "ZZZ" "subtree".

I don't understand why. There's no M-UTF7 here. The only thing that may
differ is that a case-insensitive server may return names that start with
zzz/ or Zzz/ etc.

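A client that wants to file such results under "ZZZ" could compare the prefix case-insensitively. The sketch below assumes the client has already decided, by some means outside the protocol, that the server is case-insensitive; IMAP itself does not advertise this. The function `children_of` and its parameters are hypothetical.

```python
def children_of(parent, listed_names, delimiter="/", case_insensitive=True):
    """Return the names from a LIST response that sit under `parent`."""
    # Fold both sides when the server is assumed to be case-insensitive,
    # so that zzz/a and Zzz/b are both treated as children of ZZZ.
    fold = str.casefold if case_insensitive else (lambda s: s)
    prefix = fold(parent + delimiter)
    return [name for name in listed_names if fold(name).startswith(prefix)]
```

With `case_insensitive=False` the same call keeps only names that start with the exact spelling "ZZZ/", which is the assumption the quoted text says a client cannot rely on.
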
> Sounds strange, right? But we are saying that when a client sends "&hj-%&HJ-"
> in the LIST command, it should not expect to see any name that ends with "HJ-",
> right?

I agree that M-UTF7 patterns with embedded wildcards don't work.

>> I don't follow you as to why all IMAP servers "MUST be case-sensitive",
>> since clearly there are examples of servers which are case-insensitive,
> Because case-insensitive servers create a mess. The protocol becomes
> ill-defined (read: broken, incomplete, unusable - select your own favourite,
> the most offending word :-).

That's only if the client expects to know what the server will do for any
particular LIST command if it has earlier done a LIST of *.

IMAP makes no such promises. It's implementation dependent, as are the
semantics of the naming hierarchy.

This was a mistake. We all acknowledge it to have been a mistake.
However, the discussion about naming that took place in the early 1990s
wasted at least 18 months of everybody's time (and probably reduced all of
our lifespans by a few years due to high blood pressure). What came up
was a wretched compromise, but at least it let us do our work.

I disagree that case-insensitive servers are ill-defined; any
case-insensitive server is very well defined. The problem is that there
are multiple definitions, and no authority to declare one definition to be
correct.

> If the servers would impose strict case-sensitivity (including that for
> INBOX), the IMAP protocol (at least within its mailbox-name related part)
> becomes a well-defined, "real", "professional" (select the most pleasing word
> here) protocol.

Uhh...I don't know about that. Consider all the repetitions of the Latin
alphabet in Unicode.

You'd have a better case if you said "if all mailbox names were solely a
sequence of unique binary octets, with no human interpretation, then the
IMAP protocol becomes..." ;-)

>> We should progress to UTF-8 mailbox names.
> But before you do, please investigate all these case-sensitivity problems
> with non-LATIN (and non-Japanese ;-) alphabets.

Unicode titlecase. It's not always right for all languages, but at least
it is something that is well-defined.

> And, while doing this, please
> remember that the number of people who were shocked to learn that they could
> not create a folder named "12/12/2005" is much higher than the number of
> people knowing what a "hierarchy separator" is. But that number is still
> smaller than the number of people who did succeed in creating such a mailbox
> and then asked their ISP/IT support about "that funny mailbox '12' and some
> strange symbols around it". I.e. escape symbols would be good, or, if you
> choose to use UTF-8/Unicode, you may want to use some "unprintable" character
> as the path separator.

Had URLs existed at the time, that is what IMAP should have used. That at
least is a hierarchy that people understand.

>> As I say on my web page, any field of study which has "science" in its name
>> is not a science... ;-)
> Sure. I totally agree, and that's why I was hated by all Computer Science
> departments :-) Their only competitors were teachers of the Scientific
> Communism - but they were weaker, as they did not believe in their books
> themselves. CS departments were tougher... :-)

Indeed.

>>>> Si vis pacem, para bellum.
>>> and when you wish for war, prepare for a long boring peace? :-)
>> I don't know, as only fascists wish for war;
> Mark, I just wanted to make a quite innocent joke. Looks like I've failed,
> and let's not turn it into a Political Science exercise. Let's continue to
> play within the Computer Science barrack :-)

Oh, I understood the joke. However, it's worth remembering that 61 years
ago, our parents sacrificed a great deal so that we could grow up to play
with computers...

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.