MBOX-Line: From mrc at CAC.Washington.EDU  Fri Apr 14 11:30:20 2006
To: imap-protocol@u.washington.edu
From: Mark Crispin <mrc@CAC.Washington.EDU>
Date: Fri Jun  8 12:34:37 2018
Subject: [Imap-protocol] SEARCH
In-Reply-To: <1145029156.10727.81.camel@hurina>
References: <1145029156.10727.81.camel@hurina>
Message-ID: <Pine.OSX.4.64.0604141039280.15258@pangtzu.panda.com>

On Fri, 14 Apr 2006, Timo Sirainen wrote:
> If I read this correctly, searching logic works like this:
>
> If search charset is US-ASCII, server either
> a) simply does a substring match for the entire message, or
> b) decodes MIME parts based on Content-Transfer-Encoding header and
> decodes the MIME headers themselves, and then does substring matching
>
> If search charset is not US-ASCII, only b) is allowed.

My personal opinion (not "Mr. IMAP Protocol"):

(a) is the IMAP2 interpretation and (b) is the IMAP4 interpretation; a
full IMAP4 server always does (b).

However, since US-ASCII is the only mandatory-to-implement search charset,
an IMAP4 server which only implements US-ASCII and only does (a) remains
compliant.  This is a compatibility-with-the-past wart, and new server
implementations should not do this (except as a development step).

At a minimum, UTF-8 SHOULD be supported as a search charset.

(a) has known problems.  It has false positives and false negatives
because it does not decode the content.  This occurs even with ASCII.
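
A rough illustration of the difference, as a sketch in Python rather than
anything taken from a real server.  The sample message and the function
names search_raw and search_decoded are made up for this example; the
body is "Hello world" in base64.

```python
import email
from email import policy

# Hypothetical sample message: the body is base64 for "Hello world".
RAW = (
    b"From: a@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8gd29ybGQ=\r\n"
)


def search_raw(raw: bytes, key: str) -> bool:
    """Interpretation (a): substring match on the undecoded octets."""
    return key.lower().encode("ascii") in raw.lower()


def search_decoded(raw: bytes, key: str) -> bool:
    """Interpretation (b), bodies only: undo the transfer encoding first."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            if key.lower() in part.get_content().lower():
                return True
    return False


# False negative for (a): the text is there, but only in encoded form.
print(search_raw(RAW, "hello"), search_decoded(RAW, "hello"))
# False positive for (a): the key matches the base64 characters themselves.
print(search_raw(RAW, "sgvsbg8"), search_decoded(RAW, "sgvsbg8"))
```

Both keys here are pure ASCII, which is the point: the problem is the
missing decoding step, not the charset.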

> If the search key is invalid for the given character set, should server
> return BAD error to client? Are non-ASCII characters in search key
> invalid for US-ASCII charset?

I'm not certain what you mean by "invalid".

Do you mean "contain a codepoint that is not in that charset"?  If so, I
think a failed match is better than a BAD, since it may be that the server
has an obsolete version of that charset's definition.
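
One way to read "failed match rather than BAD", sketched in Python under
some loud assumptions: messages are already decoded to text, and the
names decode_search_key and search are invented for this example.  An
unknown charset name is a separate case that this sketch does not
handle.

```python
def decode_search_key(key: bytes, charset: str):
    """Return the key as text, or None if its octets are not valid in charset."""
    try:
        return key.decode(charset)
    except UnicodeDecodeError:
        return None


def search(messages, key: bytes, charset: str = "us-ascii"):
    """Return 1-based sequence numbers of messages containing the key."""
    text_key = decode_search_key(key, charset)
    if text_key is None:
        # Treat an undecodable key as "matches nothing" instead of an error.
        return []
    needle = text_key.lower()
    return [n for n, body in enumerate(messages, 1) if needle in body.lower()]


mailbox = ["Hello world", "un caf\u00e9 noir"]
print(search(mailbox, b"caf\xc3\xa9", "utf-8"))     # key is valid UTF-8
print(search(mailbox, b"caf\xc3\xa9", "us-ascii"))  # key is not valid US-ASCII
```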

> What about if search key contains non-ASCII characters but no charset
> parameter is given? Currently I assume this means just doing a substring
> search from messages without doing any charset conversions (i;octet
> comparator).

It can mean whatever you want, although perhaps a failed match is best.
Or maybe a BAD in this case, because the specification does denounce use
of 8-bit strings without a charset identification in section 4.3.1.

It's not defined, and in such cases servers can do as they want.  As with
other undefined situations, it may be defined in the future for servers
that advertise an extension.

Clients which do undefined things are broken.

> I don't see it clearly mentioned how searching MIME parts should work,
> but since it only talks about substring matching, I assume that it
> shouldn't really care about MIME parts that much.

I don't understand this comment.  You should apply content transfer
decoding.  Most people canonicalize search keys and MIME parts into UTF-8
prior to matching.  Also look into case coercion and decomposition,
although the IMAP i18n/stringprep specification will specify this (and
more) in detail, so maybe you want to hold off.
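
A minimal sketch of that canonicalization in Python, assuming the part
has already had its content transfer encoding removed.  The choice of
NFKD plus case folding is only one plausible reading of "case coercion
and decomposition", not what any specification requires, and the names
canonicalize and matches are invented here.  Python strings are Unicode
code points, which stands in for "canonicalize into UTF-8".

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Decompose, case-fold, then decompose again in case folding changed things."""
    folded = unicodedata.normalize("NFKD", text).casefold()
    return unicodedata.normalize("NFKD", folded)


def matches(key: bytes, key_charset: str, part: bytes, part_charset: str) -> bool:
    """Bring key and part text to one canonical form, then substring-match."""
    k = canonicalize(key.decode(key_charset))
    p = canonicalize(part.decode(part_charset))
    return k in p


# Key in ISO-8859-1 with a precomposed capital E-acute; part in UTF-8 with a
# lowercase "e" followed by a combining acute accent.
key = b"CAF\xc9"
part = "un cafe\u0301 noir".encode("utf-8")
print(matches(key, "iso-8859-1", part, "utf-8"))
```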

> Especially BODY searching talks about searching from message bodies. Are
> MIME part headers part of a message body? I guess not, because UW-IMAP
> skips them.

TEXT searches search the entire message including RFC 2822 and MIME
metadata.

BODY searches omit RFC 2822 and MIME metadata.
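
The distinction can be sketched in Python as follows.  This is an
illustration only: the sample message, the X-Part-Note header, and the
function names are all made up, and a real server would also decode
encoded-words and handle non-text parts.

```python
import email
from email import policy

# Hypothetical one-part multipart message.
RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: lunch plans\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="frontier"\r\n'
    b"\r\n"
    b"--frontier\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"X-Part-Note: appetizer\r\n"
    b"\r\n"
    b"See you at noon.\r\n"
    b"--frontier--\r\n"
)


def body_texts(msg):
    """Yield the decoded content of each text leaf part (no headers)."""
    for part in msg.walk():
        if not part.is_multipart() and part.get_content_maintype() == "text":
            yield part.get_content()


def header_texts(msg):
    """Yield message headers and MIME part headers as 'Name: value' strings."""
    for part in msg.walk():
        for name, value in part.items():
            yield f"{name}: {value}"


def search_body(raw: bytes, key: str) -> bool:
    msg = email.message_from_bytes(raw, policy=policy.default)
    return any(key.lower() in t.lower() for t in body_texts(msg))


def search_text(raw: bytes, key: str) -> bool:
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = list(header_texts(msg)) + list(body_texts(msg))
    return any(key.lower() in t.lower() for t in texts)


for key in ("noon", "lunch", "appetizer"):
    print(key, search_text(RAW, key), search_body(RAW, key))
```

"noon" is found by both searches; "lunch" (message header) and
"appetizer" (MIME part header) are found only by the TEXT-style search.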

> More interesting are MIME footer and trailer sections. Should they be
> searched? UW-IMAP skips them.

I consider these not to be part of a message at all for any MIME-savvy
application.

> What about MIME boundary lines? UW-IMAP
> searches these, but not if you include its "--" prefix in search key.

Are you certain that you aren't confusing BODY and TEXT searches?  A TEXT
search would find them, because they appear in the MIME header.

> Is "Header: value" searching required to work? I think it is, and works
> with UW-IMAP.

What do you mean by this?  If you're talking about a TEXT search, then it
may or may not work depending upon the octets in a message.  You should be
using a "HEADER Header: value" search instead.
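
For comparison, a small Python sketch of a HEADER-style match, where the
field name and the string are separate arguments and only that field's
value is examined.  The function name search_header and the sample
message are assumptions for this example.

```python
import email
from email import policy

# Hypothetical message whose body happens to contain header-like text.
RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: lunch plans\r\n"
    b"\r\n"
    b"The old Subject: dinner line was wrong.\r\n"
)


def search_header(raw: bytes, field_name: str, key: str) -> bool:
    """Substring-match key against the values of one named header field."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    values = msg.get_all(field_name, [])
    # An empty key matches any message that has the field at all.
    return any(key.lower() in str(v).lower() for v in values)


print(search_header(RAW, "Subject", "lunch"))   # found in the real header
print(search_header(RAW, "Subject", "dinner"))  # only appears in the body
```

A TEXT search for "Subject: dinner" would match this message, which is
why the result of a TEXT search depends on the octets that happen to be
in the message.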

> Is "line\r\nline2" (as literal of course with real CR+LF)
> searching required to work in message body? Again, I think so and works
> with UW-IMAP.

Yes, it should in a TEXT search.  But see below.

> But then is "Header: value\r\nHeader2: value2" searching
> required to work? I don't see why not, but this doesn't work anymore
> with UW-IMAP.

Once again, I'd like to understand what you mean by this.

If you're talking about a TEXT search, I don't see why it shouldn't work,
although it might be that you have a mailbox format that uses UNIX-style
newlines and the data was not CRLF-converted.

 	HEADER Header: {....}
 	value\r\nHeader2: value2
will certainly not work.

I don't think that it is useful for a client to have newlines in a search
key.  Some servers try to do fuzzy matching, so for example if you search
for "Joe's trip to Paris" there will be a match even if it was broken by a
newline.
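
One simple form of such fuzzy matching is to collapse whitespace runs on
both sides before comparing.  This Python sketch is an assumption about
how a server might do it, not a description of any particular server;
squash and fuzzy_contains are invented names.

```python
import re


def squash(text: str) -> str:
    """Collapse each run of whitespace (including CR and LF) to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fuzzy_contains(haystack: str, key: str) -> bool:
    """Substring match that ignores how the text was wrapped."""
    return squash(key) in squash(haystack)


body = "Notes on Joe's trip to\r\nParis last May."
key = "Joe's trip to Paris"
print(key in body)                # exact match is defeated by the line break
print(fuzzy_contains(body, key))  # whitespace-insensitive match succeeds
```
<test>
body = "Notes on Joe's trip to\r\nParis last May."
key = "Joe's trip to Paris"
assert (key in body) is False
assert fuzzy_contains(body, key) is True
assert fuzzy_contains(body, "trip to Rome") is False
assert squash("  a \r\n b\t c ") == "a b c"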

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.