MBOX-Line: From imap at maclean.com Tue Apr 7 17:58:01 2015
To: imap-protocol@u.washington.edu
From: Pete Maclean <imap@maclean.com>
Date: Fri Jun 8 12:34:54 2018
Subject: [Imap-protocol] SEARCH semantics
In-Reply-To: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
References: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
Message-ID: <mailman.21.1528486494.22076.imap-protocol@mailman13.u.washington.edu>

David, I went through what you are going through a couple of years
ago. My original SEARCH implementation was also a shambles and it
was only the coming of a major new customer that prompted me to
rework it. I never had a single complaint about the original code,
though, which I put down to the lack of support for SEARCH in
clients. Today I think it is much more important to have good SEARCH,
especially with IMAP servers more and more fronting email archives in
addition to conventional email servers.

I am well aware that the specification is fuzzy and I know that some
implementations take great liberties. Some servers, for example,
treat text searches as word-based, while RFC 3501 requires them to be
string-based, that is, case-insensitive substring matches. An excuse for
making them word-based is that the data
just happens to be word-indexed. While I cannot support this, I
suppose it is not too terrible because users these days are so
accustomed to word-based searches (because that is what Web search
engines do) that they might be surprised at the results of a
string-based search.
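
To make the distinction concrete, here is a minimal Python sketch
contrasting the two behaviours. The function names are made up for
illustration, and the regex tokenizer is only a stand-in for whatever a
real word index would do. RFC 3501 specifies case-insensitive substring
matching for string search keys.

```python
import re


def substring_match(text: str, needle: str) -> bool:
    """String-based match: case-insensitive substring test."""
    return needle.casefold() in text.casefold()


def word_match(text: str, needle: str) -> bool:
    """Word-based match: the needle must equal a whole token.

    Assumption: tokens are runs of word characters, which roughly
    mimics what a word-indexed store would be able to answer.
    """
    words = re.findall(r"\w+", text.casefold())
    return needle.casefold() in words


body = "Please reschedule the meeting."
print(substring_match(body, "schedule"))  # True
print(word_match(body, "schedule"))       # False
```

A client searching for "schedule" would get this message back from a
string-based server but not from a word-based one.
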
Let me now tell you how I implement things:

1. BODY. I thread through all the MIME parts in the message and
select only those whose top-level Content-Type is "text" or "message". I
convert each such part to Unicode and then apply the search
criteria. I make no attempt to search parts that would typically be
considered attachments. If, in an HTML part, a phrase being searched
for is broken up by tags, it will not be found; likewise if it
contains entities. I could do better in this regard, and your
bringing up the subject may prompt me to review a number of my own choices.

2. Headers. I unfold headers and normalize everything to Unicode
before searching.

3. xx SEARCH OR OR <exp1> <exp2> <exp3>. I have no idea how safe it
is to use such an expression, but my server handles it beautifully.

4. xx SEARCH OR (<exp1> <exp2> <exp3>) <exp4>. I share your
understanding of this expression.
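
The reading of points 3 and 4 can be checked with a small Python
sketch of a prefix-notation parser and evaluator. It is an illustration
only, not anyone's server code: the function names are invented, each
search key is reduced to a single placeholder token such as exp1 (real
keys like SUBJECT take arguments), and malformed input is not handled.
It relies on the RFC 3501 grammar, in which OR takes exactly two search
keys and a parenthesized list matches only if all of its keys match.

```python
def tokenize(criteria: str) -> list:
    """Split a criteria string into tokens, treating parens as tokens."""
    return criteria.replace("(", " ( ").replace(")", " ) ").split()


def parse_key(tokens, pos):
    """Parse one search key at tokens[pos]; return (node, next_pos)."""
    tok = tokens[pos]
    if tok.upper() == "OR":
        # OR consumes exactly the next two search keys.
        left, pos = parse_key(tokens, pos + 1)
        right, pos = parse_key(tokens, pos)
        return ("or", left, right), pos
    if tok.upper() == "NOT":
        child, pos = parse_key(tokens, pos + 1)
        return ("not", child), pos
    if tok == "(":
        # A parenthesized list is the AND of its members.
        items = []
        pos += 1
        while tokens[pos] != ")":
            node, pos = parse_key(tokens, pos)
            items.append(node)
        return ("and", items), pos + 1
    return ("leaf", tok), pos + 1  # placeholder key, e.g. "exp1"


def parse(criteria: str):
    """Parse a whole criteria string; top-level keys are ANDed."""
    tokens = tokenize(criteria)
    items, pos = [], 0
    while pos < len(tokens):
        node, pos = parse_key(tokens, pos)
        items.append(node)
    return ("and", items)


def evaluate(node, truth) -> bool:
    """Evaluate a parsed tree against a dict of leaf name -> bool."""
    kind = node[0]
    if kind == "or":
        return evaluate(node[1], truth) or evaluate(node[2], truth)
    if kind == "not":
        return not evaluate(node[1], truth)
    if kind == "and":
        return all(evaluate(child, truth) for child in node[1])
    return truth[node[1]]


three_way = parse("OR OR exp1 exp2 exp3")
print(evaluate(three_way, {"exp1": False, "exp2": False, "exp3": True}))  # True

grouped = parse("OR (exp1 exp2 exp3) exp4")
print(evaluate(grouped, {"exp1": True, "exp2": True,
                         "exp3": False, "exp4": False}))  # False
```

The nested form in point 3 behaves as a three-way OR, while the
parenthesized group in point 4 matches only when all three members
match or exp4 matches.
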
I also added support for ESEARCH when I did my revamp but have little
idea of how much it gets used.
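
For points 1 and 2 above, the normalization steps might look roughly
like the following Python sketch built on the standard library email
package. This is an assumption about one way to do it, not a
description of the actual server code; the function names are invented,
and the attachment test is a simplification.

```python
import re
from email import message_from_bytes, policy
from email.header import decode_header, make_header


def normalize_header(raw_value: str) -> str:
    """Unfold a header value and decode RFC 2047 encoded-words."""
    # Unfolding: drop each line break that is followed by whitespace.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", raw_value)
    return str(make_header(decode_header(unfolded)))


def searchable_body_text(raw_message: bytes) -> str:
    """Collect the decoded text of all non-attachment text/* parts.

    walk() also descends into message/rfc822 parts, so text inside an
    embedded message is included as well.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text" and not part.is_attachment():
            # get_content() undoes the transfer encoding and charset.
            chunks.append(part.get_content())
    return "\n".join(chunks)


subject = normalize_header("=?ISO-8859-1?Q?Caf=E9?= menu")
print("café" in subject.casefold())  # True
```

Both functions return Unicode strings, so a case-insensitive substring
test can then be applied uniformly to headers and body text.
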
Pete

At 07:38 PM 4/7/2015, David Harris wrote:
>I'm in the process of completely rewriting the SEARCH logic in my IMAP
>server - the old code was done in a hurry and was, quite frankly,
>ridiculously bad, but that's another story.
>
>As I get into testing cases, I've come across a number of areas where RFC3501
>and the various sub-documents that I know about are... uh, "vague". I'd like
>to get a take on how other implementors view them.
>
>1: BODY: When a SEARCH BODY expression is issued, how should "BODY" be
>interpreted? Is there an assumption that the server should choose the best
>candidate for a displayable message body, parse and normalize it, then search
>that? Or should it simply be taken as a raw scan of the message? How much
>unarmouring and character set normalization is assumed?
>
>2: Headers: when any of the header search expressions is issued, is the
>assumption that the raw header should be searched, or should RFC2047
>encoded-words be reduced and normalized before attempting the comparison?
>
>3: The following search is valid, according to the syntax in RFC3501:
>
> xx SEARCH OR OR <exp1> <exp2> <exp3>
>
>and allows an OR expression to cover three terms instead of just two. As
>such, it seems quite useful, but it would certainly have mystified my old
>search code (it was rubbish, as I've pointed out), and I was wondering how
>generally safe it would be to use this type of expression?
>
>4: I'm pretty sure I'm right on this one, but the following expression:
>
> xx SEARCH OR (<exp1> <exp2> <exp3>) <exp4>
>
>will only result in a match if either <exp4> is a match, or ALL of <exp1>,
><exp2> and <exp3> are a match. Could someone wiser than me confirm this? I'm
>assuming there is no way to perform a search with a long list of OR
>conditions without doing a lot of calisthenics on the search string
>(multiple OR conditions strung together).
>
>I apologize if any of these are dealt with in RFCs outside RFC3501 - I
>struggle to keep track of all the various sub-documents relating to the
>protocol these days.
>
>Thanks in advance for any advice.
>
>Cheers!
>
>-- David --
>
>------------------ David Harris -+- Pegasus Mail ----------------------
>Box 5451, Dunedin, New Zealand | e-mail: David.Harris@pmail.gen.nz
> Phone: +64 3 453-6880 | Fax: +64 3 453-6612
>
>Schoolboy howler for the day:
> "A census taker is the man who goes from home to home
> increasing the population."
>
>
>_______________________________________________
>Imap-protocol mailing list
>Imap-protocol@u.washington.edu
>http://mailman13.u.washington.edu/mailman/listinfo/imap-protocol