MBOX-Line: From brong at fastmail.fm Tue Apr 7 17:39:29 2015
To: imap-protocol@u.washington.edu
From: Bron Gondwana <brong@fastmail.fm>
Date: Fri Jun 8 12:34:54 2018
Subject: [Imap-protocol] SEARCH semantics
In-Reply-To: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
References: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
Message-ID: <1428453569.748713.250513633.3671D3CF@webmail.messagingengine.com>

On Wed, Apr 8, 2015, at 09:38 AM, David Harris wrote:
> I'm in the process of completely rewriting the SEARCH logic in my IMAP server -
> the old code was done in a hurry and was, quite frankly, ridiculously bad, but that's
> another story.
>
> As I get into testing cases, I've come across a number of areas where RFC3501
> and the various sub-documents that I know about are... uh, "vague". I'd like to get a
> take on how other implementors view them.
>
> 1: BODY: When a SEARCH BODY expression is issued, how should "BODY" be
> interpreted? Is there an assumption that the server should choose the best
> candidate for a displayable message body, parse and normalize it, then search
> that? Or should it simply be taken as a raw scan of the message? How much
> unarmouring and character set normalization is assumed?

Cyrus streams each body part through decoding (qp/base64) and charset handling
(generates a stream of int32 Unicode codepoints) - which then feeds into the search
engine to look for matches. If any part matches, then the message matches.

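For illustration only, the decode-then-search pipeline described above could be
sketched in Python roughly as follows. This is not Cyrus's code: the function name
body_matches, the use of Python's stdlib email package, the restriction to text/*
parts, and the case-insensitive substring match are all assumptions made for the
example.

```python
import base64
import email
from email import policy


def body_matches(raw_message: bytes, needle: str) -> bool:
    """Return True if any decoded text part contains needle (hypothetical helper)."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    wanted = needle.casefold()
    for part in msg.walk():
        # Assumption: only leaf text/* parts are searched in this sketch.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        try:
            # get_content() undoes quoted-printable/base64 and decodes the charset.
            text = part.get_content()
        except LookupError:
            # Unknown charset label: skip the part rather than guess.
            continue
        if wanted in text.casefold():
            return True  # one matching part is enough for the message to match
    return False


# A made-up single-part message whose body is base64-encoded UTF-8.
payload = base64.b64encode("Grüße aus Berlin".encode("utf-8"))
sample = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + payload + b"\r\n"
)
```

With this sample, body_matches(sample, "berlin") matches because the comparison
runs on the decoded text, whereas searching for the raw base64 string does not.
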
> 2: Headers: when any of the header search expressions is issued, is the
> assumption that the raw header should be searched, or should RFC2047
> encoded-words be reduced and normalized before attempting the comparison?

Likewise - there's a header parser which generates the Unicode codepoints for search.

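As a rough sketch of the same idea for headers (again not Cyrus's code), the
following Python uses the stdlib email.header functions to reduce RFC2047
encoded-words to Unicode before comparing. The helper name header_matches and the
case-insensitive substring comparison are assumptions made for the example.

```python
from email.header import decode_header, make_header


def header_matches(raw_value: str, needle: str) -> bool:
    """Return True if the decoded header value contains needle (hypothetical helper)."""
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them into a Header that str() renders as Unicode.
    decoded = str(make_header(decode_header(raw_value)))
    return needle.casefold() in decoded.casefold()


# A made-up Subject value containing one UTF-8 quoted-printable encoded-word.
raw_subject = "=?utf-8?q?Gr=C3=BC=C3=9Fe?= from Bron"
```

Here header_matches(raw_subject, "Grüße") matches, while a search for the literal
"=?utf-8?" marker does not, because the comparison never sees the raw encoded form.
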
> 3: The following search is valid, according to the syntax in RFC3501:
>
> xx SEARCH OR OR <exp1> <exp2> <exp3>
>
> and allows an OR expression to cover three terms instead of just two. As such, it
> seems quite useful, but it would certainly have mystified my old search code (it was
> rubbish, as I've pointed out), and I was wondering how generally safe it would be to
> use this type of expression?

Very. That's totally standard, and anything which doesn't support it is totally bogus.

> 4: I'm pretty sure I'm right on this one, but the following expression:
>
> xx SEARCH OR (<exp1> <exp2> <exp3>) exp4
>
> will only result in a match if either <exp4> is a match, or ALL of <exp1>, <exp2>
> and <exp3> are a match. Could someone wiser than me confirm this? I'm
> assuming there is no way to perform a search with a long list of OR conditions
> without doing a lot of calisthenics on the search string (multiple OR conditions
> strung together).

It's hardly calisthenics, it's just prefix notation.

You can just as well do

OR A OR B OR C D

depending on whether you want the tree to bias right or bias left. Even this is valid:

OR OR A OR B C D

as is obvious when you write it out as a tree:

OR
- OR
= - A
= - OR
= = - B
= = - C
- D

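To make the prefix-notation point concrete, here is a toy recursive parser in
Python. It is only a sketch under simplifying assumptions: every search key is a
single token (real SEARCH keys take arguments), a parenthesized list is treated as
an AND of its members, and the names parse_search and evaluate are made up for the
example.

```python
def _parse(tokens):
    """Consume one search key from the front of tokens and return it as a tree."""
    if not tokens:
        raise ValueError("unexpected end of search expression")
    token = tokens.pop(0)
    if token == "OR":
        # OR always takes exactly two search keys, each of which may itself be an OR.
        left = _parse(tokens)
        right = _parse(tokens)
        return ("OR", left, right)
    if token == "(":
        # A parenthesized list: all of its keys must match.
        items = []
        while tokens and tokens[0] != ")":
            items.append(_parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # drop the ")"
        return ("AND", *items)
    return token  # a leaf key such as A, B, C


def parse_search(text):
    """Parse a toy search expression into nested tuples (hypothetical helper)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    tree = _parse(tokens)
    if tokens:
        raise ValueError("trailing tokens: " + " ".join(tokens))
    return tree


def evaluate(tree, truth):
    """Evaluate a parsed tree against a dict mapping leaf names to booleans."""
    if isinstance(tree, str):
        return truth[tree]
    op, *children = tree
    results = [evaluate(child, truth) for child in children]
    return any(results) if op == "OR" else all(results)
```

Under these assumptions, parse_search("OR OR A OR B C D") yields the nested
structure drawn in the tree above, and "OR (E1 E2 E3) E4" evaluates as true only
when E4 is true or all of E1, E2 and E3 are true.
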
> I apologize if any of these are dealt with in RFCs outside RFC3501 - I struggle to
> keep track of all the various sub-documents relating to the protocol these days.
>
> Thanks in advance for any advice.

Cheers,

Bron.

--
Bron Gondwana
brong@fastmail.fm