
MBOX-Line: From brong at fastmail.fm  Tue Apr  7 17:39:29 2015
To: imap-protocol@u.washington.edu
From: Bron Gondwana <brong@fastmail.fm>
Date: Fri Jun  8 12:34:54 2018
Subject: [Imap-protocol] SEARCH semantics
In-Reply-To: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
References: <55246A71.27553.323D151D@David.Harris.pmail.gen.nz>
Message-ID: <1428453569.748713.250513633.3671D3CF@webmail.messagingengine.com>

On Wed, Apr 8, 2015, at 09:38 AM, David Harris wrote:
> I'm in the process of completely rewriting the SEARCH logic in my IMAP server -
> the old code was done in a hurry and was, quite frankly, ridiculously bad, but that's
> another story.
>
> As I get into testing cases, I've come across a number of areas where RFC3501
> and the various sub-documents that I know about are... uh, "vague". I'd like to get a
> take on how other implementors view them.
>
> 1: BODY:  When a SEARCH BODY expression is issued, how should "BODY" be
> interpreted? Is there an assumption that the server should choose the best
> candidate for a displayable message body, parse and normalize it, then search
> that? Or should it simply be taken as a raw scan of the message? How much
> unarmouring and character set normalization is assumed?

Cyrus streams each body part through decoding (qp/base64) and charset handling
(generates a stream of int32 unicode codepoints) - which then feeds into the search
engine to look for matches.  If any part matches, then the message matches.
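
As a rough illustration of that pipeline (not the Cyrus code, which is C and
streams the data), here is a minimal Python sketch using the standard email
package. The function name body_matches and the case-insensitive substring
comparison are assumptions made for the example; a real server would apply
its own normalization and matching rules.

```python
from email import message_from_bytes


def body_matches(raw: bytes, needle: str) -> bool:
    """Return True if any leaf body part contains needle (hypothetical helper)."""
    msg = message_from_bytes(raw)
    target = needle.casefold()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        payload = part.get_payload(decode=True)  # undoes qp/base64
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a byte-preserving decode.
            text = payload.decode("latin-1")
        if target in text.casefold():
            return True  # one matching part is enough
    return False
```

This loads each part fully into memory, which is the main way it differs from
a streaming design.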

> 2: Headers: when any of the header search expressions is issued, is the
> assumption that the raw header should be searched, or should RFC2047
> encoded-words be reduced and normalized before attempting the comparison?

Likewise - there's a header parser which generates the Unicode codepoints for search.
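
To make that concrete, here is a small Python sketch (again, not the Cyrus
parser) that reduces RFC2047 encoded-words to Unicode before comparing. It
relies on decode_header and make_header from the standard email.header
module; the name header_matches and the casefold comparison are assumptions
for illustration only.

```python
from email.header import decode_header, make_header


def header_matches(raw_value: str, needle: str) -> bool:
    """Decode encoded-words in a header value, then do a caseless search."""
    # decode_header splits the value into (bytes-or-str, charset) chunks;
    # make_header reassembles them into a single Unicode string.
    decoded = str(make_header(decode_header(raw_value)))
    return needle.casefold() in decoded.casefold()
```

Malformed encoded-words can make the decoder raise, so real code would want
a fallback to searching the raw header text.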

> 3: The following search is valid, according to the syntax in RFC3501:
>
>    xx SEARCH OR OR <exp1> <exp2> <exp3>
>
> and allows an OR expression to cover three terms instead of just two. As such, it
> seems quite useful, but it would certainly have mystified my old search code (it was
> rubbish, as I've pointed out), and I was wondering how generally safe it would be to
> use this type of expression?

Very. That's totally standard, and anything which doesn't support it is totally bogus.

> 4: I'm pretty sure I'm right on this one, but the following expression:
>
>    xx SEARCH OR (<exp1> <exp2> <exp3>) exp4
>
> will only result in a match if either <exp4> is a match, or ALL of <exp1>, <exp2>
> and <exp3> are a match. Could someone wiser than me confirm this? I'm
> assuming there is no way to perform a search with a long list of OR conditions
> without doing a lot of calisthenics on the search string (multiple OR conditions
> strung together).

It's hardly calisthenics, it's just prefix notation.

You can just as well do

OR A OR B OR C D

depending on whether you want the tree to bias right or bias left.  Even this is valid:

OR OR A OR B C D

As is obvious when you write it out as a tree.

OR
- OR
= - A
= - OR
= = - B
= = - C
- D
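
If it helps, here is a minimal Python sketch of both directions: building a
right-biased OR chain from a list of terms, and parsing a prefix expression
back into a tree. It is deliberately simplified, and those simplifications
are assumptions of the example: every search key is a single token such as
A or B, and parenthesized lists, NOT and key arguments are left out. The
names or_chain and parse_prefix are made up for the example.

```python
from typing import List, Tuple, Union

# A tree is either a leaf token or ("OR", left, right).
Node = Union[str, Tuple[str, "Node", "Node"]]


def or_chain(terms: List[str]) -> List[str]:
    """Build right-biased prefix tokens: [A, B, C] -> OR A OR B C."""
    if not terms:
        raise ValueError("need at least one term")
    tokens: List[str] = []
    for term in terms[:-1]:
        tokens += ["OR", term]
    tokens.append(terms[-1])
    return tokens


def parse_prefix(tokens: List[str]) -> Node:
    """Parse prefix tokens into nested ("OR", left, right) tuples."""

    def parse(pos: int) -> Tuple[Node, int]:
        if pos >= len(tokens):
            raise ValueError("expression ended early")
        tok = tokens[pos]
        if tok == "OR":
            # OR always takes exactly two operands, each possibly nested.
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return ("OR", left, right), pos
        return tok, pos + 1

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


print(" ".join(or_chain(["A", "B", "C", "D"])))
print(parse_prefix("OR OR A OR B C D".split()))
```

The first print gives the right-biased chain from above, and the second
gives the same shape as the tree written out by hand.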

> I apologize if any of these are dealt with in RFCs outside RFC3501 - I struggle to
> keep track of all the various sub-documents relating to the protocol these days.
>
> Thanks in advance for any advice.

Cheers,

Bron.

--
  Bron Gondwana
  brong@fastmail.fm