MBOX-Line: From tss at iki.fi  Thu Dec 21 15:35:38 2006
To: imap-protocol@u.washington.edu
From: Timo Sirainen <tss@iki.fi>
Date: Fri Jun  8 12:34:38 2018
Subject: [Imap-protocol] Searching
In-Reply-To: <alpine.WNT.0.81.0612201412380.3856@Shimo-Tomobiki.panda.com>
References: <1166603479.22214.298.camel@hurina>
	<alpine.OSX.0.81.0612201018450.10225@pangtzu.panda.com>
	<1166643258.22214.350.camel@hurina>
	<alpine.WNT.0.81.0612201412380.3856@Shimo-Tomobiki.panda.com>
Message-ID: <1166744138.22214.523.camel@hurina>

On Wed, 2006-12-20 at 14:19 -0800, Mark Crispin wrote:
> On Wed, 20 Dec 2006, Timo Sirainen wrote:
> > Optimizing the string search would help some, but for large mailboxes
> > it's still a bit too slow. People want instant search results
> > nowadays. :)
>
> Please define what you mean by "large" and "instant".

Some people want to see the results update as they type the search
keyword. For that kind of user interface the search can't really take
much longer than 0.1 seconds or it'll look slow.

> It took some effort for me to construct a mailbox that was pathologically
> large enough for a search in UW imapd to take a whole 2 seconds.

But I guess that's for a mailbox that's already in the file cache? I'd
think that on a real mail server most users' mailboxes need to be read
from disk as they're searched, and on a loaded server that can be even
slower. I've also heard of users whose INBOX is over 2 gigabytes.

> > Perhaps. I think it depends on how badly mail admins want it. If it's
> > only a small s/BODY/X-NONEXACT-BODY/ replace for their webmail code,
> > it'll get usage at least within Dovecot community.
>
> By the way, are you doing charset and i18n case-mapping in your
> "non-exact" search?  That, and not the searching, is what takes time.

The "non-exact" naming just means that the text/body searching can
implement different search string matching rules than what the IMAP RFC
defines. I couldn't think of a good name for it. Maybe
X-NONRFC-TEXT/BODY :)

But for a standard search, yes, I'm converting mails to UTF-8 before
doing any searching. I should add support for case-insensitive UTF-8
searches also, but for now I'm doing it only for ASCII. No-one's
complained yet though :)
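
To make the "convert to UTF-8, then compare case-insensitively for ASCII
only" behaviour concrete, here is a minimal Python sketch. It is not
Dovecot's code; the function names (to_utf8, ascii_lower, body_matches)
and the fallback to latin-1 for unknown charsets are assumptions made
purely for illustration.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Convert message bytes from their declared charset to UTF-8."""
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to latin-1 (an assumption for
        # this sketch), which can decode any byte sequence.
        text = raw.decode("latin-1")
    return text.encode("utf-8")


def ascii_lower(data: bytes) -> bytes:
    """Fold only A-Z to a-z. Bytes >= 0x80, i.e. the pieces of multi-byte
    UTF-8 sequences, are left untouched."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


def body_matches(raw: bytes, charset: str, key: str) -> bool:
    """Substring search after UTF-8 conversion, case-insensitive for
    ASCII letters only."""
    haystack = ascii_lower(to_utf8(raw, charset))
    needle = ascii_lower(key.encode("utf-8"))
    return needle in haystack
```

With this sketch, searching an ISO-8859-1 body containing "Hello Wörld"
for "HELLO" or "wörld" matches, but "WÖRLD" does not, because the
non-ASCII "Ö" is never folded to "ö". That is the limitation described
above.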

Anyway, yes, I could probably get my standard search code a lot faster
(UW-IMAP searches mboxes 2-3 times faster), but that won't help with
disk I/O usage. Usually there's enough CPU to go around, but not that
much available disk I/O. Indexing helps a lot with that. So it's not
just for bringing down search times from a few seconds to zero, but also
for lowering the system load in general.
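
As a rough illustration of why indexing cuts disk I/O, the sketch below
builds a tiny word-level inverted index in Python. It is only a toy
under stated assumptions: the names build_index and search, the
in-memory dict, and the ASCII word tokenizer are all hypothetical and
say nothing about how any real server stores its index. The point is
that a query touches only the index, not the message bodies.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical tokenizer: runs of ASCII letters and digits.
WORD = re.compile(r"[a-z0-9]+")


def build_index(messages: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of message UIDs whose body contains it.
    Bodies are read once, at indexing time."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for uid, body in messages.items():
        for word in WORD.findall(body.lower()):
            index[word].add(uid)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return UIDs containing every word of the query, using only the
    index (no message body is read)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Note that this matches whole words rather than arbitrary substrings, so
a query for "mailbox" does not find a message that only contains
"mailboxes". A word index like this cannot give the substring matching
the standard BODY search requires, which is the kind of mismatch a
separate "non-exact" search key would allow.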