wasm-demo/demo/ermis-f/imap-protocol/cur/1600095036.22701.mbox:2,S

MBOX-Line: From blong at google.com  Fri Nov  9 14:05:01 2012
To: imap-protocol@u.washington.edu
From: Brandon Long <blong@google.com>
Date: Fri Jun  8 12:34:49 2018
Subject: [Imap-protocol] Efficiently handling sequence numbers
In-Reply-To: <8560FE55-3E60-4FCF-9B1A-FB0F66D04295@iki.fi>
References: <CABa8R6v2-NW2L4itj30juL6CEFUC3v9pQG0D0zvqM04UNiB+vA@mail.gmail.com>
	<E121DCDC-75FA-43FB-87A7-EA0EEEEA13B0@iki.fi>
	<CABa8R6sSdHCK9ziPrwc=avaJiHjWtfVtgyBBs=kxSzSkkWJQCQ@mail.gmail.com>
	<8560FE55-3E60-4FCF-9B1A-FB0F66D04295@iki.fi>
Message-ID: <CABa8R6tdej9ZrNW6EtBXQwuZupsSKP64Fq9fUhJLtJM4oA7NVQ@mail.gmail.com>

On Fri, Nov 9, 2012 at 1:53 PM, Timo Sirainen <tss@iki.fi> wrote:

> On 9.11.2012, at 23.45, Brandon Long wrote:
>
> > On Fri, Nov 9, 2012 at 1:37 PM, Timo Sirainen <tss@iki.fi> wrote:
> > On 9.11.2012, at 23.15, Brandon Long wrote:
> >
> > > I'm drawing a blank on how to efficiently handle sequence numbers
> without O(n) operations (where n is the number of messages in a folder).
> >
> > I don't think it's a real concern. A long time ago I wrote some kind of
> a binary tree based algorithm to do expunges in O(log n), but eventually I
> just realized it's way too much trouble (and I'm pretty sure it had some
> O(n) operations hidden in there at some intervals). Small fixed size
> records (especially a simple uids[] array) works just fine for probably up
> to tens of millions of mails, and just about no one has that many mails in
> one mailbox.
> >
> > Dovecot nowadays doesn't rewrite the whole index to disk every time you
> delete a mail. It appends "uid 123 expunged" to a log file and updates the
> index in memory (memmove()ing data around). Then once in a while it
> recreates the index. Works nicely. I've wondered about making this two
> arrays though, so that you'd have the uids[] array and other_fixed_data[]
> array and expunge would only need to move data in the uids[] array, but I
> don't know if that really is a good idea. Probably not.
> >
> > Also large mailboxes could be split to "old mails" and "new mails"
> indexes. If you have a huge mailbox, it's very unlikely that any old
> messages get expunged. Then you only need to append to the old index and
> rewrite the small new index.
> >
> > So, do you load the full sequence into memory on SELECT?
>
> Yes, first read/mmap the index snapshot into memory and then apply any
> expunges and other changes to it from the log. It could be optimized so
> that all the expunges are done in one O(n) scan of the index instead of
> multiple separate of memmove()s, but it hasn't been a real problem so far.
> I don't have the old/new index separation yet, but I think that would be a
> good idea to add when the mailbox size grows past maybe 100k messages (or
> maybe even sooner).
>
>
Ah, ok.  If only loading the data was as simple as an mmap for us ;)  At
this point, I think the data only moves through 5 separate servers on 4
machines, usually in the same data center though.  That, and of course we
don't use UIDs as our primary message-id, so we're looking at 32 bits for
the uid and 64 bits for the message-id, so 96 bits minimum per message,
which means a 10M message folder is loading 120MB, probably another 30MB of
overhead in protocol buffers (not really meant for millions of entries).

Brandon
My other computer is a datacenter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20121109/aa6eb548/attachment.html>