68 lines
3.4 KiB
Plaintext
68 lines
3.4 KiB
Plaintext
MBOX-Line: From blong at google.com Fri Nov 9 14:05:01 2012
|
|
To: imap-protocol@u.washington.edu
|
|
From: Brandon Long <blong@google.com>
|
|
Date: Fri Jun 8 12:34:49 2018
|
|
Subject: [Imap-protocol] Efficiently handling sequence numbers
|
|
In-Reply-To: <8560FE55-3E60-4FCF-9B1A-FB0F66D04295@iki.fi>
|
|
References: <CABa8R6v2-NW2L4itj30juL6CEFUC3v9pQG0D0zvqM04UNiB+vA@mail.gmail.com>
|
|
<E121DCDC-75FA-43FB-87A7-EA0EEEEA13B0@iki.fi>
|
|
<CABa8R6sSdHCK9ziPrwc=avaJiHjWtfVtgyBBs=kxSzSkkWJQCQ@mail.gmail.com>
|
|
<8560FE55-3E60-4FCF-9B1A-FB0F66D04295@iki.fi>
|
|
Message-ID: <CABa8R6tdej9ZrNW6EtBXQwuZupsSKP64Fq9fUhJLtJM4oA7NVQ@mail.gmail.com>
|
|
|
|
On Fri, Nov 9, 2012 at 1:53 PM, Timo Sirainen <tss@iki.fi> wrote:
|
|
|
|
> On 9.11.2012, at 23.45, Brandon Long wrote:
|
|
>
|
|
> > On Fri, Nov 9, 2012 at 1:37 PM, Timo Sirainen <tss@iki.fi> wrote:
|
|
> > On 9.11.2012, at 23.15, Brandon Long wrote:
|
|
> >
|
|
> > > I'm drawing a blank on how to efficiently handle sequence numbers
|
|
> without O(n) operations (where n is the number of messages in a folder).
|
|
> >
|
|
> > I don't think it's a real concern. A long time ago I wrote some kind of
|
|
> a binary tree based algorithm to do expunges in O(log n), but eventually I
|
|
> just realized it's way too much trouble (and I'm pretty sure it had some
|
|
> O(n) operations hidden in there at some intervals). Small fixed size
|
|
> records (especially a simple uids[] array) works just fine for probably up
|
|
> to tens of millions of mails, and just about no one has that many mails in
|
|
> one mailbox.
|
|
> >
|
|
> > Dovecot nowadays doesn't rewrite the whole index to disk every time you
|
|
> delete a mail. It appends "uid 123 expunged" to a log file and updates the
|
|
> index in memory (memmove()ing data around). Then once in a while it
|
|
> recreates the index. Works nicely. I've wondered about making this two
|
|
> arrays though, so that you'd have the uids[] array and other_fixed_data[]
|
|
> array and expunge would only need to move data in the uids[] array, but I
|
|
> don't know if that really is a good idea. Probably not.
|
|
> >
|
|
> > Also large mailboxes could be split to "old mails" and "new mails"
|
|
> indexes. If you have a huge mailbox, it's very unlikely that any old
|
|
> messages get expunged. Then you only need to append to the old index and
|
|
> rewrite the small new index.
|
|
> >
|
|
> > So, do you load the full sequence into memory on SELECT?
|
|
>
|
|
> Yes, first read/mmap the index snapshot into memory and then apply any
|
|
> expunges and other changes to it from the log. It could be optimized so
|
|
> that all the expunges are done in one O(n) scan of the index instead of
|
|
> multiple separate of memmove()s, but it hasn't been a real problem so far.
|
|
> I don't have the old/new index separation yet, but I think that would be a
|
|
> good idea to add when the mailbox size grows past maybe 100k messages (or
|
|
> maybe even sooner).
|
|
>
|
|
>
|
|
Ah, ok. If only loading the data was as simple as an mmap for us ;) At
|
|
this point, I think the data only moves through 5 separate servers on 4
|
|
machines, usually in the same data center though. That, and of course we
|
|
don't use UIDs as our primary message-id, so we're looking at 32 bits for
|
|
the uid and 64 bits for the message-id, so 96 bits minimum per message,
|
|
which means a 10M message folder is loading 120MB, probably another 30MB of
|
|
overhead in protocol buffers (not really meant for millions of entries).
|
|
|
|
Brandon
|
|
My other computer is a datacenter
|
|
-------------- next part --------------
|
|
An HTML attachment was scrubbed...
|
|
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20121109/aa6eb548/attachment.html>
|