MBOX-Line: From brong at fastmail.fm Wed Mar 11 14:13:20 2015
To: imap-protocol@u.washington.edu
From: Bron Gondwana <brong@fastmail.fm>
Date: Fri Jun 8 12:34:54 2018
Subject: [Imap-protocol] If Crispin were creating IMAP today how would
    it be different?
In-Reply-To: <20041.1426095023@parc.com>
References: <54FAEB94.4070508@lavabitllc.com> <54FBF289.3010202@psaux.com>
    <7164.1425831184@parc.com>
    <1425907661.1215497.237833469.1EDA571D@webmail.messagingengine.com>
    <6506.1425915329@parc.com> <55005876.4070406@lavabitllc.com>
    <18002.1426091085@parc.com> <55006FA4.3090800@lavabitllc.com>
    <20041.1426095023@parc.com>
Message-ID: <1426108400.3306698.239091445.5A1246FF@webmail.messagingengine.com>

I have a server to server replication protocol for IMAP servers mostly specced out.
It's based on the current Cyrus server to server replication protocol to a large degree.

Sadly I was in the middle of moving back from Norway to Australia when I wrote this:

http://lists.andrew.cmu.edu/pipermail/cyrus-devel/2012-December/002703.html

And got dragged down other paths. I'm still working on generalising what is in Cyrus
though. The most tricky bit is agreeing on the format for the checksum over the
entire mailbox contents which is used as a double-check for consistency after
applying changes. Without that, it's still pretty good - but there are some issues
that can go undetected in the case of a split brain.

I also have notes from an email I put together after visiting David Carter (author of
the original replication support in Cyrus) which I'm just going to paste in at the end
of this message, because I don't appear to have sent them to a public list...

I have done some work since then on moving all the extended items into
namespaced per-message annotations on the wire, so that any server which
supports annotations (yeah, right - I think there are about 3) can keep information
with full fidelity even if they don't support a feature themselves.

Bron.


Sync Protocol: wire format

I spent the afternoon with David Carter and Tony Finch in Cambridge
looking over how Cyrus currently stores index records and per-mailbox
data and classifying the fields.

Glossary:

C Set at create time, immutable after
D Derived from RFC822 message, immutable
M Mutable
I Internal to Cyrus, irrelevant to sync because not exposed

cyrus.index records (per message):

C UID
C INTERNALDATE
D SENTDATE
D SIZE
D HEADERSIZE
D GMTIME
I CACHEOFFSET
I LAST_UPDATED
M SYSTEM_FLAGS
M USER_FLAGS
D CONTENT_LINES
I CACHE_VERSION
C GUID
M MODSEQ
I CACHE_CRC
I RECORD_CRC

Also per-message:

M ANNOTATIONS
C RFC822 message content

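As a rough illustration of how the classification above might drive a
sync implementation, here is a small Python sketch. The table is copied
from the list above; the names FIELD_CLASS and fields_in are made up for
this example, and the idea that only the C and M fields need to travel on
the wire (D being recomputable from the message, I being private) is an
assumption drawn from the glossary rather than something stated outright.

```python
# Per-message fields and their class, in the order listed above.
# C = set at create time, D = derived from the message,
# M = mutable, I = internal to Cyrus.
FIELD_CLASS = {
    "UID": "C",
    "INTERNALDATE": "C",
    "SENTDATE": "D",
    "SIZE": "D",
    "HEADERSIZE": "D",
    "GMTIME": "D",
    "CACHEOFFSET": "I",
    "LAST_UPDATED": "I",
    "SYSTEM_FLAGS": "M",
    "USER_FLAGS": "M",
    "CONTENT_LINES": "D",
    "CACHE_VERSION": "I",
    "GUID": "C",
    "MODSEQ": "M",
    "CACHE_CRC": "I",
    "RECORD_CRC": "I",
    "ANNOTATIONS": "M",
}


def fields_in(classes):
    """Return the field names whose class letter is in `classes`."""
    return [name for name, cls in FIELD_CLASS.items() if cls in classes]


# Fields a replica cannot recompute for itself (assumption: C and M only).
WIRE_FIELDS = fields_in("CM")
```

With SYSTEM_FLAGS and USER_FLAGS folded into a single FLAGS item, that
set lines up with the fields used in the wire format.
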
We talked a lot about GUID. The conclusion was that for a vendor-neutral
protocol, you want GUID to be an opaque blob of somewhat arbitrary size
(perhaps the 70 bytes that POP3 UIDL gives - RFC1939).

It should not be necessary to have a GUID at all, or even MODSEQs, to use
this protocol. Just without these things you lose abilities like incremental
updates and implicit cross-referencing.

There are two necessary formats for a record. One is a wire format to
succinctly describe either a CREATE or UPDATE on a message, and the other
is a canonical serialisation format to calculate the SYNC_CRC.

The ordering of fields is chosen as follows:

UID
MODSEQ
FLAGS
INTERNALDATE
GUID
ANNOTATIONS
CRC32

The CRC32 buffer format is specified with upper case exact string keys,
as follows:

UID <number>
MODSEQ <number>
FLAGS (sorted: <flag>, ...)
INTERNALDATE <iso8601>
GUID <astring>
ANNOTATIONS (sorted: (/name user value user value), ...)
CRC32 <num>

NOTE: the buffer format has a single space rather than endline between each
key.

The sort for FLAGS is purely ASCII byte values. The sort for ANNOTATIONS is
sorted by name, and within the values by user, with NIL (for shared) sorting
first.

If there is no MODSEQ, then the MODSEQ item is entirely omitted.

If there is no GUID, then the GUID item is entirely omitted.

The FLAGS () and ANNOTATIONS () items are not included in the CRC32 format if
they are an empty list.

So the most trivial case is:

UID <value> INTERNALDATE <value> CRC32 <value>

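A minimal Python sketch of building that buffer, to make the ordering and
omission rules concrete. Several details are assumptions, not settled by
the notes above: list members are separated by single spaces (as in the
FLAGS example further down), annotation values are double-quoted without
escaping, the CRC32 item is taken as a number supplied by the caller, and
the SYNC_CRC is computed with zlib's CRC-32 over the UTF-8 bytes of the
buffer. The function names are invented for the example.

```python
import zlib


def _ascii(s):
    # Sort key: plain ASCII byte values.
    return s.encode("ascii")


def _user_key(user):
    # None stands for NIL (shared) and sorts before any named user.
    return (user is not None, _ascii(user or ""))


def canonical_record(uid, internaldate, crc32, modseq=None, flags=(),
                     guid=None, annotations=None):
    """Serialise one message record in the fixed field order."""
    parts = ["UID %d" % uid]
    if modseq is not None:                      # omitted entirely if absent
        parts.append("MODSEQ %d" % modseq)
    if flags:                                   # empty list is omitted
        parts.append("FLAGS (%s)" % " ".join(sorted(flags, key=_ascii)))
    parts.append("INTERNALDATE %s" % internaldate)
    if guid is not None:                        # omitted entirely if absent
        parts.append("GUID %s" % guid)
    if annotations:                             # empty list is omitted
        entries = []
        for name in sorted(annotations, key=_ascii):
            per_user = annotations[name]
            items = []
            for user in sorted(per_user, key=_user_key):
                items.append("NIL" if user is None else user)
                items.append('"%s"' % per_user[user])
            entries.append("(%s %s)" % (name, " ".join(items)))
        parts.append("ANNOTATIONS (%s)" % " ".join(entries))
    parts.append("CRC32 %d" % crc32)
    # Single space between items, no line endings.
    return " ".join(parts)


def sync_crc(buffer):
    """CRC-32 of the canonical buffer (algorithm choice is an assumption)."""
    return zlib.crc32(buffer.encode("utf-8")) & 0xFFFFFFFF
```

Calling canonical_record(5, "2015-03-11T14:13:20Z", 1234) gives the
trivial three-item form; anything optional only shows up when it is set.
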
Extended values (like the CID for FastMail's conversations patch) can either
be added by extending the format, or by creating a synthetic vendor ANNOTATION
field.

CREATES:
========

The create format is precisely like the CRC32 format, except that it may
contain either a key RFC822 with the entire message, a key XREF with a
triple (mailboxname uidvalidity uid) or just the GUID and rely on automatic
linkage to other messages with the same GUID.

GUID is defined as being unique to a particular RFC822 message text. It is
the server's responsibility to come up with something unique.

So CREATE is just the CRC32 format with an additional either RFC822 or XREF
field to specify the message body. In the sync protocol, it's the sender's
responsibility to ensure that the server already has the XREF'ed message.

In the incremental backup case, it's the backup server's responsibility to
check that the XREF'ed message already exists in the previous backup.

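To show the three ways a CREATE can name its message body, here is a
hedged Python sketch of what a receiver might do. The record is modelled
as a dict keyed by the item names; by_location and by_guid are
hypothetical lookup tables standing in for whatever storage the receiver
has, and the order of preference (RFC822, then XREF, then GUID) is an
assumption.

```python
def resolve_body(record, by_location, by_guid):
    """Find the message text for a CREATE record.

    by_location maps (mailboxname, uidvalidity, uid) to message bytes.
    by_guid maps a GUID to message bytes.
    Both are hypothetical stand-ins for the receiver's message store.
    """
    if "RFC822" in record:
        # The whole message was sent inline.
        return record["RFC822"]
    if "XREF" in record:
        # The sender promises we already hold this message elsewhere.
        key = tuple(record["XREF"])
        if key not in by_location:
            raise LookupError("XREF target missing: %r" % (key,))
        return by_location[key]
    guid = record.get("GUID")
    if guid is not None and guid in by_guid:
        # Automatic linkage to another message with the same GUID.
        return by_guid[guid]
    raise LookupError("no message body available for UID %r" % record.get("UID"))
```

A LookupError here is the point where a real implementation would have
to ask the sender for the full RFC822 text.
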
UPDATES:
========

To update an existing record, a record with the same UID is created. If
it contains any CREATION ONLY field, then it's required to match exactly
(e.g. INTERNALDATE, CRC32 or GUID). If it doesn't, then UID promotion logic
takes over. It also needs to have a higher MODSEQ value than the previous
record of course.

Otherwise, the format is the same as CREATE.

E.g.

UID 5 MODSEQ 100 FLAGS (\Seen $foo)

If any field is absent, it is unchanged. If the FLAGS () or ANNOTATIONS ()
lists are present, then they are a SET - changing the replica to contain
exactly what is in them, removing anything not mentioned.

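The update rules can be sketched in Python roughly as follows. Records
are modelled as dicts keyed by item name, with FLAGS held as a set;
NeedsUidPromotion is a made-up exception that only marks the point where
UID promotion would take over, since that logic isn't defined here. The
MODSEQ check is skipped when either side has no MODSEQ, which is an
assumption.

```python
CREATION_ONLY = ("INTERNALDATE", "CRC32", "GUID")


class NeedsUidPromotion(Exception):
    """A creation-only field differs, so this is not the same message."""


def apply_update(existing, update):
    """Return the replica's record after applying `update` to `existing`."""
    if update["UID"] != existing["UID"]:
        raise ValueError("update is for a different UID")
    for field in CREATION_ONLY:
        if field in update and update[field] != existing.get(field):
            raise NeedsUidPromotion(field)
    if "MODSEQ" in update and "MODSEQ" in existing:
        if update["MODSEQ"] <= existing["MODSEQ"]:
            raise ValueError("update does not have a higher MODSEQ")
    merged = dict(existing)      # absent fields stay unchanged
    merged.update(update)        # present lists replace the old ones (SET)
    return merged


old = {"UID": 5, "MODSEQ": 90, "FLAGS": {"\\Seen", "\\Flagged"},
       "INTERNALDATE": "2015-03-11T14:13:20Z"}
new = apply_update(old, {"UID": 5, "MODSEQ": 100, "FLAGS": {"\\Seen", "$foo"}})
```

After this, new has FLAGS of exactly \Seen and $foo (the \Flagged flag is
gone), MODSEQ 100, and the original INTERNALDATE.
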
VANISHED:
=========

To deal with cases where the server has forgotten precisely which UIDs were
removed since the previous MODSEQ value, there needs to be a way to say
"everything in these ranges of UIDs was removed".

It also needs to be as if it happened at the last known MODSEQ (DELETEDMODSEQ
in Cyrus terms).

The format is

VANISHED (AT <modseq> UIDS <range>)

e.g.

VANISHED (AT 5001 UIDS 1:20,23:25,27:201)

This will cause the receiver to expunge any messages matching that UID range,
and to do so at a MODSEQ of 5001. NOTE: it only makes sense for this modseq
to be in the future for the receiver, otherwise you would not be sending the
vanished range, because the receiver would already have seen those changes.

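A small Python sketch of the receiving side, under some assumptions: the
mailbox is modelled as a dict with a HIGHESTMODSEQ and a "records" dict
keyed by UID, the UID range uses the usual comma and colon syntax shown
in the example, and a modseq that is not in the receiver's future is
simply rejected. The function names are invented for the example.

```python
def parse_uid_ranges(spec):
    """Turn '1:20,23:25,27' into [(1, 20), (23, 25), (27, 27)]."""
    ranges = []
    for part in spec.split(","):
        lo, _, hi = part.partition(":")
        lo = int(lo)
        hi = int(hi) if hi else lo
        ranges.append((min(lo, hi), max(lo, hi)))
    return ranges


def apply_vanished(mailbox, at_modseq, uid_spec):
    """Expunge every record in the given UID ranges, as of at_modseq."""
    if at_modseq <= mailbox["HIGHESTMODSEQ"]:
        # The receiver would already have seen these changes.
        raise ValueError("VANISHED modseq is not in the receiver's future")
    ranges = parse_uid_ranges(uid_spec)
    gone = sorted(uid for uid in mailbox["records"]
                  if any(lo <= uid <= hi for lo, hi in ranges))
    for uid in gone:
        del mailbox["records"][uid]
    mailbox["HIGHESTMODSEQ"] = at_modseq
    return gone


mailbox = {"HIGHESTMODSEQ": 5000,
           "records": {uid: {"UID": uid} for uid in (1, 21, 22, 24, 26, 100, 202)}}
removed = apply_vanished(mailbox, 5001, "1:20,23:25,27:201")
```

Here removed is [1, 24, 100], and UIDs 21, 22, 26 and 202 survive because
they fall in the gaps between the ranges.
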
---------------

That's enough to cover all the interesting cases for messages - now on to
mailboxes:

cyrus.index header

I GENERATION
I FORMAT
I MINOR_VERSION
I START_OFFSET
I RECORD_SIZE
I NUM_RECORDS
I LAST_APPENDDATE
M LAST_UID
I QUOTA_USED
M POP3_LAST_LOGIN => metadata
C UIDVALIDITY
I DELETED
I ANSWERED
I FLAGGED
M OPTIONS => metadata
I LEAKED_CACHE
M HIGHESTMODSEQ
I DELETEDMODSEQ
D EXISTS
I FIRST_EXPUNGED
I LAST_REPACK_TIME
I HEADER_FILE_CRC
D SYNC_CRC
M RECENT_UID
I RECENT_TIME

mailboxes.db:

M ACL
M SPECIAL-USE => metadata

cyrus.header:

C UNIQUEID
M QUOTAROOT ()
M FLAGS ()

annotations.db:

M METADATA

CREATE format (== CRC32 format - perhaps just XOR with the messages CRC):

NAME <name>
UIDVALIDITY <number>
HIGHESTMODSEQ <number>
LAST_UID <number>
ACL (name value name value)
QUOTAROOT (root, ...)
FLAGS (flag, ...)
METADATA (...)

ACLs are sorted by name and normalised by removing the c and d rights,
which got removed in RFC x

QUOTAROOTs are sorted by ASCII bytes

FLAGS are sorted by ASCII bytes

METADATA are sorted and stored just like ANNOTATION:

C: a SETMETADATA INBOX (/private/comment "My new comment"
                        /shared/comment "This one is for you!")
S: a OK SETMETADATA complete

Becomes:

METADATA (/comment (NIL "This one is for you!" brong "My new comment"))

Being a shared comment and a private comment for brong.

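The SETMETADATA to METADATA mapping above can be sketched in Python like
this. It assumes entry names always start with /private/ or /shared/,
that private entries belong to the authenticated user, and that values
are double-quoted without escaping; metadata_item is a name invented for
the example.

```python
def metadata_item(entries, user):
    """Fold (entry, value) pairs from SETMETADATA into one METADATA item.

    `user` is the authenticated user who owns the /private entries.
    """
    by_name = {}
    for entry, value in entries:
        if entry.startswith("/private/"):
            name, owner = entry[len("/private"):], user
        elif entry.startswith("/shared/"):
            name, owner = entry[len("/shared"):], None   # None means NIL
        else:
            raise ValueError("unexpected entry name: %r" % entry)
        by_name.setdefault(name, {})[owner] = value

    rendered = []
    for name in sorted(by_name):
        pairs = []
        # NIL (shared) sorts first, then users in ASCII order.
        for owner in sorted(by_name[name], key=lambda u: (u is not None, u or "")):
            pairs.append("NIL" if owner is None else owner)
            pairs.append('"%s"' % by_name[name][owner])
        rendered.append("%s (%s)" % (name, " ".join(pairs)))
    return "METADATA (%s)" % " ".join(rendered)


item = metadata_item(
    [("/private/comment", "My new comment"),
     ("/shared/comment", "This one is for you!")],
    user="brong",
)
```

For the example command this produces the same METADATA line as shown
above, with the shared value ahead of brong's private one.
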
TODO:
=====

* list RFCs next to each item
* Sieve / Subs / etc - user level stuff
* Server level annotations
* Quotaroots
* define conflict resolution protocols (uid promotion, value resolution,
  rename handling)
* dates: nail down format YYYY-MM-DDThh:mm:ssZ
* Conversations extended format

--
Bron Gondwana
brong@fastmail.fm