MBOX-Line: From brong at fastmail.fm Wed Mar 11 14:13:20 2015
To: imap-protocol@u.washington.edu
From: Bron Gondwana <brong@fastmail.fm>
Date: Fri Jun 8 12:34:54 2018
Subject: [Imap-protocol] If Crispin were creating IMAP today how would
it be different?
In-Reply-To: <20041.1426095023@parc.com>
References: <54FAEB94.4070508@lavabitllc.com> <54FBF289.3010202@psaux.com>
<7164.1425831184@parc.com>
<1425907661.1215497.237833469.1EDA571D@webmail.messagingengine.com>
<6506.1425915329@parc.com> <55005876.4070406@lavabitllc.com>
<18002.1426091085@parc.com> <55006FA4.3090800@lavabitllc.com>
<20041.1426095023@parc.com>
Message-ID: <1426108400.3306698.239091445.5A1246FF@webmail.messagingengine.com>
I have a server to server replication protocol for IMAP servers mostly specced out.
It's based on the current Cyrus server to server replication protocol to a large degree.
Sadly I was in the middle of moving back from Norway to Australia when I wrote this:
http://lists.andrew.cmu.edu/pipermail/cyrus-devel/2012-December/002703.html
And got dragged down other paths. I'm still working on generalising what is in Cyrus
though. The most tricky bit is agreeing on the format for the checksum over the
entire mailbox contents which is used as a double-check for consistency after
applying changes. Without that, it's still pretty good - but there are some issues
that can go undetected in the case of a split brain.
I also have notes from an email I put together after visiting David Carter (author of
the original replication support in Cyrus) which I'm just going to paste in at the end
of this message, because I don't appear to have sent them to a public list...
I have done some work since then on moving all the extended items into
namespaced per-message annotations on the wire, so that any server which
supports annotations (yeah, right - I think there are about 3) can keep information
with full fidelity even if they don't support a feature themselves.
Bron.
Sync Protocol: wire format
I spent the afternoon with David Carter and Tony Finch in Cambridge
looking over how Cyrus currently stores index records and per-mailbox
data and classifying the fields.
Glossary:
C Set at create time, immutable after
D Derived from RFC822 message, immutable
M Mutable
I Internal to Cyrus, irrelevant to sync because not exposed
cyrus.index records (per message):
C UID
C INTERNALDATE
D SENTDATE
D SIZE
D HEADERSIZE
D GMTIME
I CACHEOFFSET
I LAST_UPDATED
M SYSTEM_FLAGS
M USER_FLAGS
D CONTENT_LINES
I CACHE_VERSION
C GUID
M MODSEQ
I CACHE_CRC
I RECORD_CRC
Also per-message:
M ANNOTATIONS
C RFC822 message content
We talked a lot about GUID. The conclusion was that for a vendor-neutral
protocol, you want GUID to be an opaque blob of somewhat arbitrary size
(perhaps the 70 bytes that POP3 UIDL gives - RFC1939).
It should not be necessary to have a GUID at all, or even MODSEQs, to use
this protocol. Without them you just lose abilities like incremental
updates and implicit cross-referencing.
There are two necessary formats for a record. One is a wire format to
succinctly describe either a CREATE or UPDATE on a message, and the other
is a canonical serialisation format to calculate the SYNC_CRC.
The ordering of fields is chosen as follows:
UID
MODSEQ
FLAGS
INTERNALDATE
GUID
ANNOTATIONS
CRC32
The CRC32 buffer format is specified with upper case exact string keys,
as follows:
UID <number>
MODSEQ <number>
FLAGS (sorted: <flag>, ...)
INTERNALDATE <iso8601>
GUID <astring>
ANNOTATIONS (sorted: (/name user value user value), ...)
CRC32 <num>
NOTE: the buffer format has a single space rather than endline between each
key.
The sort for FLAGS is purely ASCII byte values. The sort for ANNOTATIONS is
sorted by name, and within the values by user, with NIL (for shared) sorting
first.
If there is no MODSEQ, then the MODSEQ item is entirely omitted.
If there is no GUID, then the GUID item is entirely omitted.
The FLAGS () and ANNOTATIONS () items are not included in the CRC32 format if
they are an empty list.
So the most trivial case is:
UID <value> INTERNALDATE <value> CRC32 <value>
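A minimal sketch of building that buffer, in Python. The dict keys, the space separator inside the lists, the quoting of annotation values, and hashing the buffer with zlib's CRC32 are all assumptions for illustration, not part of the notes above. The CRC32 field is treated here as a value supplied with the record.

```python
import zlib

def canonical_record(rec):
    """Build the single-space-separated buffer for one message record.

    `rec` is a hypothetical dict; annotations map name -> {user_or_None: value}.
    """
    parts = ["UID %d" % rec["uid"]]
    if rec.get("modseq") is not None:      # no MODSEQ: item omitted entirely
        parts.append("MODSEQ %d" % rec["modseq"])
    # FLAGS sort purely by ASCII byte value
    flags = sorted(rec.get("flags", ()), key=lambda f: f.encode("ascii"))
    if flags:                              # empty list is left out
        parts.append("FLAGS (%s)" % " ".join(flags))
    parts.append("INTERNALDATE %s" % rec["internaldate"])
    if rec.get("guid") is not None:        # no GUID: item omitted entirely
        parts.append("GUID %s" % rec["guid"])
    annots = rec.get("annotations") or {}
    if annots:
        items = []
        for name in sorted(annots):
            # within a name, sort by user with NIL (shared) first
            pairs = sorted(annots[name].items(),
                           key=lambda kv: (kv[0] is not None, kv[0] or ""))
            body = " ".join('%s "%s"' % ("NIL" if u is None else u, v)
                            for u, v in pairs)
            items.append("(%s %s)" % (name, body))
        parts.append("ANNOTATIONS (%s)" % " ".join(items))
    parts.append("CRC32 %d" % rec["crc32"])
    return " ".join(parts)

def buffer_crc(rec):
    """CRC32 over the canonical buffer (assumed hash, for illustration)."""
    return zlib.crc32(canonical_record(rec).encode("utf-8")) & 0xFFFFFFFF
```

With only UID, INTERNALDATE and CRC32 present this yields exactly the trivial case shown above.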
Extended values (like the CID for FastMail's conversations patch) can either
be added by extending the format, or by creating a synthetic vendor ANNOTATION
field.
CREATES:
========
The create format is precisely like the CRC32 format, except that it may
contain either a key RFC822 with the entire message, a key XREF with a
triple (mailboxname uidvalidity uid) or just the GUID and rely on automatic
linkage to other messages with the same GUID.
GUID is defined as being unique to a particular RFC822 message text. It is
the server's responsibility to come up with something unique.
So CREATE is just the CRC32 format with an additional either RFC822 or XREF
field to specify the message body. In the sync protocol, it's the sender's
responsibility to ensure that the server already has the XREF'ed message.
In the incremental backup case, it's the backup server's responsibility to
check that the XREF'ed message already exists in the previous backup.
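As a rough illustration of the three ways a CREATE can supply its body, here is a Python sketch of how a receiver might pick one. The dict keys and the `known_messages` / `known_guids` lookups are hypothetical stand-ins for whatever the receiving server actually stores.

```python
def body_source(create, known_messages, known_guids):
    """Decide where the message body for a CREATE comes from.

    create:         hypothetical dict form of the CREATE record
    known_messages: set of (mailboxname, uidvalidity, uid) triples held locally
    known_guids:    set of GUIDs held locally
    """
    if "rfc822" in create:
        # The entire message was sent inline.
        return ("inline", create["rfc822"])
    if "xref" in create:
        # Triple pointing at a message the receiver should already have.
        key = tuple(create["xref"])
        if key not in known_messages:
            raise LookupError("XREF'ed message is missing: %r" % (key,))
        return ("xref", key)
    guid = create.get("guid")
    if guid is not None and guid in known_guids:
        # Automatic linkage to another message with the same GUID.
        return ("guid", guid)
    raise LookupError("CREATE carries no usable message body")
```

Raising on a missing XREF target reflects the point above that the sender (or the backup server) is expected to have checked this beforehand.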
UPDATES:
========
To update an existing record, a record with the same UID is created. If
it contains any CREATION ONLY field, then it's required to match exactly
(e.g. INTERNALDATE, CRC32 or GUID). If it doesn't, then UID promotion logic
takes over. It also needs to have a higher MODSEQ value than the previous
record of course.
Otherwise, the format is the same as CREATE.
E.g.
UID 5 MODSEQ 100 FLAGS (\Seen $foo)
If any field is absent, it is unchanged. If the FLAGS () or ANNOTATIONS ()
lists are present, then they are a SET - changing the replica to contain
exactly what is in them, removing anything not mentioned.
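The update rules can be sketched in Python roughly as follows. The dict representation, the `CREATE_ONLY` tuple and the `UidPromotionNeeded` exception are names made up for this example; what UID promotion actually does is not covered here.

```python
CREATE_ONLY = ("internaldate", "guid", "crc32")  # assumed creation-only keys

class UidPromotionNeeded(Exception):
    """Hypothetical signal that a creation-only field did not match."""

def apply_update(current, update):
    """Return a new record with `update` applied on top of `current`."""
    if update["uid"] != current["uid"]:
        raise ValueError("update must carry the UID of the record it changes")
    for key in CREATE_ONLY:
        # A creation-only field, if sent, has to match exactly.
        if key in update and update[key] != current.get(key):
            raise UidPromotionNeeded(key)
    if "modseq" in update and "modseq" in current:
        if update["modseq"] <= current["modseq"]:
            raise ValueError("MODSEQ must be higher than the previous record")
    merged = dict(current)
    # Absent fields stay unchanged; flags and annotations, when present,
    # replace the old value wholesale (SET semantics).
    merged.update(update)
    return merged
```

Applying `UID 5 MODSEQ 100 FLAGS (\Seen $foo)` this way leaves INTERNALDATE and GUID alone and drops any flag not in the new list.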
VANISHED:
=========
To deal with cases where the server has forgotten precisely which UIDs were
removed since the previous MODSEQ value, there needs to be a way to say
"everything in these ranges of UIDs was removed".
It also needs to be as if it happened at the last known MODSEQ (DELETEDMODSEQ
in Cyrus terms).
The format is
VANISHED (AT <modseq> UIDS <range>)
e.g.
VANISHED (AT 5001 UIDS 1:20,23:25,27:201)
This will cause the receiver to expunge any messages matching that UID range,
and to do so at a MODSEQ of 5001. NOTE: it only makes sense for this modseq
to be in the future for the receiver, otherwise you would not be sending the
vanished range, because the receiver would already have seen those changes.
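A small Python sketch of parsing and applying such a line, assuming the receiver keeps its records in a dict keyed by UID. The function names and the regular expression are illustrative only.

```python
import re

def parse_vanished(line):
    """Parse 'VANISHED (AT <modseq> UIDS <range>)' into (modseq, ranges)."""
    m = re.fullmatch(r"VANISHED \(AT (\d+) UIDS ([0-9:,]+)\)", line.strip())
    if m is None:
        raise ValueError("not a VANISHED line: %r" % line)
    ranges = []
    for part in m.group(2).split(","):
        lo, _, hi = part.partition(":")
        ranges.append((int(lo), int(hi or lo)))  # a bare UID is a 1-long range
    return int(m.group(1)), ranges

def apply_vanished(records, modseq, ranges):
    """Expunge every UID inside the ranges.

    Returns (surviving records, modseq at which the expunge is recorded).
    """
    def gone(uid):
        return any(lo <= uid <= hi for lo, hi in ranges)
    kept = {uid: rec for uid, rec in records.items() if not gone(uid)}
    return kept, modseq
```

For the example line above, UIDs 21, 22 and 26 survive while everything else up to 201 is expunged at MODSEQ 5001.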
---------------
That's enough to cover all the interesting cases for messages - now on to
mailboxes:
cyrus.index header
I GENERATION
I FORMAT
I MINOR_VERSION
I START_OFFSET
I RECORD_SIZE
I NUM_RECORDS
I LAST_APPENDDATE
M LAST_UID
I QUOTA_USED
M POP3_LAST_LOGIN => metadata
C UIDVALIDITY
I DELETED
I ANSWERED
I FLAGGED
M OPTIONS => metadata
I LEAKED_CACHE
M HIGHESTMODSEQ
I DELETEDMODSEQ
D EXISTS
I FIRST_EXPUNGED
I LAST_REPACK_TIME
I HEADER_FILE_CRC
D SYNC_CRC
M RECENT_UID
I RECENT_TIME
mailboxes.db:
M ACL
M SPECIAL-USE => metadata
cyrus.header:
C UNIQUEID
M QUOTAROOT ()
M FLAGS ()
annotations.db:
M METADATA
CREATE format (== CRC32 format - perhaps just XOR with the messages' CRC):
NAME <name>
UIDVALIDITY <number>
HIGHESTMODSEQ <number>
LAST_UID <number>
ACL (name value name value)
QUOTAROOT (root, ...)
FLAGS (flag, ...)
METADATA (...)
ACLs are sorted by name and normalised by removing the c and d rights, which got removed in RFC x
QUOTAROOTs are sorted by ASCII bytes
FLAGS are sorted by ASCII bytes
METADATA are sorted and stored just like ANNOTATION:
C: a SETMETADATA INBOX (/private/comment "My new comment"
/shared/comment "This one is for you!")
S: a OK SETMETADATA complete
Becomes:
METADATA (/comment (NIL "This one is for you!" brong "My new comment"))
Being a shared comment and a private comment for brong.
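The SETMETADATA example can be reproduced with a short Python sketch. It assumes entries arrive as a dict of path to value and that the private owner is the authenticated user; `canonical_metadata` is a made-up name for illustration.

```python
def canonical_metadata(entries, user):
    """Fold /private/... and /shared/... entries into the METADATA form.

    entries: dict like {"/private/comment": "...", "/shared/comment": "..."}
    user:    owner of the private entries (shared entries are stored as NIL)
    """
    merged = {}
    for path, value in entries.items():
        scope, _, name = path.lstrip("/").partition("/")
        if scope not in ("private", "shared"):
            raise ValueError("unexpected metadata path: %r" % path)
        owner = None if scope == "shared" else user
        merged.setdefault("/" + name, {})[owner] = value
    items = []
    for name in sorted(merged):
        # NIL (shared) sorts before any named user, as for ANNOTATIONS
        pairs = sorted(merged[name].items(),
                       key=lambda kv: (kv[0] is not None, kv[0] or ""))
        body = " ".join('%s "%s"' % ("NIL" if u is None else u, v)
                        for u, v in pairs)
        items.append("%s (%s)" % (name, body))
    return "METADATA (%s)" % " ".join(items)
```

Called with the two comment entries above and user `brong`, it returns the METADATA line shown in the example.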
TODO:
=====
* list RFCs next to each item
* Sieve / Subs / etc - user level stuff
* Server level annotations
* Quotaroots
* define conflict resolution protocols (uid promotion, value resolution,
rename handling)
* dates: nail down format YYYY-MM-DDThh:mm:ssZ
* Conversations extended format
--
Bron Gondwana
brong@fastmail.fm