MBOX-Line: From David.Harris at pmail.gen.nz Sun Jun 28 23:11:34 2015
To: imap-protocol@u.washington.edu
From: David Harris <David.Harris@pmail.gen.nz>
Date: Fri Jun 8 12:34:55 2018
Subject: [Imap-protocol] Parsing, part numbering and BODYSTRUCTURE
Message-ID: <5590E196.25712.A2E3FB@David.Harris.pmail.gen.nz>

For reasons that aren't relevant here, I'm in the process of rewriting my MIME
parser for about the fifth time in twenty-five years. Each time I do this I find I spend
a lot of time trying to reconcile the way I do my parsing with the demands of IMAP. I
should probably keep notes each time, but I never do. *sigh*.

A lot of the trouble I have comes from the paucity of detail in RFC3501 on two key
issues - part numbering, and BODYSTRUCTURE. This is not helped by what
appears to me to be an erratum - the sample numbering scheme shown on page
56, which seems to imply that the bare part number for any part of a message
references the first byte of the part INCLUDING any MIME headers it might have (if
you look at 4.1, it is *followed* by 4.1.MIME, which suggests that the
MIME headers are a subset of 4.1).

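For reference, the numbering rules that the page-56 sample illustrates (RFC 3501, section 6.4.5) can be sketched in Python. This is a minimal sketch under stated assumptions: the `Part` tree type, `number_parts` and `RFC3501_EXAMPLE` are hypothetical names, content types are assumed to be already lower-cased, and the sketch covers only which number each part gets, not which bytes a number refers to, so it does not settle the byte-offset question raised in this message.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Part:
    """Hypothetical, minimal MIME tree node (not taken from any real parser)."""
    content_type: str                                      # lower-case, e.g. "text/plain"
    children: List["Part"] = field(default_factory=list)   # sub-parts of a multipart
    message_body: Optional["Part"] = None                  # body of a message/rfc822 part


def number_parts(message: Part) -> List[Tuple[str, str]]:
    """List (part number, content type) pairs for a message, following the
    rules in RFC 3501 section 6.4.5 (HEADER/TEXT/MIME specifiers omitted)."""
    numbered: List[Tuple[str, str]] = []

    def number_body(body: Part, prefix: str) -> None:
        # The body of a message (top level or encapsulated) gets no number of
        # its own when it is multipart: its children are prefix1, prefix2, ...
        if body.content_type.startswith("multipart/"):
            for i, child in enumerate(body.children, start=1):
                visit(child, f"{prefix}{i}")
        else:
            # A single-part body is simply part "<prefix>1".
            visit(body, f"{prefix}1")

    def visit(part: Part, number: str) -> None:
        numbered.append((number, part.content_type))
        if part.content_type.startswith("multipart/"):
            # A multipart nested directly inside another multipart does get a
            # number, and its children are numbered beneath it.
            for i, child in enumerate(part.children, start=1):
                visit(child, f"{number}.{i}")
        elif part.content_type == "message/rfc822" and part.message_body is not None:
            # Parts of an encapsulated message are numbered beneath the
            # message/rfc822 part itself.
            number_body(part.message_body, f"{number}.")

    number_body(message, "")
    return numbered


# The sample message structure from RFC 3501 section 6.4.5.
RFC3501_EXAMPLE = Part("multipart/mixed", children=[
    Part("text/plain"),
    Part("application/octet-stream"),
    Part("message/rfc822", message_body=Part("multipart/mixed", children=[
        Part("text/plain"),
        Part("application/octet-stream"),
    ])),
    Part("multipart/mixed", children=[
        Part("image/gif"),
        Part("message/rfc822", message_body=Part("multipart/mixed", children=[
            Part("text/plain"),
            Part("multipart/alternative", children=[
                Part("text/plain"),
                Part("text/richtext"),
            ]),
        ])),
    ]),
])

if __name__ == "__main__":
    for number, ctype in number_parts(RFC3501_EXAMPLE):
        print(f"{number:10} {ctype}")
```

Run directly, it lists the twelve numbered parts of the RFC's sample in the same order as the RFC (1, 2, 3, 3.1, 3.2, 4, 4.1, 4.2, 4.2.1, 4.2.2, 4.2.2.1, 4.2.2.2).
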
So here's my first question: could someone confirm for me that a bare part number
(such as "4.1") refers to the part starting at the first byte *following* the CRLF at the
end of its MIME headers?

Next, in a BODYSTRUCTURE, do the line and octet counts for such a part include
the MIME headers, or not? I believe the correct answer is "not", but would like to
know for sure.

This leads to my next question, which is "is BODYSTRUCTURE reversible?" That
is, if you parse a message, build a BODYSTRUCTURE from the parsed data, then
re-parse the BODYSTRUCTURE, will the two parses be the same? I have to clarify
here, because this question depends on context: if you're parsing for an IMAP
server, it's quite reasonable to assume that your parser will build two entries for
each part, the first tracking the offset of the MIME headers for the part, the second
tracking the offset of the part itself: this allows you to do a simple lookup to satisfy
fetches for both <partnumber> and <partnumber>.MIME... Yet it seems to me that
you cannot reconstruct this information from a BODYSTRUCTURE - you would
lose the offset to the MIME headers. Why am I asking this? I'm trying to work out if
it's possible to use BODYSTRUCTURE as a way of storing a parse between
invocations, since it's always going to be far quicker to parse a BODYSTRUCTURE
than it is to read the entire message again.

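The two-entries-per-part layout described above might look something like the following Python sketch. `PartOffsets`, `section_range` and the sample offsets are hypothetical, and `body_start` simply follows this message's working assumption that a bare part number begins just after the CRLF ending the part's MIME headers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PartOffsets:
    """Hypothetical per-part record an IMAP server might cache after parsing."""
    mime_header_start: int  # offset of the first byte of the part's MIME headers
    body_start: int         # offset of the part itself (assumed: just after the
                            # CRLF that ends the MIME headers)
    body_end: int           # offset just past the last byte of the part


def section_range(index: Dict[str, PartOffsets], section: str) -> Tuple[int, int]:
    """Map a FETCH section such as "4.1" or "4.1.MIME" to a (start, end) byte
    range with a single dictionary lookup."""
    if section.upper().endswith(".MIME"):
        entry = index[section[: -len(".MIME")]]
        # The .MIME section runs from the MIME headers up to the part itself.
        return entry.mime_header_start, entry.body_start
    entry = index[section]
    return entry.body_start, entry.body_end


if __name__ == "__main__":
    index = {"4.1": PartOffsets(mime_header_start=1200, body_start=1290, body_end=5400)}
    print(section_range(index, "4.1"))       # (1290, 5400)
    print(section_range(index, "4.1.MIME"))  # (1200, 1290)
```

Both lookups are a single dictionary access. A BODYSTRUCTURE body entry, by contrast, records the part's size in octets (plus a line count for text parts) but no byte offsets, so `mime_header_start` and `body_start` cannot be rebuilt from BODYSTRUCTURE alone, which is the loss described above.
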
Finally, is there a detailed discussion of part numbering and BODYSTRUCTURE
anywhere? I had a look through the RFC index and couldn't see any other
documents that might expand on these subjects, and Google didn't yield anything
helpful either. And in a similar vein, is there a repository anywhere of sample
messages with matching canonical part number listings and bodystructures? This
would be extremely helpful in testing parsers and bodystructure generators.

I'm sure this has all been asked a billion times before, and I apologize for that, but
any guidance would be gratefully received.

Cheers!

-- David --

------------------ David Harris -+- Pegasus Mail ----------------------
Box 5451, Dunedin, New Zealand | e-mail: David.Harris@pmail.gen.nz
Phone: +64 3 453-6880 | Fax: +64 3 453-6612

Thought for the day:
A diplomat is a man who can convince his wife she'd look
stout in a fur coat.