MBOX-Line: From dave at cridland.net Mon Jun 29 01:35:47 2015
To: imap-protocol@u.washington.edu
From: Dave Cridland <dave@cridland.net>
Date: Fri Jun 8 12:34:55 2018
Subject: [Imap-protocol] Parsing, part numbering and BODYSTRUCTURE
In-Reply-To: <5590E196.25712.A2E3FB@David.Harris.pmail.gen.nz>
References: <5590E196.25712.A2E3FB@David.Harris.pmail.gen.nz>
Message-ID: <CAKHUCzxgof+KB7zQDh=5OBNT-5Vgcd1fkmsQDVkbsr3zdkrc8A@mail.gmail.com>

On 29 June 2015 at 07:11, David Harris <David.Harris@pmail.gen.nz> wrote:

> For reasons that aren't relevant here, I'm in the process of rewriting my
> MIME parser for about the fifth time in twenty-five years. Each time I do
> this I find I spend a lot of time trying to reconcile the way I do my
> parsing with the demands of IMAP. I should probably keep notes each time,
> but I never do. *sigh*.
>
> A lot of the trouble I have comes from the paucity of detail in RFC3501
> over two key issues - part numbering, and BODYSTRUCTURE. This is not helped
> by what appears to me to be an erratum - the sample numbering scheme shown
> on page 56, which appears to suggest that the bare part number for any part
> of a message references the first byte of the part INCLUDING any MIME
> headers it might have (if you look at 4.1, it is *followed* by 4.1.MIME,
> which appears to suggest that the MIME headers are a subset of 4.1).
>
> So here's my first question: could someone confirm for me that a bare part
> number (such as "4.1") refers to the part starting at the first byte
> *following* the CRLF at the end of its MIME headers?
>

Yes. Well, for leaf parts, anyway.

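To make the numbering concrete, here is a minimal Python sketch of how part numbers can be assigned while walking a parsed message. It follows the numbering example in RFC 3501 as I read it: children of a multipart are numbered 1..n, and the body of an embedded message/rfc822 is numbered beneath the message part. The helper name `number_parts` is made up for illustration, and the sketch leans on the standard library `email` package rather than a hand-written parser.

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def number_parts(msg, prefix=""):
    """Return [(section, content_type)] in IMAP part-number order.

    Hypothetical helper: a sketch of the numbering rules, not a full
    FETCH section resolver (no HEADER, TEXT or MIME specifiers).
    """
    if msg.get_content_maintype() == "multipart":
        children = msg.get_payload()  # list of sub-parts
    else:
        children = [msg]  # a non-multipart body is simply part 1
    out = []
    for i, child in enumerate(children, start=1):
        section = f"{prefix}{i}"
        out.append((section, child.get_content_type()))
        if child.get_content_maintype() == "multipart":
            # Nested multipart: its children become section.1, section.2, ...
            out.extend(number_parts(child, section + "."))
        elif child.get_content_type() == "message/rfc822":
            # Embedded message: number its body beneath the message part.
            out.extend(number_parts(child.get_payload(0), section + "."))
    return out


# Example: a text part followed by an attached two-part message.
inner = MIMEMultipart("mixed")
inner.attach(MIMEText("plain body"))
inner.attach(MIMEText("<b>html body</b>", "html"))
outer = MIMEMultipart("mixed")
outer.attach(MIMEText("hello"))
outer.attach(MIMEMessage(inner))
numbered = number_parts(outer)
```

For the sample above, `numbered` lists sections 1, 2, 2.1 and 2.2. Only the leaf entries (1, 2.1, 2.2) address a body that starts after the part's own MIME headers, which is the case the answer above covers.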
> Next, in a BODYSTRUCTURE, do the line and octet counts for such a part
> include the MIME headers, or not? I believe the correct answer is "not",
> but would like to know for sure.
>

I'd agree: the octet counts there are expected to be those for the part
itself, not the headers.

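As a rough illustration of counting the part without its headers, the sketch below splits a raw part at the blank line that ends its MIME headers and counts only what follows. The function name `body_counts` is hypothetical. The octet count is taken over the body as transmitted, in its transfer encoding. Counting a final line with no terminating CRLF as a line is an assumption on my part, and implementations may differ on that point.

```python
def body_counts(raw_part: bytes) -> tuple[int, int]:
    """Return (octets, lines) for the body of a raw MIME part.

    Hypothetical helper: the MIME headers and the blank line that ends
    them are excluded from both counts.
    """
    if raw_part.startswith(b"\r\n"):
        # Part with no header fields at all: body follows the first CRLF.
        body = raw_part[2:]
    else:
        _headers, sep, body = raw_part.partition(b"\r\n\r\n")
        if not sep:
            body = b""  # headers only, no body present
    lines = body.count(b"\r\n")
    if body and not body.endswith(b"\r\n"):
        lines += 1  # assumption: an unterminated last line still counts
    return len(body), lines


raw = b"Content-Type: text/plain\r\n\r\nhello\r\nworld\r\n"
counts = body_counts(raw)  # body is b"hello\r\nworld\r\n"
```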
> This leads to my next question, which is "is BODYSTRUCTURE reversible"?
> That is, if you parse a message, build a BODYSTRUCTURE from the parsed
> data, then re-parse the BODYSTRUCTURE, will the two parses be the same? I
> have to clarify here, because this question depends on context: if you're
> parsing for an IMAP server, it's quite reasonable to assume that your
> parser will build two entries for each part, the first tracking the offset
> of the MIME headers for the part, the second tracking the offset of the
> part itself: this allows you to do a simple lookup to satisfy fetches for
> both <partnumber> and <partnumber>.MIME... Yet it seems to me that you
> cannot reconstruct this information from a BODYSTRUCTURE - you would lose
> the offset to the MIME headers. Why am I asking this? I'm trying to work
> out if it's possible to use BODYSTRUCTURE as a way of storing a parse
> between invocations, since it's always going to be far quicker to parse a
> BODYSTRUCTURE than it is to read the entire message again.
>

So by "is BODYSTRUCTURE reversible", I thought you meant something else
entirely.

But no, BODYSTRUCTURE itself doesn't contain the offsets into the message,
and may have normalized other parts of the data. A server would need more
data, and as I recall it's not quite a superset either - there are items
you need for the BODYSTRUCTURE which aren't otherwise useful for a server.

But - also as I recall - Cyrus IMAP does a single parse which extracts both
a server-side structure and the BODYSTRUCTURE.

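To show the kind of extra data a server would have to keep alongside the BODYSTRUCTURE, here is a small Python sketch of a per-part offset index along the lines David describes. The names `PartSpan` and `fetch_section` are invented for this example and are not taken from Cyrus or any other server. Treating the blank line that ends the headers as part of the `.MIME` result is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartSpan:
    """Byte offsets for one part; none of these appear in BODYSTRUCTURE."""
    mime_offset: int  # first byte of the part's MIME headers
    body_offset: int  # first byte after the blank line ending the headers
    end_offset: int   # one past the last byte of the part body


def fetch_section(raw: bytes, index: dict[str, PartSpan], section: str) -> bytes:
    """Resolve '<part>' or '<part>.MIME' with a single dictionary lookup."""
    if section.endswith(".MIME"):
        span = index[section[: -len(".MIME")]]
        return raw[span.mime_offset:span.body_offset]
    span = index[section]
    return raw[span.body_offset:span.end_offset]


# A fragment of a multipart body holding the part stored here as "4.1".
raw = b"--b\r\nContent-Type: image/gif\r\n\r\nGIF89a\r\n--b--\r\n"
index = {
    "4.1": PartSpan(
        mime_offset=raw.index(b"Content-Type"),
        body_offset=raw.index(b"\r\n\r\n") + 4,
        end_offset=raw.index(b"\r\n--b--"),
    )
}
body = fetch_section(raw, index, "4.1")
mime = fetch_section(raw, index, "4.1.MIME")
```

Persisting a table like this next to the cached BODYSTRUCTURE would avoid re-reading the message, at the cost of maintaining two stored structures per message.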
> Finally, is there a detailed discussion of part numbering and BODYSTRUCTURE
> anywhere? I had a look through the RFC index and couldn't see any other
> documents that might expand on these subjects, and google didn't yield
> anything helpful either. And in a similar vein, is there a repository
> anywhere of sample messages with matching canonical part number listings
> and bodystructures? This would be extremely helpful in testing parsers and
> bodystructure generators.
>

I vaguely recall a lengthy discussion on a mailing list (either this one or
imapext) a few years back, but I can't find it immediately either.

> I'm sure this has all been asked a billion times before, and I apologize
> for that, but any guidance would be gratefully received.
>
> Cheers!
>
> -- David --
>
> ------------------ David Harris -+- Pegasus Mail ----------------------
> Box 5451, Dunedin, New Zealand | e-mail: David.Harris@pmail.gen.nz
> Phone: +64 3 453-6880 | Fax: +64 3 453-6612
>
> Thought for the day:
> A diplomat is a man who can convince his wife she'd look
> stout in a fur coat.
>
> _______________________________________________
> Imap-protocol mailing list
> Imap-protocol@u.washington.edu
> http://mailman13.u.washington.edu/mailman/listinfo/imap-protocol
>