MBOX-Line: From slusarz at curecanti.org Mon Nov 26 16:32:24 2012
To: imap-protocol@u.washington.edu
From: Michael M Slusarz <slusarz@curecanti.org>
Date: Fri Jun 8 12:34:49 2018
Subject: [Imap-protocol] Re: Suspend/Restore feature proposal
In-Reply-To: <68035860-387d-43a9-8bb7-00744a7868b9@flaska.net>
References: <20121115215854.Horde.zz6B0O0tmt3ylHiEXzXhQQ4@bigworm.curecanti.org>
 <CABa8R6tHP2My0k2LqT1RzHoLQZA+X_jwUMU0cydm2sAwo8f4Fg@mail.gmail.com>
 <20121116143137.Horde.7Kb3aEW5DhM6CnuA2hAx4Q8@bigworm.curecanti.org>
 <e122fa25-9110-4b36-855d-0e7e273c5805@flaska.net>
 <20121121155417.Horde.ZeW7JqTPNxTAI-hTtrAT-Q9@bigworm.curecanti.org>
 <68035860-387d-43a9-8bb7-00744a7868b9@flaska.net>
Message-ID: <20121126173224.Horde.BbqbGly8D0JG4aqG7fxoMw1@bigworm.curecanti.org>
Quoting Jan Kundrát <jkt@flaska.net>:
> On Wednesday, 21 November 2012 23:54:17 CEST, Michael M Slusarz wrote:
>> I would strongly disagree with this statement. As written, the
>> draft is only minimally concerned with saving on network
>> round-trips.
>
> That's quite different from what I've understood from your draft --
> I'd suggest making the motivation clearer, then. But point
> understood, and I've now purged the "let's save roundtrips" from my
> understanding of the draft :). OK.
No need to purge the understanding - saving roundtrips remains a
useful goal. It's just not the primary motivating factor behind the
proposal.
>> $result = $imap->useCompression(true);
>> // Check for success
>> $imap->useQresync(true);
>> // Check for success
>> $imap->setLanguage([LANGUAGE]);
>> // Check for success
>
> It is pretty obvious that if you use synchronous primitives for
> enabling individual sub-features in a serialized fashion, your
> performance will be limited by the round trip times. To put it more
> bluntly, you cannot have code like the one shown above and expect
> good performance.
>
> Coming from that background, I see that it is tempting to replace
> this endless row of synchronous calls, each enabling a single
> optional feature, with a quick way to side-step this process by
> quickly jumping into a pre-negotiated state where everything which
> was enabled before is enabled now as well. However, my point is that
> clients already exist proving that the same efficiency can be
> achieved with the existing facilities. You're right that this
> requires abolishing the serial, synchronized code, but IMAP is not
> particularly friendly with synchronous APIs.
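For illustration, here is a toy cost model of the difference described in the quoted text (Python). The function names, the 50 ms round-trip time and the 1 ms per-command processing time are made-up assumptions, not measurements:

```python
def serialized_time(n_commands: int, rtt: float, processing: float) -> float:
    # Each command waits for the previous tagged response, so every
    # command pays a full network round trip.
    return n_commands * (rtt + processing)


def pipelined_time(n_commands: int, rtt: float, processing: float) -> float:
    # All commands are sent back to back; only one round trip is paid,
    # but the server still has to process each command.
    return rtt + n_commands * processing


# Three setup commands, as in the snippet quoted above.
n, rtt, processing = 3, 0.050, 0.001
print(f"serialized: {serialized_time(n, rtt, processing):.3f} s")
print(f"pipelined:  {pipelined_time(n, rtt, processing):.3f} s")
```

In this model pipelining removes the repeated round trips, but the per-command processing term stays.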
I realize that the API argument is not my strongest one. It becomes
less strong considering that, yes: you could do all this configuration
in a single API call - i.e., when creating the IMAP interaction
object, you configure everything in there.
I still maintain that writing an API that requires advanced knowledge
of IMAP is not that useful. Things like QRESYNC and LIST-STATUS can
be entirely abstracted so a client coder does not need to know
anything about them to take advantage of them.
>> A client may, depending on the capabilities returned, need to
>> perform various internal initialization tasks. For example - if
>> CONDSTORE/QRESYNC is listed, a client may have to then parse a
>> separate configuration file to grab the details of the local cache
>> where it is storing this information, and then connect to this
>> cache, etc.
>
> So you want to keep the cache information (among other things)
> inside some serialized client-side state storage. What prevents you
> from simply checking the capabilities against the previously
> recorded state and restoring the state when the capabilities match
> exactly? You can do that now, without waiting for this extension.
> Yes, it's ugly, but if your initialization is expensive...
Because there's still no guarantee it's the same server/connection:
that is the key to all of this. A server can "look" the same but that
doesn't prove anything.
What happens when the server is upgraded and UTF-8 searching now
works? The CAPABILITY string is exactly the same. But UTF-8 has been
marked as a bad charset so it will still not be available. And what
about those commands that have been determined to be broken previously
in the session? It is reasonable to expect the CAPABILITY string to
be the same between point releases of an IMAP server, but the server
may have fixed the bug that was causing bad command behavior.
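A rough sketch of the problem, in Python. Everything here (the SavedState class, its field names, the can_restore check) is hypothetical and only illustrates the capability-matching approach suggested in the quoted text, not any real client:

```python
from dataclasses import dataclass, field


@dataclass
class SavedState:
    """Hypothetical client-side state recorded at the end of a session."""
    capabilities: frozenset
    bad_charsets: set = field(default_factory=set)
    broken_commands: set = field(default_factory=set)


def can_restore(saved: SavedState, current_capabilities: frozenset) -> bool:
    # The check suggested in the quoted text: restore only when the
    # advertised capabilities match the recorded ones exactly.
    return saved.capabilities == current_capabilities


# State saved while talking to the old server build, where UTF-8
# searching was found to be broken.
caps = frozenset({"IMAP4rev1", "ESEARCH", "CONDSTORE", "QRESYNC"})
saved = SavedState(capabilities=caps, bad_charsets={"UTF-8"})

# The server is upgraded and UTF-8 searching now works, but it still
# advertises the very same CAPABILITY string.
after_upgrade = frozenset({"IMAP4rev1", "ESEARCH", "CONDSTORE", "QRESYNC"})

restored = can_restore(saved, after_upgrade)
utf8_still_blocked = restored and "UTF-8" in saved.bad_charsets
print(restored, utf8_still_blocked)
```

The capability comparison succeeds, so the stale "bad charset" marking is restored along with everything else; nothing in the CAPABILITY string tells the client that the server behind it has changed.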
>> - Even when pipelining commands, they still need to be sent, the
>> incoming command needs to be tokenized (server), the command is
>> performed (server), the response sent back, any untagged responses
>> are tokenized (client), the untagged responses are interpreted
>> (client), the tagged response is tokenized (client), and the
>> tagged response is processed (client). None of this is "free".
>> Pipelining eliminates none of this.
>
> Using the numbers you posted later on, we're speaking about parsing
> roughly 600 bytes of a well-structured text. For me, it's hard to
> believe that this has any measurable impact.
You are incorrect.
I went ahead and set up some rough/quick benchmarking using current
imapproxy behavior as a proxy for the SUSPEND behavior. In this
benchmark, the server and client are on the same machine so network
latency is assumed to be non-existent. The load on this machine is
also non-existent (this test is the only active IMAP process; disk I/O
is negligible).
Login without resuming session (connecting to a Dovecot 2.1 server):

C: 1 LOGIN [login credentials]
S: 1 OK User logged in
C: 2 CAPABILITY
S: * CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
   IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
   THREAD=ORDEREDSUBJECT MULTIAPPEND UNSELECT CHILDREN NAMESPACE UIDPLUS
   LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES
   WITHIN CONTEXT=SEARCH LIST-STATUS SPECIAL-USE ACL RIGHTS=texk
S: 2 OK Capability completed.
C: 3 ENABLE QRESYNC
S: * ENABLED QRESYNC
S: 3 OK Enabled.

Average elapsed time: 0.087 seconds

Login with resuming session:

C: 1 LOGIN [login credentials]
S: * OK [XPROXYREUSE] IMAP connection reused by squirrelmail-imap_proxy
S: 1 OK User logged in

Average elapsed time: 0.039 seconds

Difference: 0.048 seconds (the resumed login is ~120% faster)
That is a ~120% speedup in a very common example. And a reminder that this
is WITHOUT any network latency; latency would only increase the actual
real-time difference between the benchmarks.
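The arithmetic behind these figures, as a quick sketch in Python (the two timings are the averages measured above; the variable names are only for illustration):

```python
# Average login times from the benchmark above, in seconds.
without_resume = 0.087
with_resume = 0.039

difference = without_resume - with_resume       # absolute time saved
speedup = without_resume / with_resume          # how many times faster
extra_cost = difference / with_resume * 100     # % longer without resuming
time_saved = difference / without_resume * 100  # % of login time eliminated

print(f"difference: {difference:.3f} s")
print(f"speedup:    {speedup:.2f}x")
print(f"non-resumed login takes {extra_cost:.0f}% longer")
print(f"resuming removes {time_saved:.0f}% of the login time")
```

Put differently: the non-resumed login takes roughly 120% longer, which is the same thing as the resumed login eliminating a bit more than half of the login time.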
Caveats:
* imapproxy doesn't require you to provide the token before the auth
command, so that is admittedly not accounted for here.
* However, this RESUME could be easily pipelined with the
authentication command, so you are not adding a round-trip.
* Additionally, RESUME shouldn't result in much additional server
load/processing since it is doing nothing more than storing the token
in the server's memory - the server isn't going to process that token
until the authentication is complete.
* The above example is being routed through an additional proxy server
so there are small performance penalties there.
* Someone will probably say my code sucks, and that parsing shouldn't
take that long. That could very well be true. But I will note that I
am running this example on a totally unloaded IMAP server with a
single user. The reality is that most IMAP servers are not running on
a box that has 0.00 load.
So the gains for this very simple example are significant - initial
login is twice as fast. A savings of 0.10 seconds on a given
connection is easily possible once network latency is added, from a
mobile device for example. Given the old Amazon 100ms = 1% study, the
theory behind SUSPEND needs to at least be discussed.
For fun, I also took a look at the performance gains between a
COPY/STORE/EXPUNGE vs. MOVE command. Here I saw ~30% improvement
(0.13 seconds vs. 0.10 seconds). Granted, MOVE is being implemented
to allow for atomicity of the move action, but it is a good comparison.
>> I would argue that the language of the RFC still controls despite
>> what an e-mail on this list says. A client shouldn't be punished
>> for interpreting it that way either.
>
> The RFC is a specification crafted by humans. It has errors, and all
> subsequent revisions will still have errors. (See the errata for a
> list of those which are known already.) If you choose to block and
> not pipeline ENABLE QRESYNC and SELECT ... QRESYNC, you hurt your
> users. (Also note that the clarification given on this list was by
> the original authors of the RFC.)
Yes, but you cited an e-mail message that said this should be the
case. I hardly feel an IMAP implementer is going to take someone's
opinion in an email as canon.
If this shows up as an erratum to RFC 5161, I would tend to agree with
you. But it doesn't at this point.
>>> As of the LANGUAGE -- how often do you expect to hit an error
>>> condition which is not described by an appropriate response code?
>>> I don't think that blocking for its result would be a good design
>>> choice.
>>
>> That could be your decision as a client author. I would vehemently
>> disagree.
>>
>>> And finally, what IMAP servers support the LANGUAGE extension?
>>
>> Why does this matter? RFC 5255 is a Standards Track extension. A
>> year from now, every IMAP server and 200 new ones may support it.
>
> I stand by my reasoning. In order for the block to be actually
> useful, you'll have to talk to a server which:
>
> 1) actually implements LANGUAGE,
> 2) executes all commands in parallel OR has the LANGUAGE command
> implemented in such a slow way that it enables parallel processing
> for it,
> 3) returns a failure for one of the first commands which you send
> *and* does not return an appropriate response code.
>
> But it's your client, do whatever you want to do :). I'm merely
> saying that adding an extension driven by the desire to eliminate
> issues like this is not something I support.
See benchmarks above. The LANGUAGE response is more complex than the
ENABLE response, so 120% is the floor of the performance increase.
>> It would be impossible to determine benchmarks since there is no
>> defined protocol yet. And, as mentioned above, any given
>> client/server interaction may provide different results based on
>> their own internal optimizations and extension support.
>
> Right. Well, based on how my client works, I don't expect any
> significant performance gains obtained through this proposal.
Sure - just like IDLE is completely useless for disconnected clients.
That doesn't make SUSPEND any less useful for at least some clients.
> I'm not the standards committee, but having decent numbers saying
> "see, this RESUME extension cuts 40% out of the 1300ms required to
> establish an IMAP session" is something which moves the discussion
> from the current, very vague stage of "this is good -- nope, this is
> worthless" into a stage where we can actually discuss what merits it
> really brings. As you're proposing the extension, you should IMHO
> provide these numbers.
A MOVE is ~30% faster than the equivalent commands. SUSPEND, at
least for a simple example, is ~120% faster. (And see below re: NOTIFY
about something that CAN'T practically be done with current
disconnected clients).
> 1) You don't take the initial CAPABILITY into account, but you
> re-request CAPABILITY after login. (You need the initial capability
> to see whether the server supports RESUME at all.) This will change
> the numbers quite a lot.
What's the point of including benchmarking of the initial CAPABILITY?
Both clients need to do this, so there is no difference - it is no
more expensive for a SUSPEND client than a non-SUSPEND client.
And one of the reasons that I designed the RESUME command as I did is
precisely to address the second part of your comment: the need to
potentially send CAPABILITY pre-login. From a client implementer's
standpoint, it is quite likely that you DON'T need this CAPABILITY so
that is an additional advantage.
Let's assume that your client program has previously connected to a
given IMAP server and executed a successful SUSPEND command. The next
time it connects to the same IMAP server, it has no way of knowing
whether that server is identical pre-authentication. However:
1. Since the previously connected server supports the SUSPEND command,
and it is very likely (although not guaranteed) that the server hasn't
changed in the time since the client last connected, it can be assumed
to a high degree of probability that the server supports SUSPEND.
2. A client using SUSPEND information will know which authentication
method was successful the first time it connected to the server.
Following the logic in #1, it can be assumed that the server continues
to support this authentication method.
3. RESUME command doesn't output any response that needs to be parsed
before authentication can occur.
If #1 happens to not be true, this is irrelevant - a client will just
do normal initialization when resuming (the RESUME command would
generate a BAD tagged response, but a client SHOULD ignore this).
If #2 is not true, a client would have sent 2 unnecessary commands but
otherwise, no harm done.
Suppose #1 or #2 is an incorrect assumption in, say, 1 out of 100
connections (which is probably a tremendously conservative example: in
a large webmail installation, with 10,000+ concurrent users, you are
getting millions of connections a day on software that isn't being
touched for several months). Even at this rate, it still makes far
more sense to make these assumptions and accept an additional 2
round-trips 1% of the time.
So a client supporting RESUME will likely save ANOTHER entire
round-trip, so the 100%+ gain listed above is again shown to be a
conservative estimate.
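Back-of-the-envelope, in Python, under the assumptions above: a wrong guess in 1 out of 100 connections costs 2 extra round-trips, and a right guess saves the one pre-login CAPABILITY round-trip. The function name and defaults are only for illustration:

```python
def expected_round_trips_saved(p_wrong: float,
                               saved_when_right: float = 1.0,
                               extra_when_wrong: float = 2.0) -> float:
    """Average round-trips saved per connection by assuming the server
    still supports SUSPEND and the previously used auth method."""
    return (1 - p_wrong) * saved_when_right - p_wrong * extra_when_wrong


# 1 wrong assumption per 100 connections, as in the example above.
print(f"{expected_round_trips_saved(0.01):.2f} round-trips saved on average")

# The optimistic approach only stops paying off once the assumptions are
# wrong more often than this fraction of connections.
break_even = 1.0 / (1.0 + 2.0)
print(f"break-even failure rate: {break_even:.0%}")
```

With these numbers the guess would have to be wrong in about a third of all connections before it stopped being worthwhile.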
> 2) The sample token which Timo showed on the other list was way
> longer than base64("state token") you use. Just saying.
Sure. But as long as suspend tokens are not approaching 1000+ bytes,
they should comfortably fit into an IP packet so this is irrelevant.
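A quick size check in Python. The 1460-byte figure assumes a standard 1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers without options; the real token format is not defined yet, so the sizes are only illustrative:

```python
import base64
import math


def base64_len(raw_bytes: int) -> int:
    # Base64 encodes every 3 input bytes as 4 output characters,
    # padding the final group.
    return 4 * math.ceil(raw_bytes / 3)


TCP_PAYLOAD = 1500 - 20 - 20  # assumed MTU minus IPv4 and TCP headers

# The placeholder token from the earlier example.
sample = base64.b64encode(b"state token")
print(len(sample), base64_len(len(b"state token")))

# A 750-byte raw token encodes to 1000 base64 characters, which still
# fits in a single segment under the assumptions above.
print(base64_len(750), base64_len(750) < TCP_PAYLOAD)
```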
> 3) Saving 600 bytes of transmitted data per connection is noise
> compared to what an actual session typically transfers.
A 50-100ms reduction in connection time is not noise. Maybe it is for
a single client connecting to a single server. But it most certainly
is not for large, distributed systems. This kind of savings can be
the difference between needing to add an additional server to the
backend farm or not, and an additional server may cost a significant
amount of money in hardware/installation/maintenance costs.
> 4) You could save even more bytes by converting IMAP to a binary
> protocol. That possibility in itself is, however, no reason to do so.
I'm not looking to write IMAP 5. I'm looking at a relatively
uncomplicated way to improve performance in IMAP 4.
> 5) You're taking an advantage of eliminating NAMESPACE, but so far
> have ignored LIST and STATUS, even though a typical client will
> need them as well. When the LIST responses come into account,
> savings of 600 bytes starts looking more and more like noise -- not
> mentioning the mailbox synchronization or data transfers.
Mailbox listing is a very touchy spot for disconnected clients.
Historically, a disconnected client is pretty much stuck with listing
the mailboxes once with the understanding that if another client
changes the mailbox structure there's not much we can do about it
without allowing a user to manually refresh the mailbox list (or
possibly doing something like polling the mailbox list at a given time
interval).
However, as Timo noted, SUSPEND potentially allows disconnected
clients to take advantage of NOTIFY. Which would be a gigantic gain.
With the combination of the two, disconnected clients could
potentially have the equivalent of QRESYNC for mailbox lists, which is
a feature that doesn't currently exist. No amount of pipelining is
going to fix this.
Additionally, this behavior makes SUSPEND useful for connected clients
if such a client locally caches mailbox lists: a desktop client that
opens a second or two faster because LIST commands don't need to
be sent is a substantial UI improvement.
In other words, SUSPEND brings real-world performance improvements and
provides multiple features that are not possible with current IMAP
protocol/extensions.
Once again, thanks for the comments.
michael