towards https everywhere and strict transport security

Nicolas Williams Nicolas.Williams at
Thu Aug 26 02:06:56 EDT 2010

On Thu, Aug 26, 2010 at 12:40:04PM +1000, James A. Donald wrote:
> On 2010-08-25 11:04 PM, Richard Salz wrote:
> >>Also, note that HSTS is presently specific to HTTP. One could imagine
> >>expressing a more generic "STS" policy for an entire site
> >
> >A really knowledgeable net-head told me the other day that the problem
> >with SSL/TLS is that it has too many round-trips.  In fact, the RTT costs
> >are now more prohibitive than the crypto costs.  I was quite surprised to
> >hear this; he was stunned to find it out.

It'd help amortize the cost of round-trips if we used HTTP/1.1
pipelining more.  Just as we could amortize the cost of public key
crypto by making more use of TLS session resumption, including session
resumption without server-side state [RFC4507].

And if only end-to-end IPsec with connection latching [RFC5660] had been
deployed years ago we could further amortize crypto context setup.

We need solutions, but abandoning security isn't really a good solution.

> This is inherent in the layering approach - inherent in our current
> crypto architecture.

The second part is a correct description of the current state of
affairs.  I don't buy the first part (see below).

> To avoid inordinate round trips, crypto has to be compiled into the
> application, has to be a source code library and application level
> protocol, rather than layers.

Authentication and key exchange are generally going to require 1.5 round
trips at least, which is to say, really, 2.

Yes, Kerberos AP exchanges happen in 1 round trip, but at the cost of
requiring a persistent replay cache (and also there's the non-trivial
TGS exchanges as well).  Replay caches historically have killed
performance, though they don't have to[0], but still, there's the need
for either a persistent replay cache backing store or a trade-off w.r.t.
startup time and clients with slow clocks[0], and even then you need to
worry about large (>1s) clock adjustments.
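
To make the trade-off concrete, here is a minimal sketch of a replay
cache with no persistent backing store.  It is not the optimized design
from [0], and the class name, method names and the 300-second skew
window are assumptions made for illustration.  Because nothing survives
a restart, it refuses any authenticator timestamped at or before its own
startup time, which is exactly what penalizes clients with slow clocks.

```python
import time


class ReplayCache:
    """In-memory replay cache with no persistent backing store (sketch)."""

    def __init__(self, max_skew=300.0, now=time.time):
        self.max_skew = max_skew      # assumed acceptable clock skew, seconds
        self.now = now                # injectable clock, for testing
        self.start_time = now()
        self.seen = {}                # authenticator id -> timestamp

    def check_and_store(self, auth_id, timestamp):
        """Return True if the authenticator is fresh, False to reject it."""
        current = self.now()
        if abs(current - timestamp) > self.max_skew:
            return False              # outside the skew window
        if timestamp <= self.start_time:
            return False              # might be a replay from before restart
        # Drop entries that could no longer pass the skew check anyway.
        cutoff = current - self.max_skew
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}
        if auth_id in self.seen:
            return False              # replay
        self.seen[auth_id] = timestamp
        return True
```

A persistent cache would avoid the startup-time rejection, at the cost
of writing to stable storage.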

So, really, as a rule of thumb, budget 2 round trips for all crypto
setup.  That leaves us with amortization and piggy-backing as ways to
make up for that hefty up-front cost.
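
A back-of-the-envelope sketch of the amortization argument follows.  The
function name, the 100 ms RTT and the single round trip charged to TCP
setup are assumptions for illustration, not measurements; requests are
assumed to be serial, one round trip each, with no pipelining.

```python
def avg_latency_per_request(rtt_ms, setup_rtts, requests_per_connection):
    """Average latency per request when the setup cost is shared by
    every request sent over one connection."""
    total_rtts = setup_rtts + requests_per_connection
    return rtt_ms * total_rtts / requests_per_connection


# Assumed budget: 1 RTT for TCP setup plus 2 RTTs for crypto setup.
SETUP_RTTS = 1 + 2

for n in (1, 10, 100):
    print(n, avg_latency_per_request(100.0, SETUP_RTTS, n))
```

Under these assumptions a connection used once pays four round trips for
one request, while a connection reused for many requests approaches one
round trip per request.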

> Every time you layer one communication protocol on top of another,
> you get another round trip.
> When you layer application protocol on ssl on tcp on ip, you get
> round trips to set up tcp, and *then* round trips to set up ssl,
> *then* round trips to set up the application protocol.

See draft-williams-tls-app-sasl-opt-04.txt [1], a variant of false
start, which alleviates the latter.  See also
draft-bmoeller-tls-falsestart-00.txt [2].

Back to layering...

If abstractions are leaky, maybe we should consider purposeful
abstraction leaking/piercing.

There's no reason that we couldn't piggy-back one layer's initial message
(and in some cases more) on a lower layer connection setup message
exchange -- provided much care is taken in doing so.

That's what PROT_READY in the GSS-API is for, that's one use for GSS-API
channel binding (see SASL/GS2 [RFC5801] for one example).  It's what TLS
"false start" proposals are about...  draft-williams-tls-app-sasl-opt-04
saves up to 1.5 round trips for applications over TLS.

We could apply the same principle to TCP... (Shades of the old, failed?
transaction TCP [RFC1644] proposal from the mid '90s, I know.  Shades
also of TCP-AO and other more recent proposals perhaps as well.)

But there is a gotcha: the upper layer must be aware of the early
message send/delivery semantics.  For example, early messages may not
have been protected by the lower layer, with protection not confirmed
till the lower layer succeeds, which means... for example, that the
upper layer must not commit much in the way of resources until the lower
layer completes (e.g., so as to avoid DoS attacks).
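
One way an upper layer might honor that rule is sketched below.  All
names here (EarlyDataGate, on_message, on_lower_layer_complete) and the
4096-byte cap are hypothetical, not taken from any of the drafts
mentioned.  Early messages are held in a small bounded buffer and reach
the application handler only after the lower layer reports success.

```python
class EarlyDataGate:
    """Hold messages that arrive before the lower layer has finished its
    handshake; deliver them only once protection is confirmed."""

    def __init__(self, handler, max_early_bytes=4096):
        self.handler = handler                # called only on confirmed data
        self.max_early_bytes = max_early_bytes
        self.confirmed = False
        self.pending = []
        self.pending_bytes = 0

    def on_message(self, data):
        """Return False if the message was refused (early buffer full)."""
        if self.confirmed:
            self.handler(data)
            return True
        # Not yet protected: buffer cheaply and never grow without bound.
        if self.pending_bytes + len(data) > self.max_early_bytes:
            return False
        self.pending.append(data)
        self.pending_bytes += len(data)
        return True

    def on_lower_layer_complete(self, success):
        pending, self.pending = self.pending, []
        self.pending_bytes = 0
        if not success:
            return                            # discard unconfirmed early data
        self.confirmed = True
        for data in pending:
            self.handler(data)
```

The cap bounds what an unauthenticated peer can make the upper layer
hold, and nothing expensive happens until on_lower_layer_complete(True).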

I'm not saying that piercing layers is to be done cavalierly.  Rather,
that we should consider this approach, carefully.  I don't really see
better solutions (amortization won't always help).


[0] Turns out that there is a way to optimize replay caches greatly, so
    that an fsync(2) is not needed on every transaction, or even most.

    This is an optimization that turned out to be quite simple to
    implement (with much commentary), but took a long time to think
    through.  Writing a test program and then using it to test the
    implementation's correctness was the lion's share of the
    implementation work.

    You can see it here:


    RFE (though IIRC the description is wrong/out of date):


