[Cryptography] History and implementation status of Opportunistic Encryption for IPsec

Tue Sep 10 23:01:27 EDT 2013

History and implementation status of Opportunistic Encryption for IPsec

NOTE:	On September 28, there will be a memorial service in Ann Arbor
 	for Hugh Daniel, manager of the old IPsec FreeS/WAN Project.
 	Various crypto people will attend, including a bunch of us
 	from freeswan. Hugh would have loved nothing better than his
 	memorial service being used as a focal point to talk about
 	"new OE", so that's what we will do on Saturday and Sunday.
 	If you are interested in attending, feel free to contact me.

In light of the NSA achievements, a few people asked about the FreeS/WAN
IPsec OE effort and whatever happened to it.

The short answer is, we failed and got distracted. The long answer follows
below. At the end I will talk about the current plans that have lingered in
the last two years to revive this initiative. Below I will use the word "we" a
lot. Its meaning changes based on the context as various communities touched,
merged, intersected and drifted apart.

OE in a nutshell

For those not familiar with IPsec OE as implemented by FreeS/WAN: when
activated, a host would install a blocking policy for 0.0.0.0/0. Every
packet to an IP address would trigger the kernel to hold the packet and
signal the IKE daemon to go find an IPsec policy for that destination. If
a policy was found, an IPsec tunnel to the remote IP would be established
and packets would flow. If no policy was found, a "pass" hole was poked so
packets would go out unencrypted. Public keys for IP addresses were looked
up in the reverse DNS by the IKE daemon based on the destination address.
To help with roaming clients (roadwarriors), initiators could store their
public key under their FQDN and convey that FQDN as their ID when
performing IKE, so the remote peer could look up their public key in the
forward DNS. This came at the price of two dynamic clients not being able
to do OE to each other. (It turns out they couldn't anyway, because of
NAT.)
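
As a rough illustration of the decision described above, here is a minimal
Python sketch. The function names and the in-memory dictionary standing in
for DNS are invented for this example; the real FreeS/WAN logic lived in the
kernel and the IKE daemon, not in code like this.

```python
import ipaddress
from typing import Callable, Optional


def reverse_name(ip: str) -> str:
    """Reverse-DNS owner name where the old OE would look for a key."""
    return ipaddress.ip_address(ip).reverse_pointer


def handle_first_packet(
    dst_ip: str, lookup_key: Callable[[str], Optional[bytes]]
) -> str:
    """Decide what to do with a held packet for dst_ip.

    lookup_key is a stand-in for the IKE daemon's DNS query. It returns the
    peer's public key, or None when nothing is published for that name.
    """
    key = lookup_key(reverse_name(dst_ip))
    if key is None:
        return "pass"    # poke a clear-text hole for this destination
    return "tunnel"      # negotiate IKE, then let the held packet flow


# Hypothetical in-memory "DNS", for illustration only.
fake_dns = {"1.2.0.192.in-addr.arpa": b"example-public-key"}

print(handle_first_packet("192.0.2.1", fake_dns.get))     # tunnel
print(handle_first_packet("198.51.100.7", fake_dns.get))  # pass
```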

What were the reasons for failing to encrypt the internet with OE IPsec
(in no particular order)

1) Fragmentation of IPsec kernel stacks

This was in part due to the early history of FreeS/WAN combined with the
export restrictions at the time. Instead of spending more time on IKE and
key management for large scale enduser IPsec, we ended up wasting a lot of
time fixing the FreeS/WAN KLIPS IPsec stack module for each Linux release.
Another IPsec stack, which we dubbed XFRM/NETKEY, appeared around 2.6.9 and
was backported to 2.4.x. It was terribly incomplete and severely broken.
With KLIPS not being within the kernel tree, it was never taken into
account.  XFRM/NETKEY remained totally unsuitable for OE for a decade.
XFRM/NETKEY now has almost all functionality needed - I found out today
it should finally have first+last packet caching for dynamic tunnels,
which is essential for OE. Without it, since the application's first
packet triggered the IKE mechanism, the application would start
retransmitting before IKE was completed.  Even when the tunnel finally
came up, the application was usually still waiting on that TCP
retransmit.  David McCullough and I still spend a lot of time fixing up
KLIPS to work with the current Linux kernel. Look at ipsec_kversion.h
just to see what a nightmare it has been to support Linux 2.0 to 2.6
(libreswan removed support for anything lower than recent 2.4.x kernels).
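
To make the first+last packet caching idea concrete, here is a minimal
Python sketch. It assumes the term means "hold the first packet and the most
recent packet while IKE negotiates, then send both once the tunnel is up".
The class and method names are hypothetical and this is not how the kernel
code is structured.

```python
from typing import List, Optional


class HoldQueue:
    """Keep only the first and the most recent packet for one destination."""

    def __init__(self) -> None:
        self.first: Optional[bytes] = None
        self.last: Optional[bytes] = None

    def hold(self, packet: bytes) -> None:
        """Called for every packet that arrives before the tunnel exists."""
        if self.first is None:
            self.first = packet
        else:
            self.last = packet  # overwrite: packets in between are dropped

    def release(self) -> List[bytes]:
        """Return the held packets in order once IKE is done, then reset."""
        held = [p for p in (self.first, self.last) if p is not None]
        self.first = self.last = None
        return held


queue = HoldQueue()
for pkt in (b"SYN", b"SYN retransmit 1", b"SYN retransmit 2"):
    queue.hold(pkt)
print(queue.release())  # [b'SYN', b'SYN retransmit 2']
```

With something like this in place, the application does not have to sit out
a full TCP retransmit timer after the tunnel comes up.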

Linux IPsec crypto hardware acceleration in practice is only possible
with KLIPS + OCF, as the mainstream async crypto is lacking in hardware
driver support. If you want to build OE into people's router/modem/set-top
box, this is important, though admittedly less so as time has moved on
and even embedded hardware and phones are multicore or have special crypto
CPU instructions.

An effort to make the kernel the sole provider of crypto algorithms that
everyone could use also failed, and the idea was abandoned when CPU crypto
instructions appeared that were directly accessible from userland.

2) US citizens could not contribute code or patches to FreeS/WAN

This was John Gilmore's policy to ensure the software remained free for
US citizens. If no US citizen touched the code, it would be immune to any
presidential National Security Letter. I believe this was actually the
main reason for KLIPS not going into the mainstream kernel, although personal
egos of kernel people seem to have played a role here as well. Freeswan
people really tried hard in 2000/2001 to hook KLIPS into the kernel
just the way the kernel people wanted. (Ironically, the XFRM/NETKEY
hook is so bad that it even confuses tcpdump, and with it every sysadmin
trying to see whether or not their traffic is encrypted.) I still don't fully
understand why it was never merged, as the code was GPL, and it should
have just been merged in, even against John's wishes. Someone would
have stepped in as maintainer - after all the initial brunt of the work
had been done and we had a functional IPsec stack.

In the summer of 2003, I talked to John and together we agreed it was
time to fork. Openswan was born to clearly indicate US coders could
contribute. However, at that point the (then crappy) XFRM/NETKEY IPsec
stack was there to prevent OE from working due to the missing first+last
packet caching. The FreeS/WAN Project ended and Openswan continued, at
first at a good pace, but that later slowed down and OE was no longer its
focal point.  (For legal reasons, I cannot go into details regarding
the openswan history.)

3) Not using DNS without DNSSEC

There were various issues that caused DNSSEC to get massively delayed.
We needed DNSSEC to secure our DNS based distributed public key
platform. Although it would have worked fine to use DNS against passive
attackers (NSA trawling), we believed it was wrong in principle to trust
cryptographic material that was unauthenticated and vulnerable to active
attacks. So while the developers encouraged people to put keys in DNS
even without security, no one else picked it up. It sucks to need to say
'we told you so'. But we really should not have waited on DNSSEC.

3) Dealing with the DNS working groups at IETF

The DNS community is one of the most pedantic groups of people I know. They
are very smart, often right, and have been known to be extremely defensive
of their DNS turf. (Note that things have improved considerably and if you
think this is still an issue, I'm happy to try and help.)

IETF was divided about the convergence of the "security of the DNS"
and the "DNS as PKI", even though this had always been a goal of DNSSEC
for a large group of people within the IETF. The FreeS/WAN people were
driving DNSSEC not so much for DNS as for the key distribution. After all,
you can detect DNS forging if you know your public keys.

When we had the KEY/SIG records ready to go, it was decreed that they
could only be used for the DNS itself. Applications could not use the
KEY record. To make that distinction more clear, on the next change in
the draft protocol, KEY was obsoleted and DNSKEY introduced.  So IPsec
keys were relegated back to TXT: at the time we had no Generic Record
format (RFC 3597) support, so waiting for any new RRtype to get enough
deployment to become usable would take years. Almost everyone was on
bind4 and never upgraded, which left us with no other choice but TXT. Even
though we wrote the OE and IPSECKEY RFCs, OE's only deployments were
done using TXT records.
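
For readers who have not seen the Generic Record format, here is a small
Python sketch of what RFC 3597 buys you: any record type can be written as
"TYPEnnn \# length hex", so a server does not need to know the type to
carry it. The example encodes an IPSECKEY (type 45) with no gateway. The
helper names, the owner name and the two-byte "key" are placeholders for
illustration, not real key material.

```python
IPSECKEY_TYPE = 45  # RR type number assigned to IPSECKEY


def ipseckey_rdata(precedence: int, algorithm: int, public_key: bytes) -> bytes:
    """Wire-format IPSECKEY RDATA with gateway type 0 (no gateway field)."""
    return bytes([precedence, 0, algorithm]) + public_key


def generic_rr(owner: str, rr_type: int, rdata: bytes) -> str:
    """Render a record in the RFC 3597 unknown-type presentation format."""
    return f"{owner} IN TYPE{rr_type} \\# {len(rdata)} {rdata.hex()}"


# Placeholder key bytes; a real record would carry an actual public key.
rdata = ipseckey_rdata(precedence=10, algorithm=2, public_key=bytes.fromhex("0102"))
print(generic_rr("www.example.", IPSECKEY_TYPE, rdata))
# www.example. IN TYPE45 \# 5 0a00020102
```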

4) DNSSEC was delayed by a decade

DNSSEC deployment was slowly gaining traction, but I think we really
needed the Kaminsky bug to get that extra push for DNSSEC outside the geeks
of the IETF. The US government mandate for DNSSEC in .GOV helped as well. 
But by this time, OE was mostly forgotten.

djb repeatedly tried to peddle his own warez. While not at all realistic,
it always gained a lot of hype and media attention and probably did
cause delays of DNSSEC deployment.

Kaminsky himself was shooting down DNSSEC too. I personally heckled
him at various Black Hat and ICANN conferences until we finally sat
down for a couple of hours to talk about DNSSEC's history and design
goals. I'll claim my 15 minutes of fame for having converted him. It
helped having Kaminsky say that although he didn't like the complexity,
he couldn't see anything better. DNSSEC was needed for everyone.

DNSSEC was gaining traction.  Then we ran into a bunch of DNSSEC
deployment issues. We had the delays due to NSEC vs NSEC3 with OPTIN,
and then on top of that, in 2008, when the first big ISP in Sweden
turned on DNSSEC in their resolvers, all that traction was blown away.

Most consumer routers ran DNS proxies that implemented DNS as "known
bitstreams" instead of implementing the actual DNS protocol. The DNSSEC
OK bit caused thousands of routers to drop DNSSEC packets as "invalid
DNS". The only realistic solution: Turn it off and wait two years for
those routers to get obsoleted by faster wifi standards and talk to those
vendors so they would not repeat their mistake with their next generation
of routers.
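
To show how small the difference on the wire is, here is a Python sketch
that builds a DNS query by hand, with and without the DNSSEC OK (DO) bit.
The DO bit lives in the EDNS0 OPT pseudo-record appended to the query, which
is exactly the part a "known bitstream" proxy would not expect. The function
name, query ID and 4096-byte buffer size are arbitrary choices for this
example.

```python
import struct


def build_query(name: str, qtype: int = 1, dnssec_ok: bool = True,
                query_id: int = 0x1234) -> bytes:
    """Build a DNS query carrying an EDNS0 OPT record, optionally with DO set."""
    # Header: ID, flags (RD set), 1 question, 0 answers, 0 authority,
    # 1 additional record (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root owner name, type 41, "class" = UDP payload size,
    # "TTL" = extended RCODE, version and flags; DO is bit 0x8000 of the flags.
    ttl_flags = 0x00008000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, ttl_flags, 0)
    return header + question + opt


with_do = build_query("example.com")
without_do = build_query("example.com", dnssec_ok=False)
# The two packets differ in exactly one byte.
print(sum(a != b for a, b in zip(with_do, without_do)))  # 1
```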

We now have the IPSECKEY record format (though RFC 4025 is not useful,
see below) and RFC 3597 for the generic DNS record deployed on all DNS
servers. And we're on our way to having DNSSEC on every end node (see
also draft-wouters-edns-tcp-chain-query-00, which I just submitted to the IETF).

We have a mostly clean working UDP/TCP port 53 transport for DNSSEC
on most networks (in part thanks to Google DNS). Our hotspot handling
is still a little rough, though, with dnssec-trigger the only tool
to hack configurable DNSSEC support into the OS for our coffee shop
visits when we need to rely on forged DNS.

4) When you're NAT on the net, you're NOT on the net.

Opportunistic Encryption relied on a clear peer to peer connection. But
we managed to degrade the internet into servers and clients. NAT was the
biggest problem, and with CGN around the corner, it's not something that
is going away despite IPv6 offering enough IPs for everyone. In fact,
for our "new OE", this is the biggest hurdle to overcome. When Alice
cannot talk to Bob because she cannot reach him due to a (carrier grade)
NAT, we are stuck wildly poking holes and hoping packets flow.

5) The reverse DNS tree is dead, Jim

OE depended on the reverse tree as a security mechanism: it showed that
someone claiming a public key for a specific IP range was actually the
legitimate owner of that IP space. It was the security method for RFC-4025.

But unless you are running in a datacenter, you do not have access to
the reverse DNS. It is useless as a key distribution method. On top of that,
large IPv6 deployments don't even care any more to run any authoritative
DNS for their reverse.

6) BTNS

The IETF tried to revive this OE with the Better Than Nothing Security
("BTNS") working group. Contrary to the name, they also fell into the
"perfect is the enemy of good" trap and most discussion seemed to go into
"channel binding" to upgrade anonymous IPsec to some kind of authenticated
IPsec - at least by the time I became aware of them. In other words,
the most important problem of key distribution was left outside the scope
and no one actually seemed to have implemented anything. Though I have to
admit, I'm behind on reading the VPN auto-discovery drafts. It is just
very discouraging to still be reading problem statement drafts.
Moreover, I don't think we should set up IPsec tunnels based on packets
hitting the kernel. We have better ways now that we can leverage DNSSEC.

7) We were all complacent

The only interest for IPsec was for corporate VPNs. During the
above listed problem periods, OE people gave up. Some walked away
from IETF. While everyone gained an always-on portable IP device,
their crypto capabilities were practically non-existent. My current
iPhone 5 can connect to a corporate VPN, but trying to make it _just_
send out encrypted packets is impossible. Some trickery can be used to
cause almost any packet to set up the VPN, but while that's going on it
is still leaking like a sieve. VPN is seen by phone vendors as a method
to gain some enterprise users, not as the tool to protect the consumer.
The Apple VPN client is a 10+ year old patched version of racoon. The
only vendor that took VPNs seriously was RIM and we punished them by not
buying their products, because we had other priorities like FourSquare,
Facebook and Twitter.

We can only hope that those PRISM players are now put under economic
pressure by frightened consumers to fix this. But as long as VPNs and
DNSSEC are slow and error-prone, it is better for them not to go there.

The New Opportunistic Encryption

I've been brainstorming on how to put IPsec OE back on the table with a
bunch of people around me, including the late Hugh Daniel, John Gilmore
and Hugh Redelmeier of freeswan.

The packet capturing 0.0.0.0/0 policy is not a good method because
we cannot make any decision on where to find a public key for an IP
address. The reverse is unusable, and IP addresses change often. We used
it because we had nothing better. But now we do. Since every (secure)
platform now runs DNSSEC on the end node, we can use this as our decision
making point. Imagine my phone running a DNSSEC resolver (say unbound)
and an IKE daemon (say libreswan). The local DNS resolver has access to
each DNS name and its matching IP addresses. It can look up the key in the
forward DNS zone, and hand over the public key, DNS name and IP address to
the IKE daemon!

1) User tells browser to go to www.cypherpunks.ca
2) browser does a lookup for the A/AAAA record of www.cypherpunks.ca
3) DNSSEC resolver performs the lookup/validation for the A/AAAA record of
    www.cypherpunks.ca and additionally looks up the IPSECKEY record of
    www.cypherpunks.ca.
4a) The resolver will wait with returning the A/AAAA record to the browser
     until it knows if the IPSECKEY record exists or not. If not, it releases
     the A/AAAA answer to the application. Packets flow in the clear.
4b) The resolver finds an IPSECKEY record. It sends the public key, the FQDN
     and the IP address(es) to the IKE daemon and waits for a response.
     Meanwhile it does _not_ release the A/AAAA record to the application.
5) The IKE daemon sets up the IPsec tunnel. We haven't reached agreement
    yet over how this should be done. There are two choices:

    a) The client uses an "@anonymous" ID for itself along with sending its
    public key inline with IKE. The client is responsible for ensuring there is
    no MITM attack, as it knows the server's public key (from DNSSEC). The
    responding server will just use any key it received inline if it was received
    for the "@anonymous" ID.

    b) The initiator (aka client) uses its own FQDN-based ID. It has
    preconfigured its DNS so that an IPSECKEY record exists for its FQDN
    (protected by DNSSEC). The key is not sent inline with IKE. Instead,
    when the responder (aka server) sees the non-anonymous ID, it will perform
    a DNSSEC secured lookup to obtain the IPSECKEY out of band. Both parties
    confirm there is no MITM.

    The advantage of a) is that it leaks less user information and makes tracking
    users harder. The client can regularly generate another anonymous keypair.
    The disadvantage of a) is that it turns peers into clients and servers. And
    two clients cannot initiate OE to each other.

6) The tunnel is established and the IKE daemon notifies the local DNSSEC server
    that had instructed it to set up the IPsec tunnel.

7) The resolver releases the IP address to the application.

8) The application starts sending packets and the IPsec policy encrypts them all.

I'm personally in favour of the @anonymous solution. But there is no reason why support
for both could not be implemented.
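
The resolver side of steps 3 to 7 can be sketched in a few lines of Python.
This is only an outline under stated assumptions: the three callables stand
in for the DNSSEC-validated A/AAAA lookup, the IPSECKEY lookup and the
request to the IKE daemon, and all names are hypothetical (this is not the
unbound or libreswan API). The steps above do not say what should happen
when IKE fails; raising an error instead of silently falling back to clear
text is purely an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Answer:
    addresses: List[str]
    encrypted: bool


def resolve_with_oe(
    fqdn: str,
    lookup_addresses: Callable[[str], List[str]],
    lookup_ipseckey: Callable[[str], Optional[bytes]],
    request_tunnel: Callable[[str, List[str], bytes], bool],
) -> Answer:
    """Hold back the A/AAAA answer until the IPsec question is settled."""
    addresses = lookup_addresses(fqdn)   # step 3: validated A/AAAA lookup
    key = lookup_ipseckey(fqdn)          # step 3: additional IPSECKEY lookup
    if key is None:
        # Step 4a: no IPSECKEY record, release the answer; clear text.
        return Answer(addresses, encrypted=False)
    # Steps 4b to 6: hand key, name and addresses to the IKE daemon and
    # wait for its verdict before the application learns the address.
    if not request_tunnel(fqdn, addresses, key):
        raise RuntimeError(f"IKE negotiation failed for {fqdn}")  # assumption
    # Step 7: the tunnel is up, release the answer.
    return Answer(addresses, encrypted=True)


# Stub lookups for illustration only.
answer = resolve_with_oe(
    "www.example.",
    lookup_addresses=lambda name: ["192.0.2.10"],
    lookup_ipseckey=lambda name: b"placeholder-key",
    request_tunnel=lambda name, addrs, key: True,
)
print(answer)
```

Whether the initiator then identifies itself as "@anonymous" or with its own
FQDN only changes what happens inside the tunnel request, so both variants
fit behind the same hook.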

What are some of the obstacles and work to do:

1) writing the unbound plugin 
2) writing the support for @anonymous for the server-side.
    This includes raw keys for IKEv2
    (http://tools.ietf.org/html/draft-ietf-ipsecme-oob-pubkey-00)
3) With NAT, the client suggests an inner-IP. This could be abused or clash.
    We need to 'contain' each connection, possibly using generated IPv6
    addresses.
4) We cannot use the "gateway" field of RFC-4025, or people
    could trick a server into giving a client all communication to a certain
    IP address that does not belong to them.
5) anonymous connections should generate throw-away keys to remain anonymous
6) implement draft-wouters-edns-tcp-chain or else latency/RTTs will prevent
    real-life deployment of DNSSEC validated IPSECKEYs on mobile devices.
7) This allows no upgrading from anonymous to mutually authenticated, but IKE
    policies can be added to the server/client that would match on different IDs
    (eg X.509) that work independently of OE without introducing complicated
    channel binding promotion code. Other IKEv2 extensions could possibly be
    applied to facilitate promotions.
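
One possible reading of "generated IPv6 addresses" in item 3 is to derive a
per-client inner address from the client's (throw-away) public key, so that
a client cannot pick an address that clashes with, or impersonates, someone
else's. The Python sketch below shows that idea only. The hashing scheme,
the function name and the fd00:6f65::/64 prefix are all assumptions made up
for this example; nothing here has been agreed on or implemented.

```python
import hashlib
import ipaddress


def contained_address(public_key: bytes,
                      prefix: str = "fd00:6f65::/64") -> ipaddress.IPv6Address:
    """Derive a stable inner IPv6 address from a peer's public key.

    The prefix is a hypothetical unique-local /64 chosen by the server.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen > 64:
        raise ValueError("prefix must leave at least 64 bits for the host part")
    digest = hashlib.sha256(public_key).digest()
    interface_id = int.from_bytes(digest[:8], "big")  # low 64 bits of the address
    return ipaddress.IPv6Address(int(network.network_address) + interface_id)


print(contained_address(b"throw-away public key A"))
print(contained_address(b"throw-away public key B"))
```

A new throw-away key gives a new address, which fits item 5 as well.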

I'm sure more implementation issues will show up once we get this going,
but there are no real fundamental reasons why we cannot deploy this in
a couple of months' time. My plan is to get libreswan to support this
version of OE. Additionally, once we use draft-wouters-edns-tcp-chain,
it becomes cheap to do these lookups through the tor network. If the
tor exit nodes then also feed each other with DNSSEC cache material,
it should make tracing individual clients even harder.

(anyone willing to assist, especially with coding, do contact me)

Paul