[Cryptography] multi-key encryption of "meta" data
John Denker
jsd at av8n.com
Tue Jul 15 17:03:45 EDT 2014
It seems to me that the binary distinction between "metadata" and
other data is a crock. As a glaring example of the problem, common
protocols for encrypted email encrypt only the main body of the
message, leaving /all/ the headers unencrypted. This is a serious
security breach, as discussed below [*].
We can do better than this. We need to do better than this.
At first glance, one might think that "data is data" and we should
not distinguish "metadata" from other data, but actually I wish to
go in the other direction, and distinguish /more/ classes of data.
Use-case scenario:
  a) I send an email from my desktop. My mailer needs to know
     the IP address of my mail relay server.
  b) My relay server wants to know my ID and password. It is not
     an open relay, so it needs to know I am not a spammer.
  c) The relay server also needs to know the IP address of the
     mailhost serving the addressee.
  d) The destination mailhost needs to know the userID or alias
     of the recipient, so that it can deliver the mail to the
     correct mailbox.
  e) The recipient fetches the mail to his desktop.
  f) The recipient's mailer wants to know things like sender,
     date-and-time, subject, et cetera, since these are displayed
     in the mailbox summary. The user wants to see these things,
     if only to decide if-and-when he wants to open the mail.
  g) When the recipient decides to read the mail, he needs to
     decrypt the main body of the message.
You can imagine all sorts of more-detailed scenarios -- and also
less-detailed scenarios -- but this suffices to illustrate my
point. The point is that at different stages along the way,
different bits of data need to be known. It's not even monotonic,
i.e. not like the layers of an onion or a Russian doll. The
authorization needed at step (b) is not needed later, at least
not in the same form.
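To make the "not monotonic" point concrete, here is a minimal sketch
that writes down what each stage needs to know and checks whether
those sets nest like the layers of an onion. The stage names and
field names are my own illustrative labels, not part of any existing
protocol, and step (e) is left out because the scenario does not say
which fields the fetch requires.

```python
# Hypothetical need-to-know sets for the stages in the scenario above.
# All stage and field names are illustrative assumptions.
NEEDS = {
    "a_sender_mailer": {"relay_address"},
    "b_relay_auth": {"sender_id", "sender_password"},
    "c_relay_routing": {"destination_host"},
    "d_destination_mailhost": {"recipient_mailbox"},
    "f_recipient_summary": {"sender", "date", "subject"},
    "g_recipient_read": {"body"},
}


def is_nested(sets):
    """Return True if the sets form a chain under inclusion (onion-like).

    If a chain exists, sorting by size puts it in order, so it is
    enough to check each set against the next larger one.
    """
    ordered = sorted(sets, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


onion_like = is_nested(list(NEEDS.values()))
print("need-to-know sets nest like an onion:", onion_like)
```

With these assumed sets the check fails, which is the point: a single
stack of nested envelopes cannot express who needs to see what.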
To begin to address this, we don't need new cryptological primitives,
just smarter protocols for using the existing primitives. This
needs more thought, but to get the discussion rolling, here is
the outline of a simple scheme that might be a step in the right
direction:
The destination organization publishes a public key. At the
outermost layer, I send a plain brown envelope addressed to
"somebody at destination.com". When this arrives, the mailhost
opens the plain brown envelope and finds that it contains
another envelope, addressed to a particular person within
the organization, or perhaps a generic delivery point such
as "Room 40".
Meanwhile, the recipient has published two public keys, an outer
key and an inner key. The secret half of the outer key is known
to the recipient's mail reader, and is used to decrypt relatively
routine things like sender, subject, date-and-time, et cetera.
The secret half of the inner key is more closely held, and is
only used if-and-when the recipient wishes to decode the main
body of the message.
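The envelope structure just described can be sketched as follows.
This is only a structural model under loud assumptions: seal() and
unseal() are hypothetical stand-ins that record which key an envelope
is addressed to and refuse to open it without that key. They perform
NO real encryption; a real system would put public-key encryption in
their place. The key names and field names are likewise made up for
illustration.

```python
import json


def seal(key_id, obj):
    # Stand-in for "encrypt obj to the public key named key_id".
    # NOT real encryption: the payload is just JSON text.
    return {"sealed_for": key_id, "payload": json.dumps(obj)}


def unseal(held_keys, envelope):
    # Stand-in for decryption: succeeds only if the holder has the
    # secret half of the key the envelope was sealed for.
    if envelope["sealed_for"] not in held_keys:
        raise PermissionError("no secret key for " + envelope["sealed_for"])
    return json.loads(envelope["payload"])


def build_message(org_key, outer_key, inner_key, mailbox, headers, body):
    # Innermost: the main body, sealed with the closely held inner key.
    inner = seal(inner_key, {"body": body})
    # Middle: routine headers plus the inner envelope, for the mail reader.
    middle = seal(outer_key, {"headers": headers, "inner": inner})
    # Outermost: the plain brown envelope, for the destination mailhost.
    return seal(org_key, {"mailbox": mailbox, "middle": middle})


msg = build_message(
    "org:destination.com", "alice:outer", "alice:inner",
    mailbox="alice",
    headers={"from": "jsd", "subject": "hello"},
    body="the main body",
)

# The mailhost holds only the organization key: it learns the mailbox.
routing = unseal({"org:destination.com"}, msg)
# The mail reader holds the outer key: it learns sender, subject, etc.
summary = unseal({"alice:outer"}, routing["middle"])
# Only the closely held inner key opens the body.
content = unseal({"alice:inner"}, summary["inner"])
```

In this model the mailhost can read the mailbox name but gets a
PermissionError if it tries to open the middle envelope, and the mail
reader can build its summary without ever touching the inner key.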
Let's be clear: A great deal of the stuff that appears in RFC 822
headers is not needed for delivery of the message, and MUST NOT
be sent in the clear.
========================
Appendices:
A) It seems to me that STARTTLS operates at not quite the right
level. For one thing, as used with SMTP it protects only mail
traffic, on ports that identify it as mail. So from the get-go we
are surrendering more than we should to traffic analysis. It would
be better to have something more like IPsec (but perhaps easier to
use) where even the TCP port numbers are concealed. Onion routing
helps here. Systematic sending of /cover traffic/ is also necessary.
B) In case it wasn't obvious: When I say we should distinguish
"metadata" from other data, this is not based on US constitutional
law; I am talking about technology including cryptology. This
has multiple advantages, including being applicable internationally,
and being more reliable, given a history of (shall we say) spotty
adherence to fourth-amendment principles even within the US.
[*] As Glenn Greenwald recently noted:
https://firstlook.org/theintercept/2014/07/11/newly-obtained-emails-contradict-administration-claims-guardian-laptop-destruction/
The US government, when it responds to FOIA requests, generally
blacks out large amounts of metadata. Quote:
> In justifying its concealments, the administration has the audacity
> to claim that disclosure “would constitute a clearly unwarranted
> invasion of privacy.”
IANAL, and this is not the proper forum to make legal arguments, but
it seems that the USG has well and truly forked itself. I can just
see the opposing lawyer asking, "Are you lying now, or were you
lying then? Are you violating the FOIA law, and lying about the
reasons, or were you lying back when you said that hoovering up
unlimited amounts of metadata was not a violation of the 4th
amendment?"
The use of the word "unwarranted" is particularly ironic. Not
undue, not excessive, not improper, but unWARRANTed.
It is amusing to think about the legal argument, but that is not
really my point. The point is that
a) Leaving so-called "metadata" in the clear “would constitute a
clearly unwarranted invasion of privacy”, and
b) If we solve the problem technologically we don't need to worry
so much about the chicanery and law-breaking.
To say the same thing another way: Assume the law of the jungle.
The only privacy rights you have are the ones you can enforce on
your own, using the strength of your cryptography.