[Cryptography] multi-key encryption of "meta" data

Tue Jul 15 17:03:45 EDT 2014

It seems to me that the binary distinction between "metadata" and
other data is a crock.  As a glaring example of the problem, common 
protocols for encrypted email encrypt only the main body of the 
message, leaving /all/ the headers unencrypted.  This is a serious
security hole, as discussed below [*].

We can do better than this.  We need to do better than this.

At first glance, one might think that "data is data" and we should
not distinguish "metadata" from other data, but actually I wish to
go in the other direction, and distinguish /more/ classes of data.

Use-case scenario:  
  a) I send an email from my desktop.  My mailer needs to know
   the IP address of my mail relay server.

  b) My relay server wants to know my ID and password.  It is not
   an open relay, so it needs to know I am not a spammer.

  c) The relay server also needs to know the IP address of the
   mailhost serving the addressee, which it finds by looking up
   the domain part of the address.

  d) The destination mailhost needs to know the userID or alias
   of the recipient, so that it can deliver the mail to the 
   correct mailbox.

  e) The recipient fetches the mail to his desktop.

  f) The recipient's mailer wants to know things like sender,
   date-and-time, subject, et cetera, since these are displayed
   in the mailbox summary.  The user wants to see these things,
   if only to decide if-and-when he wants to open the mail.

  g) When the recipient decides to read the mail, he needs to
   decrypt the main body of the message.

You can imagine all sorts of more-detailed scenarios -- and also
less-detailed scenarios -- but this suffices to illustrate my 
point.  The point is that at different stages along the way, 
different bits of data need to be known.  It's not even monotonic, 
i.e. not like the layers of an onion or a Russian doll.  The 
authorization needed at step (b) is not needed later, at least 
not in the same form.
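
To make the "not monotonic" point concrete, here is a minimal sketch
that tabulates which party needs which piece of data and checks whether
the needs nest like an onion.  The field names are my own hypothetical
labels, not part of any standard, and step (e) is left out because the
scenario above does not say what it needs.

```python
# Who needs to read what, following steps (a)-(g) of the scenario.
# Field names are hypothetical labels chosen for illustration only.
NEEDS = {
    "a: sender's mailer": {"relay_address"},
    "b: relay (authorization)": {"sender_login"},
    "c: relay (routing)": {"destination_host"},
    "d: destination mailhost": {"recipient_mailbox"},
    "f: recipient's mailer": {"sender", "date", "subject"},
    "g: recipient": {"body"},
}


def is_nested(stages):
    """Return True if every stage needs a subset of what the previous one needed.

    That is the onion / Russian-doll shape: each layer peeled off reveals
    strictly less.  Relies on dicts preserving insertion order.
    """
    sets = list(stages.values())
    return all(later <= earlier for earlier, later in zip(sets, sets[1:]))


def readers_of(field, stages=NEEDS):
    """List the stages that need to see a given field."""
    return sorted(stage for stage, fields in stages.items() if field in fields)


if __name__ == "__main__":
    print("onion-shaped?", is_nested(NEEDS))
    print("who needs the subject?", readers_of("subject"))
```

With the table as written, no stage's needs contain the next stage's,
so a single stack of nested layers does not describe the situation.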

To begin to address this, we don't need new cryptographic primitives,
just smarter protocols for using the existing primitives.  This
needs more thought, but to get the discussion rolling, here is 
the outline of a simple scheme that might be a step in the right 
direction:

The destination organization publishes a public key.  At the 
outermost layer, I send a plain brown envelope addressed to
"somebody at destination.com".  When this arrives, the mailhost
opens the plain brown envelope and finds that it contains 
another envelope, addressed to a particular person within 
the organization, or perhaps a generic delivery point such 
as "Room 40".

Meanwhile, the recipient has published two public keys, an outer 
key and an inner key.  The secret half of the outer key is known 
to the recipient's mail reader, and is used to decrypt relatively
routine things like sender, subject, date-and-time, et cetera.
The secret half of the inner key is more closely held, and is
only used if-and-when the recipient wishes to decode the main
body of the message.
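
The envelope scheme above can be sketched as follows.  This models only
the structure, i.e. who can open what.  The Sealed class is a stand-in
for real public-key encryption and hides nothing; the key names
("destination.com", "room40/outer", "room40/inner") and the function
names are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Sealed:
    """Stand-in for "encrypted to the public key named key_id".

    NOT real cryptography: the payload is stored as-is.  The class only
    records which secret key would be required to open the envelope.
    """
    key_id: str
    payload: object


def seal(key_id, payload):
    return Sealed(key_id, payload)


def open_sealed(box, held_keys):
    """Open an envelope, but only if the caller holds the matching secret key."""
    if box.key_id not in held_keys:
        raise PermissionError(f"no secret key for {box.key_id!r}")
    return box.payload


def build_message(org, person, summary, body):
    # Innermost: the main body, under the recipient's closely held inner key.
    inner = seal(f"{person}/inner", body)
    # Next: routine summary fields plus the sealed body, under the outer key.
    middle = seal(f"{person}/outer", {"summary": summary, "body": inner})
    # Outermost: the plain brown envelope, readable by the organization's mailhost.
    return seal(org, {"deliver_to": person, "contents": middle})


if __name__ == "__main__":
    msg = build_message("destination.com", "room40",
                        {"sender": "me", "subject": "lunch"}, "the main body")
    # The mailhost learns only the delivery point.
    at_mailhost = open_sealed(msg, {"destination.com"})
    print(at_mailhost["deliver_to"])
    # The mail reader, holding the outer key, learns the summary fields.
    at_reader = open_sealed(at_mailhost["contents"], {"room40/outer"})
    print(at_reader["summary"])
    # Only the inner key opens the body.
    print(open_sealed(at_reader["body"], {"room40/inner"}))
```

In a real protocol each seal() would be a public-key encryption to the
named key, and the mailhost, holding only the organization's key, would
be unable to read the summary or the body.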

Let's be clear:  A great deal of the stuff that appears in RFC 822
(now RFC 5322) headers is not needed for delivery of the message,
and MUST NOT be sent in the clear.
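
As a rough illustration of sorting headers into classes, the sketch
below pulls out the fields a mail reader shows in the mailbox summary
(sender, date, subject), so that they could be encrypted under the outer
key instead of being sent in the clear.  The choice of which header
names count as "summary" is my assumption, and how to classify the
remaining headers is left open; here they are simply returned
separately.

```python
from email.message import EmailMessage

# Headers shown in a mailbox summary.  This particular set is an
# assumption for illustration, not a proposal for a standard.
SUMMARY_HEADERS = {"from", "date", "subject"}


def split_headers(msg):
    """Split a message's headers into (summary, rest).

    Each result is a list of (name, value) pairs, in original order, so
    that repeated headers are preserved.
    """
    summary, rest = [], []
    for name, value in msg.items():
        target = summary if name.lower() in SUMMARY_HEADERS else rest
        target.append((name, str(value)))
    return summary, rest


if __name__ == "__main__":
    m = EmailMessage()
    m["From"] = "alice@example.org"
    m["To"] = "room40@destination.com"
    m["Subject"] = "lunch"
    m["X-Mailer"] = "demo"
    summary, rest = split_headers(m)
    print("for the outer key:", summary)
    print("still to be classified:", rest)
```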

========================
Appendices:

A) It seems to me that STARTTLS operates at not quite the right
level.  For one thing, it is negotiated inside the mail connection,
so an eavesdropper can still see that the traffic is mail.  So at
the get-go we are surrendering more than we should to traffic
analysis.  It would be better to have something more like IPsec
(but perhaps easier to use) where even the TCP port numbers are
concealed.  Onion routing helps here.  Systematic sending of
/cover traffic/ is also necessary.

B) In case it wasn't obvious:  When I talk about distinguishing
classes of data, including so-called "metadata", this is not based
on US constitutional law;  I am talking about technology, including
cryptology.  This has multiple advantages, including being applicable
internationally, and being more reliable, given a history of (shall
we say) spotty adherence to Fourth Amendment principles even within
the US.

[*]  As Glenn Greenwald recently noted:
  https://firstlook.org/theintercept/2014/07/11/newly-obtained-emails-contradict-administration-claims-guardian-laptop-destruction/

The US government, when it responds to FOIA requests, generally
blacks out large amounts of metadata.  Quote:

> In justifying its concealments, the administration has the audacity
> to claim that disclosure “would constitute a clearly unwarranted
> invasion of privacy.”

IANAL, and this is not the proper forum to make legal arguments, but
it seems that the USG has well and truly forked itself.  I can just
see the opposing lawyer asking, "Are you lying now, or were you
lying then?  Are you violating the FOIA law, and lying about the
reasons, or were you lying back when you said that hoovering up
unlimited amounts of metadata was not a violation of the Fourth
Amendment?"

The use of the word "unwarranted" is particularly ironic.  Not
undue, not excessive, not improper, but unWARRANTed.

It is amusing to think about the legal argument, but that is not 
really my point.  The point is that 
 a) Leaving so-called "metadata" in the clear “would constitute a 
  clearly unwarranted invasion of privacy”, and
 b) If we solve the problem technologically we don't need to worry
  so much about the chicanery and law-breaking.

To say the same thing another way:  Assume the law of the jungle.
The only privacy rights you have are the ones you can enforce on
your own, using the strength of your cryptography.