[Cryptography] crypto goals and criteria
John Denker
jsd at av8n.com
Tue May 12 16:35:56 EDT 2015
On 05/12/2015 04:27 AM, Salz, Rich wrote:
>> The argument goes that
>> encryption will thwart the censors. Except of course that the encrypted
>> traffic still reveals page lengths, compressed or not...
>
> Which is why HTTP/2 has padding, and TLS 1.3 will probably have it.
>
> The IETF isn't "encrypt everything" but rather "pervasive monitoring
> is an attack," with the knowledge that protection of meta-data (DNS,
> padding, timing) is important. There's no guarantee we'll get it
> right, or even if it's possible, but they're trying.
That makes sense.
Here's another way of saying the same thing:
*) Metadata is data.
*) A cryptosystem that leaks metadata
is a cryptosystem that leaks.
*) A cryptosystem that leaks when compression is applied
is a cryptosystem that leaks.
*) A cryptosystem that leaks when the attacker can
inject some known plaintext
is a cryptosystem that leaks.
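The compression and known-plaintext bullets combine into one well-known attack (the mechanism behind CRIME/BREACH): if attacker-supplied data is compressed together with a secret before encryption, the ciphertext length alone confirms a correct guess. A minimal sketch, assuming DEFLATE-style compression; the secret and field names here are made up for illustration:

```python
import zlib

SECRET = b"sessionid=7f3a9c2e51b04d88"

def ciphertext_length(attacker_guess: bytes) -> int:
    # Compression runs over attacker-controlled data and the secret
    # together; encryption preserves the compressed length, so the
    # attacker can observe it directly on the wire.
    return len(zlib.compress(attacker_guess + b";" + SECRET))

# A guess that matches the secret compresses better: the repeated
# substring becomes a short back-reference, so the ciphertext is
# shorter.  The length leak confirms the guess.
right = ciphertext_length(b"sessionid=7f3a9c2e51b04d88")
wrong = ciphertext_length(b"sessionid=0123456789abcdef")
```

Here `right < wrong`, even though the attacker never sees any plaintext.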
Traffic analysis is a Big Deal. Cryptanalysts have been
using traffic analysis for as long as there's been crypto.
I reckon we will never be able to stop "all" leakage, but
still we have to recognize it for what it is: leakage.
I see the distinction between metadata and data as (at
best) a legal fiction, created in the US as a way to
get around the 4th amendment (not to mention the 3rd,
9th, and 10th).
By way of contrast:
On 05/12/2015 08:54 AM, John Levine wrote:
>> It would be quite a feat to figure out which Wikipedia page someone
>> was reading just from the page length, compressed or otherwise.
>> There's over 4,800,000 articles each of which can be rendered in many
>> different ways (talk, history, diffs, etc.), they change all the time,
>> and the size of a page depends on whether you're logged in and
>> probably on other stuff. For example, I just retrieved a Wikipedia
>> page on a topic related to a river in the United States. The
>> uncompressed length of the page was 47,068 bytes. Free beer to the
>> first person who figures out what page it was.
That strikes me as naïve.
On a onesie-twosie basis, it would cost more than the price
of a beer to figure that out. However, the thought police
in even a smallish police state are surveilling millions of
people, and they get to /amortize/ the cost of indexing the
wikipedia. The scaling behavior is similar to that of a
dictionary attack. The cost per victim is negligible.
Sure, /some/ of the articles have changed since yesterday,
but most of them haven't ... and the thought police do
not need to read all of your communications; a sample
suffices.
Since there are more articles than there are plausible
length values, there will be some collisions ... but the
ambiguities can be resolved by looking at additional
information not provided in the example above, e.g. the
pattern of included images, incoming links, outgoing links,
et cetera ... and/or by statistical inference. The NSA
is reeeeally good at statistics.
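The amortization argument can be made concrete: the one-time cost is building an index from page length to candidate titles; after that, each observed transfer is a constant-time lookup, exactly as in a dictionary attack on hashed passwords. A sketch, where the corpus is a made-up stand-in for a real crawl:

```python
from collections import defaultdict

# Hypothetical crawl results: (title, uncompressed length).  In real
# life the thought police crawl millions of pages once, then reuse
# the index against every victim.
corpus = [
    ("Some_River",      47068),
    ("Another_Article", 13337),
    ("Yet_Another",     47068),   # collision: same length as Some_River
]

index = defaultdict(list)
for title, length in corpus:
    index[length].append(title)          # one-time, amortized cost

# Per-victim cost: one lookup per observed transfer.
candidates = index[47068]                # -> ['Some_River', 'Yet_Another']
# Collisions like this get resolved with side information (image
# sizes, link patterns) or statistical inference, as noted above.
```

The per-victim cost really is negligible; only the index-building cost scales with the size of the corpus, and it is paid once.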
A dissident could buy a measure of protection by downloading
the entire wikipedia and then referring to the local copy.
This is an example of what we call /cover traffic/. The
English-language part is on the order of 10 gigabytes, so
this is not even particularly expensive:
lynx -source -head https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
Content-Length: 11820881800
OTOH it remains a cat-and-mouse game; cover traffic does
not defeat all avenues of attack.
Ed Snowden said:
"Encryption works. Properly implemented strong crypto
systems are one of the few things that you can rely on.
Unfortunately, endpoint security is so terrifically weak
that NSA can frequently find ways around it."
I suggest we need to pay more attention to the last part.
Just to give you some idea how hard it will be to fix the
problem, consider the following use-case:
I google for "babe". The query and the reply are secured
by https. So far so good.
a) If, however, I click on one of the hits, google will know
whether I am interested in
-- mythical oxen
-- mythical pigs
-- legendary ballplayers
-- damsels
-- or whatever
And (!) if google has it, the government will grab it, without
even a warrant. According to the 2nd circuit court of appeals,
this is illegal. I say even if it were legal it would be
unconstitutional, and even if it were constitutional it would
be bad policy ... but none of that stops them from doing it.
Here's how google knows: Even though the text and the
tooltip tell you that the link points to
en.wikipedia.org/wiki/Babe_(film)
it doesn't. Instead it points to something at google.com
that will record your click and then redirect you to the
nominal destination.
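One workaround for this is to unwrap the tracking link before following it. Google's click-tracking URLs have historically carried the real destination in a query parameter (commonly "q" or "url"); the sketch below assumes that format, which is not guaranteed to be stable:

```python
from urllib.parse import urlparse, parse_qs

def unwrap_redirect(link: str) -> str:
    # Assumption: the tracking URL carries the true destination in a
    # "q" or "url" query parameter.  If neither is present, return
    # the link unchanged.
    qs = parse_qs(urlparse(link).query)
    for key in ("q", "url"):
        if key in qs:
            return qs[key][0]
    return link

unwrap_redirect(
    "https://www.google.com/url?q=https://en.wikipedia.org/wiki/Babe_(film)"
)   # -> 'https://en.wikipedia.org/wiki/Babe_(film)'
```

Browser extensions that do this rewriting exist; rolling your own is a few lines, as above, but it only removes the click-tracking hop, not the fact that the search query itself was seen.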
b) Furthermore, after redirection the link points to an
unencrypted http page, even though the corresponding https
page also exists. Many months ago google announced that
they would fix this, i.e. that search results would favor
the encrypted version when available ... but it hasn't
actually happened.
On my system, I have workarounds for (a) and (b), but even
so, I don't imagine that my system is secure. I assume my
machines (including phones) are compromised at every level
from the firmware on up. I assume the "Root CA" clown car
is compromised several times over.
Bottom line:
*) Security requires a lot more than cryptography.
*) Metadata is data.
*) A cryptosystem that leaks metadata
is a cryptosystem that leaks.