[Cryptography] BBC to deploy detection vans to snoop on internet users

Mon Aug 8 18:29:51 EDT 2016

More recent articles - e.g., "The real scandal is that you still believe TV licence detector vans are real" http://arstechnica.co.uk/tech-policy/2016/08/bbc-tv-licence-vans-wi-fi-snooping-analysis/ - argue that this is all a fake, aimed at frightening people into paying up.

The claim that detectors could work on packet length and timing does bring up a point I've mentioned before:  We define security for cipher algorithms, and often for modes and even protocols, in a way that completely ignores leakage of message length and timing - assuming that it doesn't really matter very much (if you use a block cipher the length is only known to within 16 bytes or maybe somewhat more, who cares) or, in the case of timing, that this is simply outside the domain of analysis.
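To make the block-cipher remark concrete, here is a minimal sketch of how much length information survives padding.  It assumes a 16-byte block and PKCS#7-style padding (always at least one pad byte); the function names are mine, purely for illustration, and no IV, MAC, or protocol framing is counted.

```python
from typing import Tuple

BLOCK = 16  # assumed block size in bytes (e.g. AES)


def padded_length(n: int, block: int = BLOCK) -> int:
    """Ciphertext length for an n-byte plaintext, assuming PKCS#7-style
    padding, which always adds between 1 and `block` bytes."""
    return (n // block + 1) * block


def plaintext_length_range(c: int, block: int = BLOCK) -> Tuple[int, int]:
    """Inclusive range of plaintext lengths consistent with an observed
    c-byte ciphertext under the same padding assumption."""
    return (c - block, c - 1)


if __name__ == "__main__":
    for n in (0, 15, 16, 1000):
        c = padded_length(n)
        print(n, "->", c, "observer infers", plaintext_length_range(c))
```

So an observer loses at most the low four bits of the length; everything above that is in the clear.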

And yet we keep seeing attacks (or even proposals for attacks) that look at exactly this "metadata" that we leave unencrypted.  We've seen such things as recovery of encrypted compressed speech based entirely on the sequence of packet lengths; discovery of web pages being read over an HTTPS connection based, again, on the sequence of lengths of messages; and a bunch of attacks that turn the attacked element into an oracle for the question "have you seen this string in the data you encrypted recently?" by checking the length of responses.

It's time to take this stuff seriously.

Widely used encryption algorithms and modes guarantee "semantic security" where the semantics is defined by the bits being transmitted.  Blocking, message lengths, and delays between blocks or messages are not considered.  In this situation, it's important that senders avoid coupling sensitive semantics to stuff "outside the semantic envelope".  In particular, compression of data before encryption is a disaster, as it inherently leaks information about the bit-level semantics in the lengths of the messages.  Any non-uniformity in message sizes or sending rates that's tied to the underlying bits similarly moves information from the protected domain to the (deliberately) unprotected one.
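A small sketch of why compress-then-encrypt leaks.  Everything here is hypothetical: the secret, the guesses, and the function name are made up, and the encryption step is simply omitted on the assumption that the cipher preserves length (as a stream cipher or CTR mode would), so the observer sees the compressed length directly.

```python
import zlib

# Hypothetical secret that shares a compression context with attacker data.
SECRET = b"Cookie: session=7f3a9c2e41d8b6a0"


def observed_length(attacker_data: bytes) -> int:
    """Length an eavesdropper would see if SECRET and attacker-chosen data
    were compressed together and then encrypted with a length-preserving
    cipher.  The encryption itself is left out; it would not change the
    length under that assumption."""
    return len(zlib.compress(SECRET + b"\n" + attacker_data))


if __name__ == "__main__":
    right = b"Cookie: session=7f3a9c2e41d8b6a0"
    wrong = b"Cookie: session=XQZJKWVYPLMNBGTR"
    # The correct guess repeats the secret, so DEFLATE replaces it with a
    # short back-reference and the output comes out shorter.
    print("right guess:", observed_length(right))
    print("wrong guess:", observed_length(wrong))
```

The observer never sees a plaintext bit, yet the length alone says which guess matched.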

If we broaden the definition of "semantic security" to include "the attacker gains no (or a defined, limited amount of) information about message lengths and timings" - can we define cryptosystems that inherently provide such security?  Or do we need to fall back to the old definitions of security that required the sender to follow some rules about message formation?  (For example, not so long ago, ciphers were not secure against known-plaintext attacks, so the rule was that information sent through such a system was *never* released in its original form.  So announcements would be sent to overseas embassies - and then paraphrased before being delivered.)
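As one illustration of a "defined, limited amount" of length leakage, here is a sketch of padding every message up to a power-of-two bucket before encryption.  This is my own toy framing (a 4-byte length prefix, zero fill, a 256-byte minimum bucket, all arbitrary), and it is exactly the sender-follows-rules kind of fix rather than a property of the cryptosystem; it also does nothing about timing.

```python
import struct

MIN_BUCKET = 256  # assumed smallest padded size in bytes


def pad_to_bucket(msg: bytes, min_bucket: int = MIN_BUCKET) -> bytes:
    """Prefix msg with its 4-byte big-endian length, then zero-fill up to
    the next power-of-two multiple of min_bucket."""
    body = struct.pack(">I", len(msg)) + msg
    size = min_bucket
    while size < len(body):
        size *= 2
    return body + b"\x00" * (size - len(body))


def unpad(padded: bytes) -> bytes:
    """Recover the original message using the length prefix."""
    (n,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + n]


if __name__ == "__main__":
    for n in (10, 252, 253, 5000):
        print(n, "->", len(pad_to_bucket(b"a" * n)))
```

With this scheme an observer learns only which bucket a message fell into, at the cost of sending up to roughly twice as many bytes.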
                                                        -- Jerry