[Cryptography] information gain; was: coding for compression or secrecy or both or neither

Wed Jan 21 23:17:39 EST 2015

On Jan 18, 2015, at 5:21 PM, John Denker <jsd at av8n.com> wrote:
>> Note that the modern standard of security is equivalent to what is
>> called real-or-random: the adversary cannot tell if we encrypted a
>> message of his choice or just a random bit string of the same
>> length.
> That's not the correct criterion.  It would be better to
> say the cryptotext does not convey any /information/ about
> the plaintext or the keys.  Here's an example:
> 
> 00010100 10110000 11111110 00110110 10000100 01010111 10001100 10111110
[represented as binary written out in ASCII as 0 or 1 digits]
While valid, this is besides the point.  A common strong definition of semantic security says:  A polynomially-bounded machine will receive as input either (a) k random bits; or (b) the encryption of k' (not necessarily exactly k) known (or chosen) bits with a randomly chosen, unknown key.  The machine must be able to gain no more than an epsilon advantage over a random guess of an answer to the question "What's the source of the following bits?".

There's an implicit assumption that all data is passed in binary.  Sure, if you define your encryption mechanism to output only the ASCII characters 0 and 1, it may be quite secure (and quite compressible), but it will *not* be secure under this definition:  There's a trivial test to determine, with very high probability, whether a string came from this odd encryptor or from a random bit source.  (An even simpler counter-example is to take the encryption and tack on 1000 0 bits - trivially compressible.  Or you could take on 1000 *random* bits, which would not be compressible, and would create a system that would pass the "compare to random" test exactly when the original did - but would add nothing to actual security.)

Do people play "fast and loose" with the term "random"?  Absolutely.  They also play "fast and loose" with other important terms, especially entropy.  If you work with informal definitions, formal results will likely turn out to be false - the first thing you need to get anywhere in formalization is precise definitions.  In fact, it's exactly the existence of counterexamples that "miss the point" that drives the tightening of the definitions.  The counterexamples you give are examples of the costs of fuzzy thinking on mathematical matters, not problems with the underlying exact definitions.

BTW, the one case where there is a legitimate, if small, issue is with a number-theory based cryptosystem like RSA.  Since an encrypted message is a value mod some product for two primes (for RSA; there are similar constraints on other systems), if you represent it in binary with enough bits to cover the possible values, you will necessarily have some bit patterns that don't represent anything.  You can come up with better representations that differ from pure random binary by less, but you can't make the problem go away completely.  A correct definition of semantic security for such a system can readily be set up, and if you want to investigate the semantic security of such a system, you'll need to go through the exercise.
                                                        -- Jerry