[Cryptography] XORing plaintext with ciphertext

Sat Sep 7 07:25:43 EDT 2013

On Sep 7, 2013, at 4:13 AM, Jon Callas wrote:
>> Take the plaintext and the ciphertext, and XOR them together.  Does the 
>> result reveal anything about the key or the plaintext?
> 
> It better not. That would be a break of amazing simplicity that transcends broken. 
The question is much more subtle than that, getting deep into how to define the security of a cipher.

Consider a very simplified and limited, but standard, way you'd want to state a security result:  A Turing machine with an oracle for computing the encryption of any input with any key, when given as input the ciphertext and allowed to run for time T polynomial in the size of the key, has no more than a probability P, less than (something depending on the key size), of guessing any given bit of the plaintext.  (OK, I fudged on how you want to state the probability - writing this stuff in English rather than mathematical symbols rapidly becomes unworkable.)  The fundamental piece of that statement is the "given as input..." part:  If the input contains the key itself, then obviously the machine has no problem at all producing the plaintext!  Similarly, of course, if the input contains the plaintext, the machine has an even easier time of it.

You can, and people long ago did, strengthen the requirements.  They allow for probabilistic machines as an obvious first step.  Beyond that, you want semantic security:  Not only should the attacking machine be unable to get an advantage on any particular bit of plaintext; it shouldn't be able to get an advantage on, say, the XOR of the first two bits.  Ultimately, you want to say that given any boolean function F, the machine's a posteriori probability of guessing F(cleartext) should be identical (within some bounds) to its a priori probability of guessing F(cleartext).  Since it's hard to get a handle on the prior probability, another way to say pretty much the same thing is that the probability of a correct guess for F(cleartext) is the same whether the machine is given the ciphertext, or a random sequence of bits.  If you push this a bit further, you get definitions related to indistinguishability:  The machine is simply expected to say "the input is the result of applying the cipher to some plaintext" or "the input is random"; it shouldn't even be able to get an advantage on *that* simple question.
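
To make the indistinguishability game concrete, here is a rough Python sketch.  Everything named in it (real_or_random_game, identity_cipher, zero_detector) is made up for illustration, and the game is simplified by always encrypting an all-zero plaintext.  The "cipher" is deliberately useless, so the attacker wins essentially every round; for a strong cipher the win rate should stay close to 1/2.

```python
import os
import secrets


def real_or_random_game(encrypt, adversary, key_len=16, msg_len=32, trials=500):
    """Estimate how often `adversary` tells ciphertext from random bytes.

    Simplifying assumption: the challenger always encrypts an all-zero
    plaintext under a fresh random key.
    """
    wins = 0
    for _ in range(trials):
        key = os.urandom(key_len)
        b = secrets.randbits(1)  # 1 = real ciphertext, 0 = random bytes
        if b == 1:
            challenge = encrypt(key, bytes(msg_len))
        else:
            challenge = os.urandom(msg_len)
        if adversary(challenge) == b:
            wins += 1
    return wins / trials


def identity_cipher(key, plaintext):
    """Hypothetical, deliberately broken 'cipher': output equals input."""
    return plaintext


def zero_detector(challenge):
    """Guess 'real' (1) exactly when the challenge is all zero bytes."""
    return int(all(byte == 0 for byte in challenge))


win_rate = real_or_random_game(identity_cipher, zero_detector)
print(f"win rate against the broken cipher: {win_rate:.3f}")
```

The number that matters is the distance of the win rate from 1/2, which is the "advantage" in the definitions above.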

This sounds like a very strong security property (and it is) - but it says *nothing at all* about the OP's question!  It can't, because the machine *can't compute the XOR of the plaintext and the ciphertext*.  If we *give* it that information ... we've just given it the plaintext!
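
A few lines of Python show why.  This is only a sketch: the "ciphertext" here is just random bytes standing in for the output of any cipher, since the identity holds no matter how the ciphertext was produced, and the helper name xor_bytes is mine.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"attack at dawn"
# Stand-in for the output of any cipher; its origin does not matter here.
ciphertext = os.urandom(len(plaintext))

# The value the OP asks about: plaintext XOR ciphertext.
mask = xor_bytes(plaintext, ciphertext)

# The attacker already holds the ciphertext, so one more XOR undoes it.
recovered = xor_bytes(mask, ciphertext)
print(recovered == plaintext)
```

Since (P XOR C) XOR C = P for any C, handing the attacker P XOR C alongside C is the same as handing over P.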

I can't, in fact, think of any way to model the OP's question.  The closest I can come is:  If E(K,P) defines a strong cipher (with respect to any of the variety of definitions out there), does E'(K,P) = E(K,P) XOR P *also* define a strong cipher?  One would think the answer is yes, just on general principles: To someone who doesn't know K and P, E(K,P) is "indistinguishable from random noise", so E'(K,P) should be the same.  And yet there remains the problem that it's not a value that can be computed without knowing P, so it doesn't fit into the usual definitional/proof frameworks.  Can anyone point to a proof?
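
For concreteness, here is what E' looks like as code.  This is a sketch only: toy_encrypt is a hypothetical stand-in for E (a 4-round Feistel network using HMAC-SHA256 as the round function, not a vetted cipher), and the names xor_bytes and e_prime are mine.

```python
import hashlib
import hmac


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for E(K,P): a 4-round Feistel network on a
    32-byte block.  For illustration only; not a vetted cipher."""
    if len(block) != 32:
        raise ValueError("block must be 32 bytes")
    left, right = block[:16], block[16:]
    for rnd in range(4):
        # Round function: HMAC of (round number || right half), truncated.
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:16]
        left, right = right, xor_bytes(left, f)
    return left + right


def e_prime(encrypt, key: bytes, block: bytes) -> bytes:
    """E'(K,P) = E(K,P) XOR P.  Note that P is a required input."""
    return xor_bytes(encrypt(key, block), block)


key = b"k" * 16
block = b"0123456789abcdef0123456789abcdef"
print(e_prime(toy_encrypt, key, block).hex())
```

Note that e_prime cannot even be called without the block argument, which is the modeling problem in miniature: nothing in the usual attacker's view lets it compute this value.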

The reason I'm not willing to write this off as "obvious" is an actual failure in a very different circumstance.  There was work done at DEC SRC many years ago on a system that used a fingerprint function to uniquely identify modules.  The fingerprints were long enough to avoid the birthday paradox, and were computed based on the result of a long series of coin tosses whose results were baked into the code.  There was a proof that the fingerprint "looked random".  And yet, fairly soon after the system went into production, collisions started to appear.  They were eventually tracked down to a "merge fingerprints" operation, which took the fingerprints of two modules and produced a fingerprint of the pair by some simple technique like concatenating the inputs and fingerprinting that.  Unfortunately, that operation *violated the assumptions of the theorem*.  The theorem said that the outputs of the fingerprint operation would look random *if the inputs were chosen "without knowledge" of the coin tosses*.  But the inputs were outputs of the same algorithm, hence "had knowledge" of the coin tosses.  (And ... I just found the reference to this.  See ftp://ftp.dec.com/pub/dec/SRC/research-reports/SRC-113.pdf, documentation of the Fingerprint interface, page 42.)

                                                        -- Jerry