[Cryptography] Summary: compression before encryption

Thu Jan 22 07:40:42 EST 2015

On Jan 18, 2015, at 4:55 PM, Ray Dillinger <bear at sonic.net> wrote:
> The idea that modern cryptographers don't use cribs is relevant
> only when the keyspace cannot be reduced enough by mathematics
> to actually begin a search for a key that yields a plaintext.
> ...But when we actually attempt to decrypt an individual message,
> we must recognize the plaintext as being plaintext.
Back in the old days, we generally encrypted letters (and maybe digits) to letters (and maybe digits).  Even then, "recognizing the plaintext as plaintext" was possible without a crib if you assumed the plaintext was a human language.

Today, we encrypt bytes to bytes, and if, again, you're assuming a human language - often the case - even a couple of decrypted bytes usually tell you that you've got it wrong.  Compression might help here, but only if the compression scheme isn't known to the attacker.  Otherwise, "try decryption and see if the result is reasonable" is simply replaced by "try decryption, decompress, see if the result is reasonable".
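
To make that last point concrete, here is a rough Python sketch.  Everything in it is an assumption chosen for illustration: the "cipher" is a toy one-byte XOR (so the whole keyspace can be searched), the known compressor is zlib, and the recognizer is a crude printable-ASCII check.  The function names are hypothetical.

```python
import zlib

def xor_cipher(data, key):
    """Toy one-byte XOR 'cipher' (assumption: only 256 keys, so search is trivial)."""
    return bytes(b ^ key for b in data)

def looks_like_text(data):
    """Crude recognizer: non-empty and every byte is printable ASCII, tab, LF or CR."""
    return len(data) > 0 and all(32 <= b < 127 or b in (9, 10, 13) for b in data)

def search_plain(ciphertext):
    """Try every key; keep those whose decryption looks like text."""
    return [k for k in range(256) if looks_like_text(xor_cipher(ciphertext, k))]

def search_compressed(ciphertext):
    """Same search when the plaintext was zlib-compressed before encryption."""
    hits = []
    for k in range(256):
        try:
            candidate = zlib.decompress(xor_cipher(ciphertext, k))
        except zlib.error:
            continue  # not a valid zlib stream, so this key is wrong
        if looks_like_text(candidate):
            hits.append(k)
    return hits

message = b"Attack at dawn. Bring the usual supplies.\n"
secret_key = 0x5A

print(search_plain(xor_cipher(message, secret_key)))
print(search_compressed(xor_cipher(zlib.compress(message), secret_key)))
```

The true key turns up in both searches.  The only change in the compressed case is one extra decompress step per candidate key, and a compressor with a header and checksum (as zlib has) gives the attacker an additional way to reject wrong keys.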

In any case, this is not what was meant by a "crib".

> It is also ridiculous to think that with any modern cipher a
> one-bit difference in a megabyte file will yield a completely
> different ciphertext with no way to tell which bit with any
> probability greater than the file length in bits.
That's not what the claim about one bit is about.  If you want semantic security, you can't use a mode like CBC (which doesn't come close to providing it).  Semantic security requires that you be able to securely transmit a single bit, even in the face of a chosen-plaintext attack.  That immediately implies that encryption with a particular key *cannot be deterministic*.  Otherwise, I, as attacker, ask you to give me the encryptions of 0 and 1, and later, when I want to know if a given ciphertext encrypts 0 or 1, I just compare the ciphertext sent to my two samples.  To protect against this, the encryption mode must map a given plaintext to a large number of possible ciphertexts - so many that (within the security parameters) an attacker will never see the same particular encryption twice.

I'll certainly grant you that, while semantically secure modes do exist, they are not typically used in practice (because the ciphertext is necessarily larger than the plaintext, which can be inconvenient, for one thing).  Note that random padding provides some degree of semantic security - though if not done carefully, it can create other vulnerabilities.

                                                        -- Jerry