[Cryptography] Summary: compression before encryption

Fri Jan 16 13:56:33 EST 2015

On Jan 16, 2015, at 10:03 AM, ianG <iang at iang.org> wrote:
> It occurs to me that what we need, and I may be talking out of my posterior here, is keyed compression.  If the problem is distinctive header cribs, then cause those headers / first few hundred bytes to be encrypted in some form of light scale encryption, just enough to break any realistic sense of cribs.

Cribs are an old, obsolete concept.  All modern cryptosystems are required to be secure against known, even chosen, plaintext attacks.  We don't accept them otherwise.  Sending a one-megabyte message which is entirely known to the attacker except for 1 bit at a particular position in the message is required to be safe, in the sense that the attacker's chance of guessing that bit is unchanged after he's received the message.

Attacks against encrypted compressed data are all side-channel attacks.  They rely on differences in message sizes that correlate with other semantically interesting information in the plaintext - e.g., that some data has been repeated; or that a voice signal isn't an arbitrary audio stream but consists of a small number of semantically meaningful phonemes which are compressed into messages of different lengths, so that reading off the lengths gives you information about the phonemes.
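To make the size leak concrete, here is a minimal Python sketch.  Everything in it is an assumption for illustration: toy_stream_encrypt is a made-up stand-in for any length-preserving cipher (it is not a real or vetted construction), and the two messages are arbitrary.  Without compression both ciphertexts are the same size; with compression, the repetitive message is visibly shorter on the wire.

```python
import hashlib
import random
import zlib


def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy length-preserving stream cipher (illustration only, not for real use).

    XORs the plaintext with a SHAKE-256 keystream derived from key and nonce.
    """
    keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def observed_length(key: bytes, nonce: bytes, message: bytes, compress: bool) -> int:
    """Return the ciphertext length an eavesdropper would see."""
    body = zlib.compress(message) if compress else message
    return len(toy_stream_encrypt(key, nonce, body))


key = b"k" * 32                      # hypothetical fixed key
rng = random.Random(0)               # seeded so the example is repeatable

repetitive = b"yes" * 100                                 # 300 bytes, highly redundant
varied = bytes(rng.getrandbits(8) for _ in range(300))    # 300 bytes, no structure

plain_rep = observed_length(key, b"n1", repetitive, compress=False)
plain_var = observed_length(key, b"n2", varied, compress=False)
comp_rep = observed_length(key, b"n3", repetitive, compress=True)
comp_var = observed_length(key, b"n4", varied, compress=True)

print("no compression:  ", plain_rep, plain_var)   # identical sizes
print("with compression:", comp_rep, comp_var)     # sizes now depend on content
```

The cipher hides every byte of both messages, yet the compressed sizes alone tell the two apart.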

These are side-channel attacks because the underlying design principles either don't consider message lengths at all (they only look at ciphertexts in isolation, or as a continuous, unbroken stream) or they say "yes, you can determine the length of the plaintext from the length of the ciphertext, we accept that, it's not particularly useful - but that's all you can determine".  That assumption then fails when those lengths correlate, or can be caused to correlate, with something interesting.
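The "can be caused to correlate" case can be sketched the same way.  The following assumes a hypothetical setup in which the attacker can inject a guess into the same compressed message that carries a secret; the request layout, the SECRET value and the request_length helper are all invented for illustration.  Encryption is left out because a length-preserving cipher would expose the same sizes (up to a constant).

```python
import zlib

# Hypothetical secret that the attacker cannot read directly.
SECRET = b"sessionid=7f3a9c2e41b5d6087f3a"


def request_length(guess: bytes) -> int:
    """Compressed size of a request that carries both the guess and the secret."""
    request = b"GET /?q=" + guess + b" HTTP/1.1\r\nCookie: " + SECRET + b"\r\n"
    return len(zlib.compress(request))


# Same length of attacker-chosen input in both cases; only the content differs.
right = request_length(b"sessionid=7f3a9c2e41b5d6087f3a")
wrong = request_length(b"sessionid=ZYXWVUTSRQPONMLKJIHG")

print("matching guess:    ", right)
print("non-matching guess:", wrong)
```

A guess that matches the secret lets the compressor replace the second copy with a back-reference, so the message comes out shorter, and the attacker learns whether the guess was right from the size alone.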

In the usual models, compression before encryption neither adds nor subtracts anything.  We get the same semantic security guarantees either way.

Once you consider broader models in which message lengths aren't a side channel but are explicitly part of what's presented to the attacker, compression pretty much always helps the attacker, exactly because it converts some kind of semantic information into message sizes.
                                                        -- Jerry