[Cryptography] Summary: compression before encryption

Christian Huitema huitema at huitema.net
Fri Jan 16 00:23:02 EST 2015


> Well chosen, yes...
> A compressed file has more "known" structure than just the header.
> A large pile of encrypted files with a known structure and a common key
> could be all that modern tools in Utah need to attack the key.

The real question is whether the compressed file has more or less structure (that is, redundancy) than the original text. In most cases, the compressed file has less structure, so compression before encryption makes the system more robust. If anything, the large pile of files in Utah is considerably smaller if the files were compressed than if they were left in clear text.
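
To make the "less structure" claim concrete, here is a rough sketch that compares the byte-level entropy of a text before and after compression. The sample text, the word list and the byte_entropy helper are made up for illustration, and a byte histogram is only a crude proxy for structure, so treat the numbers as indicative rather than as a proof.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical sample text: words drawn with a fixed seed so runs are repeatable.
WORDS = ["the", "key", "file", "cipher", "packet", "stream", "header", "voice"]
rng = random.Random(2015)
text = " ".join(rng.choice(WORDS) for _ in range(20000)).encode("ascii")

packed = zlib.compress(text, 9)

print(f"original:   {len(text):6d} bytes, {byte_entropy(text):.2f} bits/byte")
print(f"compressed: {len(packed):6d} bytes, {byte_entropy(packed):.2f} bits/byte")
```

The compressed output should be both shorter and much closer to 8 bits per byte, which is the sense in which there is less material and less redundancy left for an attacker to work with.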

There are two well-known issues, which I could sum up as "Skype and CRIME."

In the Skype attack, the original stream of voice packets has a constant bit rate, but the compressed stream has a variable bit rate, since for example vowels compress better than consonants, and silences compress better than anything. In that case, encryption after variable bit rate compression leaks information, through the packet sizes, that would not be present if encryption were applied to the original constant bit rate data. Of course, there is an obvious mitigation, constant bit rate compression, which is indeed what most modern VoIP systems do.
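
The size leak can be sketched in a few lines. This is only an illustration under loud assumptions: zlib stands in for a variable bit rate voice codec, the three frames are synthetic, and the cipher is assumed to preserve length, so the compressed size is what an eavesdropper sees.

```python
import random
import zlib

FRAME_BYTES = 160  # hypothetical frame: 20 ms of 8 kHz, 8-bit audio

rng = random.Random(1)  # fixed seed so the "noisy" frame is repeatable
frames = {
    "silence": bytes(FRAME_BYTES),
    "steady tone": bytes((i * 7) % 40 for i in range(FRAME_BYTES)),
    "noisy burst": bytes(rng.randrange(256) for _ in range(FRAME_BYTES)),
}

def wire_size(frame: bytes) -> int:
    """Size visible on the wire, assuming a length-preserving cipher."""
    return len(zlib.compress(frame))

sizes = {label: wire_size(frame) for label, frame in frames.items()}
for label, size in sizes.items():
    print(f"{label:12s} raw={FRAME_BYTES} compressed={size}")
```

Every raw frame has the same size, so encrypting the raw stream shows nothing; the compressed sizes differ by frame content, and encryption does not hide that difference.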

In the CRIME attack, the opponent is able to insert data in the original clear text, using various forms of web trickery. Algorithms like GZIP compress better when the same strings are present several times, so each injection can be used to test for the presence of similar data in the original text: if the compressed message shrinks, the injected guess probably matched. If this third-party injection is a concern, then the mitigation is to use a "stateless" algorithm, e.g. some form of Huffman coding where the code is derived from analysis of a pre-established corpus. This is why HTTP/2 uses the HPACK algorithm for header compression instead of GZIP.
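
A toy version of the length oracle, as a sketch only: the cookie value, the request layout and the candidate list are all invented, zlib stands in for the compression layer, and whole-token guesses are compared so that the size difference is unambiguous. The real attack refines the guess a character at a time, which is noisier.

```python
import zlib

# Hypothetical secret that the victim's browser attaches to every request.
SECRET_COOKIE = "session=9f2c4e71ab03d6e8"

def observed_length(injected: str) -> int:
    """Compressed size of a request that contains attacker-chosen text.

    A length-preserving cipher would expose this same number on the wire.
    """
    request = (
        f"GET /?q={injected} HTTP/1.1\r\n"
        "Host: example.com\r\n"
        f"Cookie: {SECRET_COOKIE}\r\n\r\n"
    )
    return len(zlib.compress(request.encode("ascii")))

# Hypothetical guesses; only one of them equals the secret.
candidates = [
    "session=51c7d90ab4e2f638",
    "session=9f2c4e71ab03d6e8",
    "session=c0de1234567890ab",
    "session=3b8a6d0f5e1c7924",
]

lengths = {guess: observed_length(guess) for guess in candidates}
best_guess = min(lengths, key=lengths.get)

for guess, size in lengths.items():
    print(size, guess)
print("shortest:", best_guess)
```

The correct guess lets the compressor replace the whole cookie with a single back-reference, so that request comes out shortest. A coder with a fixed, pre-established code has no such dependence on what else is in the message.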

Of course, if one is really concerned with analysis of message sizes, it is always possible to add some random padding to the compressed messages. This is obviously less efficient than vanilla compression, but probably still more efficient than forgoing compression altogether.
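
One possible shape for the padding step, as a minimal sketch: the 4-byte length prefix, the MAX_PAD bound and the function names are assumptions made for this example, not part of any standard. The padded frame is what would then be handed to the encryption layer.

```python
import secrets
import struct
import zlib

MAX_PAD = 255  # hypothetical upper bound on padding, in bytes

def compress_and_pad(plaintext: bytes) -> bytes:
    """Compress, then append a random amount of random padding.

    Layout (an assumption for this sketch): 4-byte big-endian length of the
    compressed body, the body itself, then 0..MAX_PAD padding bytes.
    """
    body = zlib.compress(plaintext)
    pad_len = secrets.randbelow(MAX_PAD + 1)
    return struct.pack(">I", len(body)) + body + secrets.token_bytes(pad_len)

def unpad_and_decompress(framed: bytes) -> bytes:
    """Read the length prefix, ignore the padding, and decompress the body."""
    (body_len,) = struct.unpack(">I", framed[:4])
    return zlib.decompress(framed[4:4 + body_len])

message = b"attack at dawn " * 50
framed = compress_and_pad(message)
print(len(message), len(zlib.compress(message)), len(framed))
```

For compressible input the padded frame is still far smaller than the raw message, which is the efficiency argument made above; the padding only blurs the exact compressed size.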

-- Christian Huitema




