[Cryptography] Summary: compression before encryption

Fri Jan 16 16:56:40 EST 2015

 -- replying below to --
From: cryptography [mailto:cryptography-bounces+dennis.hamilton=acm.org at metzdowd.com] On Behalf Of Jerry Leichter
Sent: Friday, January 16, 2015 10:57
To: ianG
Cc: Ray Dillinger; cryptography at metzdowd.com
Subject: Re: [Cryptography] Summary: compression before encryption

[ ... ]

In the usual models, compression before encryption neither adds nor subtracts anything.  We get the same semantic security guarantees either way.

Once you consider broader models in which messages lengths aren't a side channel but are explicitly part of what's presented to the attacker, compression pretty much always helps the attacker, exactly because it converts some kind of semantic information into message sizes.
                                                        -- Jerry

<orcmid>
   That's a nice summary.  
   I have a practical, very real situation where compression is a problem.  This has to do with encryption of document files (so they are highly-amenable to off-line attacks) and the fact that document files tend to be vulnerable to a variety of known-plaintext attacks because of repeatedly-used boilerplate at the tops of files, fixed components of various kinds, etc., and the fact that encryption is after compression.  And then a hash of the (beginning of the) compressed plaintext is also available to the adversary along with various metadata.
   There is, of course, a great deal that can go wrong in the presumed secrecy of the contents of these particular document formats, certainly against a determined adversary.  (Need I say that the encryption-key generation is password-based?)
   One mitigation that has been incorporated in at least one implementation has been the use of chaff.  This chaff is added before compression near the beginning of a plaintext component but happens to be both benign, random, and of random length.  It is also invisible in the document as presented to users and it is not preserved with the unencrypted document. 
   This does not alter the fact that the compression technique has some known structure.  What it does do is make it difficult to determine from the ciphertext length or the disclosed hash and the post-chaffed length that a known plaintext is usable in attacking the encrypted component for discovering the key, and then being able to decrypt other components that have the secret material (and of course the password is itself a great prize since they tend to be both memorable and reused).
   I have been musing over a replacement for encryption of this particular document format that does not deal with separate compressed-and-encrypted components in the package but provides authenticated encryption of the entire Zip package that is the standard wrapper of the document parts.  There the situation is a bit different, since compressed and uncompressed parts are within the Zip and the Zip has a definite structure that can be the basis for attacking the document file (but now it is encrypted).  Now, there is nothing to be done, here, with regard to compression before encryption.  The compression of components in the Zip is unavoidable.  The Zip structure is known.  And significant detail of that structure (such as the names of some of the file parts) are also known and there may be known plaintexts embedded in the overall Zip.  (A Zip package is often itself compressible in its entirety; but we need not go there.)
   I have considered the use of chaff there as well.  In this case, the chaff would be generated as part of a
key expansion procedure and it would be essentially random with variation of lengths and splits between chaff on the front and chaff on the end (because the ends of Zip packages also have important structure and information).  The chaff is removed as part of decryption.  Now, the Zip structure is still in there in the plaintext and I can imagine a search strategy that succeeds in recovering it because there are invariants in the structure that one might actually be able to search and solve for.  The problem might be harder, but there is some concern whether that is enough harder against a determined adversary (and the documents are attackable off-line with all that signifies).
   Well, so long as we're talking about key-stretching, there is another measure that might be more interesting.  That is in generating a permutation rule that applies to the plaintext before encryption, at some granularity of permutation unit.  In effect, the plaintext binary, with or without chaff, is shuffled in some way that breaks the invariants but that, when known, makes it relatively straightforward to re-assemble the correct plaintext after decryption of the authenticated encryption.  I'm interested in such an arrangement that allows random access into the plaintext just the same (a common way of wanting to use the components of a document-format based on a Zip packaging of components).  Even though an adversary still has known structure to dig for, this seems to multiply complexity in more wonderful ways than affixing chaff.
</orcmid>

_______________________________________________
The cryptography mailing list
cryptography at metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography