The role of compression in cryptosystems

Thu May 31 14:00:22 EDT 2001

At 08:40 AM 5/29/01 -0400, John Denker wrote:
>In a discussion of the role of compression in IPsec,
>at 06:54 PM 5/28/01 -0400, Henry Spencer wrote:

>>The CPU load is quite substantial, and although I don't have recent
>>timings, I wouldn't recommend compression unless you have a slow line.
>
>Hmmmm.  My recommendation would be to shift the burden of proof:  I would 
>leave compression turned on unless there is a good reason to turn it off.
>
>This recommendation is based on the following general principle:
>
>         Any cipher applied to well-compressed data is much
>         harder to cryptanalyze than the same cipher applied to
>         uncompressed plaintext.

This is true, but almost certainly irrelevant.  If you're using triple-DES
or AES, nobody who's talking knows how to attack those ciphers with
available resources, even with adaptive chosen plaintext attacks.  (That
is, attacks where you get to send in a plaintext, get it encrypted and sent
back, and then decide what plaintext to request next.)  The difference in
security between LZW-ish compressor output as plaintext or a steady stream
of binary zeros as plaintext is probably really small.  

Even in the case of DES, a keysearch machine won't be stopped by
compressing the plaintext.  The compressor output has to be decompressable,
and the decompression routine is public.  In the worst case, you can just
try each key for the first few blocks in the message, see which ones are
consistent with compressed ASCII english text (or whatever you know about
the plaintext), and keep checking later blocks only with the subset of keys
that worked on the first blocks.  This will make the keysearch machine a
little more expensive to build, but not all that much more expensive.  At a
guess, you're increasing the cost of the keysearch machine by a factor of
ten or so, which just isn't very relevant.  (This assumes you've done away
with fixed compression headers; otherwise, the keysearch machine is made a
little easier to build.)  

>This principle is based on information theory, specifically the idea of 
>unicity distance.  Note that these days the security of a typical crypto 
>scheme is explained in terms of _complexity theory_ (the assertion that 
>cryptanalysis is computationally infeasible), whereas information theory is 
>a completely different kettle of fish.

Actually, most block ciphers are designed to be efficient, and to resist
various attacks based on statistical properties that propogate too far
through the cipher.  For example, if you take about 2^{43} 64-bit
plaintexts and feed them into DES under a fixed key, you will be able to
detect a very small bias: the parity of a certain subset of input bits can
be used to predict, with a probability just barely greater than 1/2, the
parity of a certain subset of output bits.  (You have to examine about
2^{43} texts because the bias is so small, it takes this many trials to
detect it with high probability.)  This kind of security doesn't have much
to do with complexity theory, and in fact, I can't recall the last time I
saw a block cipher whose security was claimed on complexity theoretic
grounds.  

>And while we're on the subject:  the module that compresses the data should 
>add a goodly percentage of random cryptologic nulls (no, I don't mean ASCII 
>NULs).  The decompressor of course removes them.

So, I think you're spending a lot of resources here that are much better
spent somewhere else.  If your system is currently using single-DES for
encryption, you're far smarter to change over to triple-DES (which will
give you real security) than to try all these games with compression and
adding randomness and such.    

>To decide whether or not this principle is important, we can consider three 
>cases:
>  a) In the case where the cipher is really weak and the adversary has lots 
>of ciphertext to work with, pre-encipherment precautions (such as 
>compressing + adding nulls) won't save the day.  No compressor is perfect, 
>so the unicity distance is never infinite.
>  b) There is a vast middle ground where such precautions can increase the 
>adversary's workload by a large factor (thousands) while increasing the 
>good guy's workload by only slightly.
>  c) In the case where the cipher is really strong, further strengthening 
>is just lily-gilding.

I'd say that the space of (b) is pretty limited.  The problem is, the
receiving end of the normal communications needs to be able to understand
them pretty quickly, and all that is based on public algorithms.  So no
matter what scheme you have for compressing and adding randomness, I can
run publically available algorithms to decompress it and start getting back
plaintext, and for the overwhelming majority of keys in a keysearch
machine, this will give me garbage of one kind or another.  Indeed, IP
needs to be able to handle fairly short packets, so there will be cases
where 20-30 blocks are the whole message; the attacker can wait for such a
packet, and then feed that to his 30-block-at-a-time keysearch machine.  

If that's not true, and your compressor and decompressor are doing
non-public things, then they're part of the cipher, and you've really just
rolled your own new cipher, and are encrypting things twice.  Which is fair
enough, but it's not clear why we ought to do this with your new
compression + nulls cipher, rather than switching from DES to triple-DES,
or doing both triple-DES and AES on the plaintext, or whatever.

...
>Comments, anyone?

--John Kelsey, kelsey.j at ix.netcom.com

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com