Compression side channel

Sat Sep 8 22:45:14 EDT 2001

-----BEGIN PGP SIGNED MESSAGE-----

[ To: Perry's Crypto List ## Date: 09/08/01 07:52 pm ##
  Subject: Compression side channel ]

Guys,

At Crypto this year, I gave a rump session talk on some research I've
been doing (in my copious spare time) on how compression and encryption
might interact to reduce security.  I've been working on a paper (which
is far from ready to be sent out yet), and I wanted to bounce the basic
ideas around the list, for two reasons:

a.  I think this stuff is actually interesting, and maybe relevant for
some real-world systems.  (Though most of the attacks I've been
considering are basically out in academic-land.)

b.  I'm hoping to find out if anyone else has seen similar work
anywhere.  I've not been able to find any references to this kind of
attack, though once you've had the idea to try it, it's really pretty
straightforward.  (And I know there are a couple of occasional posters
on this list who know a heck of a lot more about compression algorithms
than I do.  Peter, are you listening?)

The basic result: Lossless compression algorithms leak data about their
input in the size of their output.  This is really obvious; if some
inputs didn't compress better than others, we'd either have no
compression, or lossy compression.  However, compressors like Zip
deflate and Unix compress maintain state, which is changed as new bytes
of text are processed.  This state basically is used to make future
texts compress better if they're more like recent texts.

This leads to some really powerful attacks, at least in the
chosen-plaintext model.  We have something like

        E(X) = encrypt(compress(X))

where the encryption preserves length (e.g., RC4 encryption).  Suppose
someone is sending a secret S in these messages, and the attacker gets
to choose some prefix or suffix to send, e.g.

X[0] = S+suffix[0]
X[1] = S+suffix[1]
...

Well, if the attacker can watch how these compress, he can learn a *lot*
about these secret strings.  Using a slightly more complicated model for
building the X[i], I've experimentally extracted 16- and 32-digit PINs
from messages, using only a few thousand chosen texts and a pretty small
amount of processing power.  (The code to run the attack is written in
Python, so it really doesn't have a lot of speed.)  The gist of the
attack is that I request a bunch of possible 4-digit subsequences in my
messages, and use how well subsequence+S compresses as a way to place
those subsequences in order of likelihood that they appear in S.  I then
try to piece together high-probability candidate sequences for S, based
on making the subsequences fit together.  I end up with an ordered list
of likely S guesses, and can get the right answer in my top 20
candidates about 60-70% of the time with random PINs for S.

There are a bunch of other attacks along these lines.  For example, I
believe a variant of this attack is possible using known plaintexts
only.  And with a much weaker kind of chosen plaintext access, we can do
"string testing," e.g., if we think we know a long subsequence in S, we
can request E(S+guess), and see what the compression ratio looks like.
And we can watch a compressed and encrypted stream, and use the
bandwidth to determine how much redundancy is in the plaintext.  (At
least, we can determine how much redundancy of a kind the compression
algorithm is designed to detect and use is present in the plaintext.)

Comments?

- --John Kelsey, kelsey.j at ix.netcom.com / jkelsey at certicom.com

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.1 Int.

iQCVAwUBO5rWQyZv+/Ry/LrBAQHowAP/UllCbIhqITK6dYyTgvmsLTlLQUIPGgqf
poVHkwEgKO+SiEF6BC8j+WG32gLKZ5pcbLyXusyOipR5PegrknxRD/yKvw+yuGFo
nWumXyuZ/W4W4mQnISoDgtqi/2Zh+7/3Nph8rllZIWO8z8ooqDgUbPAEMnTxmPAj
/gKeoQzNoto=
=4rOb
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com