[Cryptography] Compression before encryption?

John Denker jsd at av8n.com
Fri Jan 9 15:09:47 EST 2015


On 01/09/2015 05:22 AM, Stephan Neuhaus wrote:

> I have come across the recommendation to "compress before you
> encrypt", 

OK.  Sometimes it helps, sometimes it doesn't.

Let's assume the plaintext /is compressible/. Otherwise there's 
nothing to discuss.  If the data is already random, or already
well compressed, further compression is obviously not going 
to help.
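
A quick way to check which case you are in is to just try it.
The following is a minimal sketch using Python's zlib module; the
sample inputs are made up for illustration, and the random bytes
merely stand in for data that is already compressed or encrypted.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical sample inputs, chosen only for illustration.
redundant = b"the quick brown fox jumps over the lazy dog. " * 200
# Random bytes stand in for already-compressed or encrypted data.
random_bytes = os.urandom(len(redundant))

print(f"redundant text: {compression_ratio(redundant):.3f}")
print(f"random bytes:   {compression_ratio(random_bytes):.3f}")
```

The redundant input should shrink dramatically, while the random
input should come out slightly /larger/ than it went in, because
the compressor adds framing overhead and finds nothing to remove.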

> on the grounds that this makes plaintext recognition
> through frequency analysis much harder.

It does make frequency analysis much harder, although
that's not necessarily the only way to think about it.
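
One rough way to see the effect on frequency analysis is to compare
the byte histogram before and after compression.  The sketch below
is only an illustration: the word list is invented, the "plaintext"
is a crude stand-in for natural language, and byte-level Shannon
entropy is just one of several possible measures.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8.0)."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical stand-in for natural-language plaintext:
# words drawn at random from a small, made-up vocabulary.
WORDS = ["attack", "at", "dawn", "the", "weather", "report", "is", "clear",
         "nothing", "to", "send", "station", "north", "south", "enemy", "convoy"]
rng = random.Random(0)  # fixed seed so the run is repeatable
plaintext = " ".join(rng.choice(WORDS) for _ in range(5000)).encode("ascii")
compressed = zlib.compress(plaintext, 9)

print(f"plaintext:  {byte_entropy(plaintext):.2f} bits/byte")
print(f"compressed: {byte_entropy(compressed):.2f} bits/byte")
```

The plaintext uses only a few dozen distinct byte values with a
very lopsided distribution, whereas the compressed stream should
come out much closer to the 8 bits/byte of a flat histogram.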
 
> However, compression algorithms surely have easily recognisable
> headers, right? 

a) That's true only if it is a lousy compression algorithm.
 If the headers are predictable, the headers themselves can
 be compressed.

b) A /small/ amount of predictability is more survivable than
 a large amount.
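
As a concrete data point on headers, here is a small sketch using
Python's zlib module.  The zlib container format puts a two-byte
header in front of the DEFLATE data and a four-byte checksum after
it; asking for a raw DEFLATE stream (negative wbits) omits both.
The sample message is made up.  Note that even a raw stream still
begins with a few bits of block header, so this reduces the
predictable part rather than eliminating it.

```python
import zlib

data = b"compress before you encrypt? " * 50  # hypothetical sample message

# zlib-wrapped stream: 2-byte header + DEFLATE data + 4-byte Adler-32 checksum.
wrapped = zlib.compress(data)

# Raw DEFLATE stream: negative wbits tells zlib to omit the header and checksum.
compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = compressor.compress(data) + compressor.flush()

print("wrapped starts with:", wrapped[:2].hex())
print("wrapped length:", len(wrapped), "raw length:", len(raw))

# Both forms decompress to the same message.
assert zlib.decompress(wrapped) == data
assert zlib.decompress(raw, -15) == data
```

Dropping the wrapper also drops the integrity checksum, so this
only makes sense if the encryption layer provides authentication.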

> [...] is
> there any other hard evidence (not personal opinion, however
> well-informed, please) that compression before encryption does or
> does not help?

Sometimes it increases security, and sometimes it doesn't.

 -- If you are breaking RSA by factoring, you derive no
  advantage from a known plaintext.

 -- For a fully-known plaintext, the compressed version is
  also fully known, just smaller.  This may or may not be
  significant to the attacker.  For super-long *or* super-
  short messages it's probably not going to matter.  There 
  is a Goldilocks zone in between.

 -- At one extreme, if you are breaking Enigma, or
  breaking WEP, /partially/ known (i.e. partially predictable)
  plaintexts are a big deal.  Compression ideally removes
  the predictable stuff, making cryptanalysis very much
  harder.

 -- As an intermediate case, compression might create a
  situation where the message is not breakable by itself,
  but is breakable if the same message is transmitted to
  multiple stations under different keys.  This exploits
  /same/ plaintext, not "known" plaintext.  Compression
  doesn't change the sameness.  This is why you need
  session keys.

 -- At the other extreme, if chosen plaintext can
  be concatenated to the unknown plaintext, compression
  might make things very much worse:
    http://en.wikipedia.org/wiki/CRIME

  Note that this attack works by observing the /length/ of
  the ciphertext, so it magnifies and highlights the problem
  with traffic analysis, which is an oft-underappreciated
  problem whether or not there is compression involved.
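
To make the chosen-plaintext case concrete, here is a toy
illustration of the length side channel.  It is emphatically not
the real CRIME attack on TLS: the secret, the cookie format, and
the function name are all invented, and it assumes the cipher
preserves plaintext length (as a stream cipher or CTR mode would),
so that the eavesdropper sees the compressed length directly.

```python
import zlib

# Hypothetical secret that the attacker wants to learn.
SECRET = b"Cookie: session=7f3a9c2e1b4d\r\n"


def observed_length(attacker_text: bytes) -> int:
    """Length an eavesdropper would see for compress(secret + attacker_text).

    Assumes a length-preserving cipher, so ciphertext length equals
    compressed length.  No actual encryption is performed here.
    """
    return len(zlib.compress(SECRET + attacker_text))


# The attacker appends guesses to the unknown plaintext.
right = observed_length(b"Cookie: session=7f3a9c2e1b4d")
wrong = observed_length(b"Cookie: session=0123456789ab")

print("correct guess:", right, "bytes")
print("wrong guess:  ", wrong, "bytes")
```

The correct guess duplicates a longer run of the secret, so the
compressor replaces it with a single back-reference and the output
is shorter.  The attacker never decrypts anything; the length alone
tells him which guess was right.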

