[Cryptography] Is there a good algorithm providing both compression and encryption at the same time?

Fri May 8 06:50:23 EDT 2015

On May 7, 2015, at 8:15 PM, Ray Dillinger <bear at sonic.net> wrote:
>>> I would be interested in scientific articles or any interesting pointer.
>> I doubt there's much work on something like this because I don't see how it could be very good.  
>> A compressor works by removing redundancy from the input.  The compressed text will reveal where the redundancy was.  
> ...[I]f compression is perfect, then the
> compressed text, like encrypted text, is indistinguishable
> from random noise.  A compressor, after all, works by removing
> redundancy from the input.  When there is no redundancy there
> is no way to tell whether an equiprobable sequence of bits
> means one thing or another thing -- only that both things were
> within 50% of equally likely to be expressed because they both
> mapped to the same bit length.
There's a fundamental problem with this argument:  In the case of compression, there's no secret information.  Anyone can run the decompression algorithm.  The output may "look like noise", but it's readily distinguishable from noise by anyone.  (This is different from compression, where it's distinguishable from noise by an opponent who has the key, subject to whatever security constraints apply, e.g., polynomial bound on opponent time.)  The original proposal being discussed kept some of the information from the opponent - i.e., it assumed a compression that used a dictionary which was itself encrypted, so for simplicity we assume it's just not present.  But as the example I gave showed, that failed to hide some of the information - e.g., in an LZ-style compression, the exact pattern of repetitions of previous strings is revealed, even if the opponent doesn't know what strings are being repeated.  Still, given some known (or guessed) ciphertext, this is quite damaging.  (And it's certainly a good *distinguisher* from random noise.  I don't know what the output of an LZ encoder on random input looks like, but it's certainly going to have characteristics far from, say, English texts.)

> ...[W]hile given a practical, not-very-good
> compression function that symbol-repetition information may
> be available in the compressed text, it _certainly_ won't
> be available in the encrypted compressed text unless your
> encryption is completely broken when considered separately
> from your compression.
Err... did you read the message you're responding to, and the message *it* was responding to?  Why would you think I was claiming it was?

> There is absolutely nothing wrong with the "compress then
> encrypt" construction. As folks have pointed out, you must
> never rely on imperfect compression to SERVE AS encryption,
> but encrypting compressed text using a good encryption
> primitive is not worse at hiding its contents than
> encrypting uncompressed text using the same good encryption
> primitive.
This is provably incorrect, in some cases.  I cited a published example (http://cs.unc.edu/~fabian/papers/tissec2010.pdf):  If you take a constant-rate stream of audio samples of speech and run it through any decent encryption, it's secure.  If you first compress the speech using one of the state-of-the-art algorithms designed for this purpose, you can get a factor of 8 or even 16 compression.  The result is a series of varying-length packets whose inter-packet timing is also important.  It's been experimentally verified that just the timing and the lengths of the packets is enough to extract some of the original speech.  So even encrypting such a stream with a one-time-pad is completely ineffective.  Sure, you can insert random filler to hide the packet lengths and inter-packet gap lengths - but then you're throwing away what you gained in compression.

There are cases where compress-before-encrypt is fine.  But there's no general result of this form, nor can there be unless you first reduce your model to that of simple mathematical transformations, ignoring the physical realities of transmission and the side-channel information they inherently leak.

> We usually accept the opponent knowing approximately how long
> a message is because the encrypted message is that long plus
> the length of a known-length IV and rounded up to a block
> boundary.  So I'm not concerned that message length information
> may be only as well hidden as it is with any kind of encryption.
It's good to know that leaked message lengths are fine for you.  I guess I won't be buying your "encrypted conversation" app....  :-)

                                                        -- Jerry