[Cryptography] Is there a good algorithm providing both compression and encryption at the same time?

Wed May 6 19:08:31 EDT 2015

On May 6, 2015, at 4:15 AM, Francois BERENGER <francois.berenger.fun at gmail.com> wrote:
> While programming an open source distributed system, I discovered
> it could be interesting for the system to use a well established algorithm
> that can do these two things at the same time (for performance reasons).
> 
> I was thinking about something along those lines:
> 
> compression(clear_text) = (compression_dictionary, compressed_text)
> 
> then, instead of encrypting the whole resulting pair (to save some time), we would send over the wire
> 
> (symmetric_encrypt(compression_dictionary, secret_key), compressed_text)
> 
> Of course, I would like that the compressed_text cannot be uncompressed
> by someone who doesn't have access to the compression dictionary.
> 
> I would be interested in scientific articles or any interesting pointer.
I doubt there's much work on something like this because I don't see how it could be very good.  A compressor works by removing redundancy from the input.  The compressed text will reveal where the redundancy was.  For example, suppose you used an LZ-style compressor.  If the input starts with two repeated characters, the encoding will start with two repeated symbols.  Granted, without the dictionary, you can't tell *what* character was repeated - but the fact of the repetition will be right there, and often gives much information away.

In fact, for most common compression algorithms, you don't need the table to know how to divide the compressed data into symbols.  So what you end up with is effectively an old-style code book.  Techniques for breaking those were well developed.  It's true that in the old days, a single codebook was used for all messages (though super-encipherment was common, and was broken); while here there's a new code book for each message.  But individual "messages" today contain as much data - if not more - than might be gathered in years of effort in the old days; so that doesn't help you much.

Perhaps you could tailor an compression algorithm to work in the context, but I doubt it.  Even the obvious and straightforward "compress then encrypt the entire result" leaks information about the lengths of messages - and in some contexts (e.g., compressed speech fragments) that's been found to be enough to completely break the system.  The basic issue is:  Compression turns semantic information into message lengths; and most encryption algorithms leak information about the clear-text message lengths.  So the combination is inherently dangerous.
                                                        -- Jerry