[Cryptography] Compression before encryption?
leichter at lrw.com
Mon Jan 12 06:40:00 EST 2015
On Jan 12, 2015, at 2:11 AM, grarpamp <grarpamp at gmail.com> wrote:
> It might seem as a result of this thread, that if you own both endpoints
> of the crypted stream, and have validated proper connection to yourself
> such as through certs, that sending CTE bulk data isn't going to be
> an issue.
Sigh. I addressed this earlier.
An ideal cryptosystem leaks no semantic information about the plaintext: An attacker's a posteriori estimate of the probability that any particular message was sent is identical to his a priori estimate. This is the basic goal we aim for in our encryption primitives.
Now, this goal is rarely attained for the entire system. Unless you continuously send cover traffic, the length of your ciphertext reveals the length of your plaintext. In a real-time protocol, the timing with which you send encrypted packets reveals the timing with which plaintext packets were sent. There are classic spy-story attacks in which an attacker watches the actions of multiple individuals and correlates them with the timing of messages sent, thus learning who is sending the packets.
And that correlation between real-world semantic information and encrypted data sent is exactly the point. Without compression, the length of the ciphertext correlates with the length of the plaintext. *With* compression, the correlation shifts to something much more subtle: to the contents of the plaintext, insofar as they determine how well it compresses. Whether this matters depends on the overall usage of the combined encryption/compression system. Sure, in many cases you'd say "who cares". *But that misses the point*. The goal of modern cryptography for years has been to provide a system in which the end user *doesn't need to do very subtle analyses of exactly how they use the system.* He just takes his data and sends it. But it's never that simple.
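To make the length point concrete, here is a minimal sketch in Python. It assumes a length-preserving cipher (a stream cipher or CTR mode, say), so the encryption step is left out and the length an eavesdropper sees is simply the length of whatever is handed to the cipher. The function name observed_length and the two sample messages are made up for illustration.

```python
import random
import zlib

def observed_length(plaintext: bytes, compress: bool) -> int:
    """Length visible on the wire, ASSUMING a length-preserving cipher.

    The cipher itself is omitted: under that assumption the ciphertext
    is exactly as long as the data passed to it.
    """
    data = zlib.compress(plaintext) if compress else plaintext
    return len(data)

# Two messages of identical length but very different content.
repetitive = b"A" * 1000
# Deterministic pseudo-random bytes, standing in for incompressible data.
noisy = random.Random(0).getrandbits(8000).to_bytes(1000, "big")

for name, msg in (("repetitive", repetitive), ("noisy", noisy)):
    print(name,
          "uncompressed:", observed_length(msg, compress=False),
          "compressed:", observed_length(msg, compress=True))
```

Without compression both messages look the same from outside (1000 bytes each). With compression the repetitive one shrinks to a small fraction of that while the noisy one does not shrink at all, so the observed length now says something about the content.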
Imagine you're a company that wants to keep your actual sales volumes highly confidential. You invest in the highest level security connections between your offices and between your offices and your suppliers and customers. Knowing that your "chatter" - the number and length of messages sent and received - reveals how much business you're doing, you send cover traffic at all times, keeping the actual transmitted data rates essentially constant.
And then someone introduces a remote backup system. Everything remains encrypted, but it goes over lower-grade commercial lines - no cover traffic. As long as you use a dumb backup system that backs up your entire disk every day, very little information leaks. But then someone gets clever and transmits just the (encrypted) deltas. Suddenly you're leaking something highly correlated with the business done on the previous day.
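A toy model of the backup example, again in Python. The block size, disk size and per-day change counts are invented numbers, and the "backup" is reduced to counting the bytes an observer would see on the line; it is a sketch of the leak, not of any real backup tool.

```python
BLOCK_SIZE = 4096    # bytes per block (assumed)
DISK_BLOCKS = 1000   # blocks on the disk (assumed)

def full_backup_bytes(changed_blocks: int) -> int:
    """Dumb backup: the whole disk goes out, whatever changed."""
    return DISK_BLOCKS * BLOCK_SIZE

def delta_backup_bytes(changed_blocks: int) -> int:
    """Clever backup: only the changed blocks go out."""
    return changed_blocks * BLOCK_SIZE

# Hypothetical days with different amounts of business activity.
daily_changes = {"quiet day": 10, "busy day": 400}

for label, changed in daily_changes.items():
    print(label,
          "full:", full_backup_bytes(changed),
          "delta:", delta_backup_bytes(changed))
```

The full backup sends the same number of bytes on both days, so its size tells the observer nothing. The delta size scales directly with how much changed, encrypted or not.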