[Cryptography] Reading encrypted generative AI chats

Tue Mar 19 07:55:22 EDT 2024

While I don’t disagree with the comments here - and I note in particular Kent Boyd’s comment about the interaction between cryptography and the “separation of concerns” implied by the OSI networking layers - I think there’s a point being missed.  Yes, traffic analysis has been around for a long time; yes, the line between data and metadata gets arbitrarily vague; yes, defeating these kinds of attacks is very difficult.  But what we’re really seeing here is another kind of “separation of concerns” that itself causes us to miss problems - and maybe solutions.

Consider the history of how we model crypto systems, especially block encryption, which is pretty much universal these days.  We used to simply consider the system as a pair of functions E() and D() on some A^n, where A was the alphabet and n was the block size.  Then we gradually learned that we really need to encrypt things in A*, and that required modes - and modes were much more complex and important than we thought.  (Yes, we also learned about the importance of authentication and the usefulness of associated data and such, but that’s not relevant here.)  While we still write E() and D() for the fundamental block algorithms, when we specify a system, we study the modes as well.

To deal with the issues I’m talking about, I’ll propose a different formalism.  Define a segment as a triple [S,l,p] where S is a member of A*, l is the length of S as an integer, and p is a non-negative real number (represented to some fixed precision) that is the “pause” after the segment.  (It could be the pause before the segment just as well - it doesn’t matter.)  The “extended encryption” is EE([S,l,p]) = [E(S), L(l), P(p)], where L() is some function from integers to integers and P() is some function from non-negative reals to non-negative reals.  For physical reasons, L() and P() are assumed to be non-decreasing.

We apply EE element-wise to sequences of triples; decrypting such a sequence simply yields the sequence of strings D(E(S)).  Decryption has access to L(l) and P(p) but doesn’t need to recover l and p.
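A minimal Python sketch of the segment model, assuming the alphabet A is bytes and treating E(), D(), L() and P() as plain functions passed in.  The names Segment, EncryptedSegment, EE, encrypt_sequence and decrypt_sequence are illustrative only, and toy_E/toy_D are a hypothetical placeholder with no security at all, standing in for a real block cipher and mode:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    """A segment [S, l, p]: cleartext string, its length, and the pause after it."""
    S: bytes
    l: int
    p: float  # seconds; the fixed precision is left implicit in this sketch

    def __post_init__(self) -> None:
        if self.l != len(self.S):
            raise ValueError("l must equal the length of S")
        if self.p < 0:
            raise ValueError("pause must be non-negative")


@dataclass(frozen=True)
class EncryptedSegment:
    """The triple [E(S), L(l), P(p)] that is actually transmitted."""
    C: bytes
    L_of_l: int
    P_of_p: float


def EE(seg: Segment, E: Callable[[bytes], bytes],
       L: Callable[[int], int], P: Callable[[float], float]) -> EncryptedSegment:
    return EncryptedSegment(E(seg.S), L(seg.l), P(seg.p))


def encrypt_sequence(segs: List[Segment], E, L, P) -> List[EncryptedSegment]:
    # EE is applied to each triple, in order.
    return [EE(s, E, L, P) for s in segs]


def decrypt_sequence(enc: List[EncryptedSegment],
                     D: Callable[[bytes], bytes]) -> List[bytes]:
    # Decryption yields only the strings D(E(S)); L(l) and P(p) are visible
    # here but are not used to recover l or p.
    return [D(e.C) for e in enc]


# Hypothetical placeholder cipher: XOR with a fixed byte.  It offers no
# security and only stands in for "E() under some mode" and its inverse D().
KEY = 0x5A


def toy_E(s: bytes) -> bytes:
    return bytes(b ^ KEY for b in s)


toy_D = toy_E  # XOR with the same byte undoes itself

msg = [Segment(b"Hello", 5, 0.2), Segment(b", world", 7, 0.0)]
enc = encrypt_sequence(msg, toy_E, L=lambda l: l, P=lambda p: p)
print(decrypt_sequence(enc, toy_D))  # [b'Hello', b', world']
```

Passing L and P in as parameters, separate from E, is the point of the exercise: what the wire reveals besides the ciphertext becomes an explicit choice rather than an accident of the mode.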

Today, L() would be something like rounding up to a multiple of the block size, and P() would simply be the identity function.
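A sketch of those choices for L() and P(), assuming a 16-byte block and using block=1 to stand in for a length-preserving mode.  The function names and the token/pause values are hypothetical.  The sketch prints only what an attacker sees for each segment, the pair (L(l), P(p)):

```python
from typing import List, Tuple


def L_today(l: int, block: int = 16) -> int:
    """Round a length up to a multiple of the block size (non-decreasing)."""
    return -(-l // block) * block


def P_today(p: float) -> float:
    """Pauses pass through unchanged (identity, trivially non-decreasing)."""
    return p


def attacker_view(segments: List[Tuple[str, float]],
                  block: int) -> List[Tuple[int, float]]:
    # Per segment, the attacker sees (L(l), P(p)); the ciphertext is omitted.
    return [(L_today(len(s.encode("utf-8")), block), P_today(p))
            for s, p in segments]


stream = [("The", 0.05), (" cat", 0.04), (" sat", 0.06)]
print(attacker_view(stream, block=16))  # [(16, 0.05), (16, 0.04), (16, 0.06)]
print(attacker_view(stream, block=1))   # [(3, 0.05), (4, 0.04), (4, 0.06)]
```

Rounding up hides length differences only down to the block granularity, and with P() the identity, the pauses reach the attacker unchanged whatever the block size.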

An attacker has access to the encrypted triples.  An attack is successful if it gains any advantage in estimating the probability of the cleartext over the a priori distribution of the cleartext.
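To make “advantage” concrete, here is a toy Bayesian calculation.  The candidate messages and their prior are invented, pauses are assumed identical across candidates (so P(p) tells the attacker nothing), and an observation is just the sequence of L(l) values.  In this sketch, any observation whose posterior differs from the prior counts as an advantage:

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple

Cleartext = Tuple[str, ...]  # a message as a sequence of segment strings


def round_up(block: int) -> Callable[[int], int]:
    """An L() that pads each length up to a multiple of `block`."""
    return lambda l: -(-l // block) * block


def observe(msg: Cleartext, L: Callable[[int], int]) -> Tuple[int, ...]:
    # What the attacker sees: the L(l) value of each segment.
    return tuple(L(len(s)) for s in msg)


def posterior(prior: Dict[Cleartext, Fraction], L: Callable[[int], int],
              seen: Tuple[int, ...]) -> Dict[Cleartext, Fraction]:
    # Bayes with a deterministic observation: keep the consistent
    # candidates and renormalize their prior weights.
    consistent = {m: w for m, w in prior.items() if observe(m, L) == seen}
    if not consistent:
        raise ValueError("observation impossible under this prior")
    total = sum(consistent.values())
    return {m: w / total for m, w in consistent.items()}


prior = {
    ("Yes", "."): Fraction(1, 2),
    ("No", "."): Fraction(3, 10),
    ("Maybe", "."): Fraction(1, 5),
}
exact, padded = round_up(1), round_up(16)

# Exact lengths single out the message: a clear advantage.
print(posterior(prior, exact, observe(("Yes", "."), exact)))
# {('Yes', '.'): Fraction(1, 1)}

# Padding to 16 makes every candidate look alike: posterior equals prior.
print(posterior(prior, padded, observe(("Yes", "."), padded)) == prior)  # True
```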

In this setting, attacks that use information from L(l) and P(p) are not “side channel” attacks; they use what’s explicitly part of the full “message” being transmitted.  That is exactly the point:  To bring information that is inherently available anyway into the rubric.  Things like power analysis remain side channel attacks because they rely on things “outside the system”:  Important, but not always available to typical attackers.

Traffic analysis may or may not be an attack within this model.  (It often is, but in many cases the kind of information it gives you is hard to represent this way.  For example, it might change your estimate of the a priori distribution, without in and of itself giving you any advantage with respect to any particular message.  But there are classic examples like recognizing which of multiple stations is the “commander” without being able to get any information about the commands and responses being sent.)
                                                        -- Jerry