[Cryptography] Reading encrypted generative AI chats

Sun Mar 17 18:55:19 EDT 2024

On 3/16/24 06:22, Jerry Leichter wrote:
> https://arstechnica.com/security/2024/03/hackers-can-read-private-ai-assistant-chats-even-though-theyre-encrypted/  describes an attack that reads - actually, guesses with good accuracy - the responses of generative AI programs, even though they are sent through a TLS connection.
>
> […]
>
> The attack is described (probably even by the authors) as a side-channel attack.  This is *wrong*, and it's exactly the kind of thinking that leads us to overlook these attacks repeatedly.  In the ideal world of math, an encryption algorithm simply maps strings to strings.  That's probably a reasonable description for encrypted data at rest, but it's dead wrong for what's probably the majority of encryption usage today:  Encryption of streams of data in segments delivered over time.  The lengths of those segments aren't "side channel" information - they are part of the data being transmitted.  Those lengths, in many protocols, may even appear explicitly inside the data being transmitted.
>
> Attacks based on characterizing sequences of lengths should be seen as akin to dictionary attacks.  Just as we expect cryptosystems today to resist dictionary attacks (by adding randomness to the encryption thus avoiding encrypting the same data repeatedly), we should expect them to resist attacks against segment lengths.

Hasn't this been called "traffic analysis" since WW II?

A few years ago I put a lot of work (and thought) into building a 
prototype of a system with end-to-end encryption, and I considered this 
question. It was a system that reported camera data, from cameras that 
are usually idle. I certainly considered, and immediately discarded as 
impractical, the idea of sending a continuous stream of data. But I 
didn't ignore the issue completely. I aggregated and padded content out 
to various coarse fixed-size boundaries before encrypting and sending. I 
do admit that I did not fuzz my padding to variable boundaries. (I 
think; this was a long time ago.) But I also never got past a 
demonstration prototype.
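
To illustrate the kind of padding described above, here is a minimal Python sketch. It is not the prototype's code. The bucket sizes, the 4-byte length prefix, and the names pad_to_bucket and unpad are all assumptions made for the example. The idea is that the plaintext is padded before encryption, so an observer sees only which coarse size bucket a message fell into (plus whatever constant overhead the cipher adds).

```python
import struct

# Hypothetical bucket sizes in bytes; the post does not say which sizes were used.
BUCKETS = (1024, 16 * 1024, 256 * 1024)


def pad_to_bucket(payload: bytes, buckets=BUCKETS) -> bytes:
    """Frame payload with a 4-byte big-endian length, then zero-pad it
    up to the smallest bucket that can hold the framed message."""
    framed = struct.pack(">I", len(payload)) + payload
    for size in sorted(buckets):
        if len(framed) <= size:
            return framed + b"\x00" * (size - len(framed))
    # Bigger than every bucket: round up to a multiple of the largest bucket.
    largest = max(buckets)
    total = -(-len(framed) // largest) * largest  # ceiling division
    return framed + b"\x00" * (total - len(framed))


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]


# The padded buffer, not the raw payload, is what would be handed to the cipher.
message = b"motion detected, frame 17"
padded = pad_to_bucket(message)
recovered = unpad(padded)
```

Fixed buckets like these still reveal the bucket, and therefore roughly how much activity there was. Fuzzing the boundaries, which the prototype did not do, would be a separate step on top of this.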

An observer of my system would certainly be able to /easily/ measure 
activity by watching my data flows, but so could an observer measuring 
electricity usage, water or natural gas usage, lights visible around 
edges of window shades, pizza deliveries, etc. I considered my 
vulnerability a known vulnerability, knowingly chosen, something to be 
disclosed. But my padding was coarse enough that traffic analysis would 
reveal nothing specific enough to be considered a picture.

-kb