<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 3/16/24 06:22, Jerry Leichter wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:7AE2F495-021B-4F19-8DA7-E5FA239DE636@lrw.com">

      <pre class="moz-quote-pre" wrap=""><a class="moz-txt-link-freetext" href="https://arstechnica.com/security/2024/03/hackers-can-read-private-ai-assistant-chats-even-though-theyre-encrypted/">https://arstechnica.com/security/2024/03/hackers-can-read-private-ai-assistant-chats-even-though-theyre-encrypted/</a> describes an attack that reads - actually, guesses with good accuracy - the responses of generative AI programs, even though they are sent through a TLS connection.

[…]

The attack is described (probably even by the authors) as a side-channel attack.  This is *wrong*, and it's exactly the kind of thinking that leads us to overlook these attacks repeatedly.  In the ideal world of math, an encryption algorithm simply maps strings to strings.  That's probably a reasonable description for encrypted data at rest, but it's dead wrong for what's probably the majority of encryption usage today:  Encryption of streams of data in segments delivered over time.  The lengths of those segments aren't "side channel" information - they are part of the data being transmitted.  Those lengths, in many protocols, may even appear explicitly inside the data being transmitted.

Attacks based on characterizing sequences of lengths should be seen as akin to dictionary attacks.  Just as we expect cryptosystems today to resist dictionary attacks (by adding randomness to the encryption thus avoiding encrypting the same data repeatedly), we should expect them to resist attacks against segment lengths.</pre>

    </blockquote>

    <p>Hasn't this been called "traffic analysis" since WW II?<br>

    </p>

    <p>A few years ago I put a lot of work (and thought) into building a

      prototype of a system with end-to-end encryption and I considered

      this question. It was a system that reported camera data, from

      cameras that are usually idle. I certainly considered—and

      immediately discarded—the idea of sending a continuous stream of

      data as impractical. <i>But</i> I didn't ignore the issue

      completely. I aggregated and padded content out to various coarse

      fixed-size boundaries before encrypting and sending. I do admit

      that I did not fuzz my padding to variable boundaries. (At least I think not;

      this was a long time ago.) But I also never got past a

      demonstration prototype.<br>

    </p>
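    <p>For illustration, the pad-to-a-coarse-boundary step might look
      something like the sketch below. The bucket sizes, the 4-byte
      length prefix, and the function names are assumptions made for
      this example, not details of the prototype. The padded buffer is
      what would then be handed to the encryption layer, so an observer
      sees only which bucket a message fell into.</p>

    <pre>
```python
import struct

# Hypothetical bucket sizes in bytes; the prototype's real boundaries
# are not stated, so these are placeholders.
BUCKETS = (1024, 16 * 1024, 256 * 1024)


def pad_to_bucket(payload: bytes, buckets=BUCKETS) -> bytes:
    """Prefix the true length, then zero-pad to the smallest bucket that fits."""
    # A 4-byte big-endian length prefix lets the receiver strip the padding.
    framed = struct.pack(">I", len(payload)) + payload
    for size in sorted(buckets):
        if size >= len(framed):
            return framed + bytes(size - len(framed))
    # Larger than every bucket: round up to a multiple of the largest one.
    largest = max(buckets)
    total = -(-len(framed) // largest) * largest  # ceiling division
    return framed + bytes(total - len(framed))


def unpad(padded: bytes) -> bytes:
    """Recover the original payload from a padded (already decrypted) buffer."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]
```
    </pre>

    <p>Fuzzing the boundaries, which this sketch does not do, would mean
      picking the target size with some randomness rather than always
      taking the smallest bucket that fits.</p>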

    <p>An observer of my system would certainly be able to <i>easily</i>

      measure activity by watching my data flows, but so could an

      observer measuring electricity usage, water or natural gas usage,

      lights visible around edges of window shades, pizza deliveries,

      etc. I considered my vulnerability a known vulnerability,

      knowingly chosen, something to be disclosed. But my padding was

      coarse enough that traffic analysis would reveal nothing specific

      enough to be considered a picture.</p>

    <p>-kb<br>

    </p>

  </body>

</html>