[Cryptography] "Zoom's end-to-end encryption isn't

Jerry Leichter leichter at lrw.com
Tue Apr 7 10:20:10 EDT 2020

> actually end-to-end at all. Good thing the PM isn't using it for Cabinet calls. Oh, for f..."
> https://www.theregister.co.uk/2020/04/01/zoom_spotlight/
It turns out this report wasn't quite correct - though Zoom's lack of a full description made things worse.  Zoom now has a blog post describing how the system works.  What actually happens is that connections *between endpoints running Zoom software* are encrypted end to end.  However, Zoom also supports the ability to bring other kinds of participants in - voice calls, various room conferencing systems, and so on.  These can't support Zoom's encryption - or, in some cases, *any* encryption.  Zoom handles these by using a "connector," which acts as a virtual participant in the end-to-end encrypted network of Zoom clients while also acting as a virtual client for the third-party systems.  The connectors, of course, do run on Zoom's servers.  So as soon as you add any connector to the conversation, you lose the end-to-end property.

Then again, it's hard to see how to avoid that, if you want to support systems that don't otherwise "play the game."

Since the keys used for the end-to-end connections come from the Zoom servers, Zoom *can* decrypt any of these conversations.  Also, since the connectors are completely transparent to participants, the architecture for a law enforcement or other tap is already right there.  Zoom claims that no such tap exists, and they've never been asked to create one; and that, other than in the connectors, they never decrypt the contents of calls.  You can, of course, choose to believe them or not.

Zoom does allow you to run your own servers, which in theory takes their servers out of the picture entirely.  But I'm not sure how much detail they've provided about this.  (In particular, are such servers *really* wholly disconnected from the Zoom infrastructure?)

There are some interesting crypto questions and design issues raised in all this:

1.  A single key is used for all the connections.  Under most circumstances the use of a single key leads to issues.  Here, given the broadcast nature of the conversations - everyone hears everything equally - it's not clear there's a problem.  One exception is when a participant is booted from the conference.  He still has the key and might have other ways to get the contents.  It would probably be good to rekey any time anyone leaves the conference, for whatever reason.
2.  If you kick someone out, you need to rekey over a connection to which the evictee still has the key.  This requires something like authenticated DH, which is probably the right way to do the initial key setup too, rather than leaving it up to the server.  Doing n-to-n key setup when not all the parties are present at all times should be doable, but complicated.  Perhaps this starts to overlap with distributed agreement protocols like Lamport's Part Time Parliament.
3.  They probably should signal when a connector joins - or allow you to lock out connectors entirely.  But how many people would understand the implications?  Would this just be yet another little indicator that almost everyone ignores?
4.  They apparently do use AES in ECB mode.  In practical terms, when you are encrypting a compressed video stream ... how much does this really matter?
5.  Related to this:  The connections don't appear to provide authentication.  Since packet loss is to be expected and has to be tolerated in an application like this, the common authenticated modes won't work.  Is there something that will?
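
To make the ECB question in point 4 concrete, here is a minimal sketch of what ECB exposes: identical plaintext blocks under the same key produce identical ciphertext blocks.  Everything in it is an assumption for illustration.  The toy Feistel cipher built from HMAC-SHA256 is only a standard-library stand-in for AES (it is not Zoom's cipher and not something to use for real), and the names `toy_encrypt_block`, `ecb_encrypt` and `distinct_blocks` are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block, the same block size as AES


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy cipher: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network. A stand-in for AES, NOT a vetted cipher.
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, bytes(a ^ b for a, b in zip(left, _round(key, i, right)))
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # ECB: every block is encrypted independently with the same key.
    assert len(data) % BLOCK == 0
    return b"".join(
        toy_encrypt_block(key, data[i : i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


def distinct_blocks(data: bytes) -> int:
    # Number of different block values in the data.
    return len({data[i : i + BLOCK] for i in range(0, len(data), BLOCK)})


key = b"k" * 16
plaintext = b"A" * BLOCK * 3 + b"B" * BLOCK  # three identical blocks, one different
ciphertext = ecb_encrypt(key, plaintext)

# The repetition in the plaintext is still visible in the ciphertext.
print(distinct_blocks(plaintext), distinct_blocks(ciphertext))
```

Running `distinct_blocks` over a captured stream is one way to measure how often blocks actually repeat.  For well-compressed video exact repeats may be rare, which is the open question; headers, padding and silence are the places one would look first.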

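One possible answer to the question in point 5 is to authenticate each packet on its own, so that a lost packet never affects verification of the ones that do arrive.  SRTP takes roughly this approach, with a truncated HMAC tag on every packet.  The sketch below is a hypothetical packet layout using only the Python standard library, not anything Zoom is known to do.  The function names, the 8-byte sequence header and the 10-byte tag length are all assumptions, and payload encryption is left out.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # bytes of truncated tag per packet (assumed value)
HDR_LEN = 8   # bytes of big-endian sequence number (assumed layout)


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    # The tag covers the sequence number and payload of this packet only.
    header = struct.pack(">Q", seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()[:TAG_LEN]
    return header + payload + tag


def open_packet(key: bytes, packet: bytes):
    # Returns (seq, payload) if the tag verifies, otherwise None.
    if len(packet) < HDR_LEN + TAG_LEN:
        return None
    header, payload, tag = packet[:HDR_LEN], packet[HDR_LEN:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    return struct.unpack(">Q", header)[0], payload


key = b"conference-key-from-somewhere"
sent = [seal(key, seq, b"frame %d" % seq) for seq in range(5)]

received = sent[:2] + sent[3:]  # packet 2 is lost in transit
# Packet 3 is tampered with: flip one bit in its last byte.
received[2] = received[2][:-1] + bytes([received[2][-1] ^ 1])

results = [open_packet(key, p) for p in received]
accepted = [r[0] for r in results if r is not None]
print(accepted)  # → [0, 1, 4]
```

The loss of packet 2 costs nothing beyond that packet, and the modified packet 3 is rejected without affecting packet 4.  A real design would also need replay protection (for example a sliding window over the sequence numbers), and with a single shared conference key any participant can forge packets that appear to come from any other.
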
We like to say that properly done security is *not* the enemy of good user experience.  (Poorly done security very often is, and people end up working around it.)  Zoom provides a wonderful test case:  How much of the existing really simple, easy-to-use UI do you have to trade for better security?

                                                        -- Jerry
