[Cryptography] OpenSSL CSPRNG work

Colm MacCárthaigh colm at allcosts.net
Wed Jun 28 20:04:48 EDT 2017


On Wed, Jun 28, 2017 at 3:36 PM, iang <iang at iang.org> wrote:
>
> I would say precisely the reverse.  If you're a security expert you have a
> good chance in hell of doing a good PRNG.  If you're a user you have no
> chance.
>

/dev/urandom is far too slow for something like OpenSSL.
getrandom()/getentropy() are faster, but still slower than can be done
in-process. While "use use /dev/urandom" is good advice, that's not a good
trade-off imo for a project that is already heavily focused on
cryptography.

For example, if you're doing something like TLS 1.1 and generating a random
IV for every CBC record sent, this is very significant. Doing a lot of ECDSA
is another case. There can also be quality issues with the implementations
of /dev/urandom on some platforms; for example, if you desire prediction
resistance, not all /dev/urandom implementations provide it.

This was the motivation for including our own in-process DRBG in s2n. We
want extreme performance, and also don't want to outsource so critical a
part of our security model. In case it's useful, here's the design we use
and some discussion of why:

In s2n, every thread gets two RNGs:

  https://github.com/awslabs/s2n/blob/master/utils/s2n_random.c

  s2n_get_public_random_data()
  s2n_get_private_random_data()

Under that slight veneer, each is an instantiation of a DRBG:

  https://github.com/awslabs/s2n/blob/master/crypto/s2n_drbg.c

The DRBG is the AES_CTR DRBG from the NIST SP 800-90A standard. Each is
initially seeded from /dev/urandom, and then reseeded using RDRAND on each
invocation, to provide prediction resistance.
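
To make that concrete, here's roughly the shape of the generate path. To be
clear, this is my own simplified illustration and not s2n's code: it uses
AES-128 via OpenSSL's AES_encrypt() and the _rdrand64_step() intrinsic,
assumes Key and V were already seeded from /dev/urandom, and leaves out the
derivation function, reseed counters, zeroisation and error handling that
SP 800-90A and s2n_drbg.c actually have. Build on x86-64 with -mrdrnd.

#include <stdint.h>
#include <string.h>
#include <immintrin.h>     /* _rdrand64_step() */
#include <openssl/aes.h>   /* AES_set_encrypt_key(), AES_encrypt() */

struct ctr_drbg {
    AES_KEY key;           /* current AES-128 key */
    uint8_t v[16];         /* the 128-bit counter block */
};

/* Increment the counter block V as a big-endian 128-bit integer. */
static void block_inc(uint8_t v[16])
{
    for (int i = 15; i >= 0; i--) {
        if (++v[i] != 0) {
            break;
        }
    }
}

/* SP 800-90A CTR_DRBG_Update: derive a fresh Key and V, folding in
 * 32 bytes of provided data (seed material or additional input). */
static void drbg_update(struct ctr_drbg *drbg, const uint8_t provided[32])
{
    uint8_t temp[32];

    for (int i = 0; i < 32; i += 16) {
        block_inc(drbg->v);
        AES_encrypt(drbg->v, temp + i, &drbg->key);
    }
    for (int i = 0; i < 32; i++) {
        temp[i] ^= provided[i];
    }
    AES_set_encrypt_key(temp, 128, &drbg->key);
    memcpy(drbg->v, temp + 16, 16);
}

/* Generate with prediction resistance: fold fresh RDRAND output into
 * the state before producing any output bits. */
static int drbg_generate_pr(struct ctr_drbg *drbg, uint8_t *out, size_t len)
{
    uint8_t entropy[32] = { 0 };
    unsigned long long r;

    for (int i = 0; i < 32; i += 8) {
        if (!_rdrand64_step(&r)) {
            return -1;     /* RDRAND failed; a real implementation retries */
        }
        memcpy(entropy + i, &r, 8);
    }
    drbg_update(drbg, entropy);           /* the reseed step */

    while (len > 0) {
        uint8_t block[16];
        size_t n = len < 16 ? len : 16;

        block_inc(drbg->v);
        AES_encrypt(drbg->v, block, &drbg->key);
        memcpy(out, block, n);
        out += n;
        len -= n;
    }

    memset(entropy, 0, sizeof(entropy));
    drbg_update(drbg, entropy);           /* post-generate update, for backtracking resistance */
    return 0;
}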

To provide fork-safety, accesses to the DRBGs are guarded with a memory
value that is zeroed on fork (we use pthread_atfork, and MAP_INHERIT_ZERO
where available, to provide that guarantee).
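
Roughly, the guard looks like this (illustrative only; the names
drbgs_valid, rng_init and ensure_drbgs_fresh are made up, not s2n's):

#include <pthread.h>

static int drbgs_valid;            /* non-zero while the DRBG state is usable */

static void on_fork_in_child(void)
{
    /* The child must never reuse the parent's DRBG state. */
    drbgs_valid = 0;
}

void rng_init(void)
{
    /* On platforms with minherit(2), mapping the flag with
     * MAP_INHERIT_ZERO gives the same guarantee even when fork() is
     * called directly, bypassing pthread_atfork handlers. */
    pthread_atfork(NULL, NULL, on_fork_in_child);
    drbgs_valid = 1;
}

void ensure_drbgs_fresh(void)
{
    if (!drbgs_valid) {
        /* hypothetical: re-seed every per-thread DRBG from /dev/urandom
         * before handing out any more random bytes */
        drbgs_valid = 1;
    }
}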

Reasoning:

*Why AES_CTR DRBG?*
We focus heavily on testing in s2n, and so we had a very strong preference
for an algorithm that comes with test vectors and that we could formally
verify. NIST SP 800-90 is a good place to start, though obviously there's
the whole DUAL_EC debacle.

In researching the basic foundation algorithms to use, I found these papers
(and more) very helpful:

https://pdfs.semanticscholar.org/136e/9214b3637a84b9accb32f7b6176f047c403a.pdf
https://www.cs.cmu.edu/~kqy/resources/thesis.pdf
https://eprint.iacr.org/2006/379.pdf

Reading those, and mailing list archives, convinced me of the simplicity
and security of the HMAC and AES_CTR DRBGs, each being as secure as the
primitive it uses. I was also lucky enough to attend the HACS workshops at
RWC, where we had working groups on RNGs each year. The conversations there
were great and very helpful.

My conclusion was that HMAC has some security edges on AES_CTR (my
colleague Matt Campagna's paper is a good guide on that) ... but these turn
out not to be relevant when using prediction resistance. So I went with
AES_CTR, with PR, because it was measurably faster on our hosts (we have
AES-NI and SHA acceleration).

Having test vectors greatly increased my confidence in the correctness of
the implementation, and they were easy to incorporate:

https://github.com/awslabs/s2n/blob/master/tests/unit/s2n_drbg_test.c

The NIST spec also provides a declarative definition that we were able to
translate into SAW and Cryptol easily:

https://github.com/awslabs/s2n/tree/master/tests/saw/spec/DRBG

and so we now actually formally verify that the s2n DRBG code is equivalent
to what's in the spec, further increasing my confidence that we at least
get that right. I'm not aware of any /dev/urandom implementation with a
similar level of proof or confidence.

With the 800-90A AES_CTR DRBG, we mix in RDRAND on every invocation, making
it unpredictable. More importantly, though, this also greatly increases the
amount of work that certain kinds of hardware backdoor (like a compromised
CPU) would have to perform to produce a deterministic output. Such a
backdoor would need to inspect the present state of the DRBG and run it
through AES in order to predict the output that would be produced from a
known RDRAND output. That's a neat property, though obviously not a general
defense against hardware backdoors (which could just read the output
registers or memory).

*Why two RNGs per-thread?*
The worst kind of failure for an RNG is to expose secrets. Algorithmic
weaknesses and implementation errors can lead to predictable or duplicate
RNG outputs. It's a total disaster if RNG output used in an IV (which is
often public) compromises the RNG output used for a shared secret. So the
two are kept very separate, as a defense in depth.

The per-thread part is a performance consideration, to avoid locking.
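
As a sketch of that arrangement, reusing the simplified struct ctr_drbg and
drbg_generate_pr() from the earlier sketch (the accessors here are
illustrative stand-ins for s2n_get_public_random_data() and
s2n_get_private_random_data(), not their real signatures):

#include <stddef.h>
#include <stdint.h>

/* One independent pair of generators per thread (C11 _Thread_local): no
 * locks are needed, and output that ends up on the wire (IVs, padding)
 * never comes from the same generator state as key material. */
static _Thread_local struct ctr_drbg public_drbg;
static _Thread_local struct ctr_drbg private_drbg;

int get_public_random_data(uint8_t *out, size_t len)
{
    /* IVs, explicit nonces, anything an observer may see */
    return drbg_generate_pr(&public_drbg, out, len);
}

int get_private_random_data(uint8_t *out, size_t len)
{
    /* key material and other secrets */
    return drbg_generate_pr(&private_drbg, out, len);
}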

*Anything else useful?*
I like BoringSSL's approach too, which is based on ChaCha20 and RDRAND (or
was when I looked), but we didn't adopt it because there's no good spec to
work a proof from, and performance was comparable for our use-case.

In looking at other userspace RNGs, I did see that some use getpid() and
getppid() as fork-safety guards. These provide some more protection, but do
cost a little, and it turns out they aren't effective on Linux (libc can
lie about getpid()) - so we don't include them. I wish Linux had
MAP_INHERIT_ZERO, to avoid pulling in pthreads.

We do read from /dev/urandom to seed initially, and found that EINTR can
happen more often than we expected. We ended up adding exponential backoffs
(see s2n_random.c) to avoid tight spinning loops in some cases. I never
would have predicted that happening.
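
The seeding read ends up looking roughly like this (again a simplified
sketch with an arbitrary backoff cap; the real logic is in s2n_random.c):

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly `len` bytes from /dev/urandom, retrying on EINTR and short
 * reads, and backing off exponentially rather than spinning in a tight
 * loop. The 0.5s cap is arbitrary for this sketch. */
int read_urandom(uint8_t *out, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY);
    useconds_t delay = 1000;   /* start at 1ms */

    if (fd < 0) {
        return -1;
    }

    while (len > 0) {
        ssize_t n = read(fd, out, len);

        if (n > 0) {
            out += n;
            len -= (size_t) n;
            continue;
        }
        if (n < 0 && errno != EINTR && errno != EAGAIN) {
            close(fd);
            return -1;
        }
        /* EINTR, EAGAIN, or a zero-byte read: back off and retry. */
        usleep(delay);
        if (delay < 500000) {
            delay *= 2;
        }
    }

    close(fd);
    return 0;
}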


-- 
Colm