[Cryptography] Whitening Algorithm

Mon Jul 27 01:46:49 EDT 2015

> 2b) One solution is to encrypt or hash the output
>
> This will result in a decrease in throughput and add complexity to the
> project. Options include AES-CBC-MAC, Cha-cha, Blake, etc.
>

These are different things. AES-CBC-MAC (when it isn't being used as a
MAC) is an entropy extractor. This was proved in this paper:
https://www.cs.nyu.edu/~dodis/ps/hmac.pdf .

Cha Cha Cha is a CSPRNG. Not an extractor.

Blake is a hash function. A hash function can be used for entropy
extraction, but its properties as an extractor (as far as I know) haven't
been proven in the same way a MAC has.

Encryption is not extraction. An extractor always gives less out than went
in.

> 2c) Another option is to seed a CSPRNG, generate a pseudo random stream,
> and if I like, mix it with the raw stream from the diode. This would keep
> the throughput up, but add complexity.
>

Rather than 'option' think all-of-the-above. The standard chain of events
is..

1) Sense bits of partially random data from the environment.
2) Pass them through an entropy extractor to get 'full entropy' random
data (for your preferred definition of full entropy.
3) Use this data as seeds to a CSPRNG, for performance reasons.

In terms of what you get at each stage..

1) Raw data:  Bad quality - Typically Slow (but metastable sources run at
gigabits/s).
2) Extractor Output: Best quality - even slower.
3) Output of CSPRNG seeded by extractor - Good enough for crypto - much
faster.

The third step is optional, and possibly a bad idea if you expect it's
output to be used to generate seeds for a downstream CSPRNG.

> 2d) Avalanche noise from transistors is subject to failure.
>

Entropy source design is interesting in that most the mechanisms that
people think make good entropy sources are in fact completely useless.
Avalanche noise is an example. You want an entropy source to:
1) Be unconditionally stable over the lifetime of the device
2) Be close enough to statistically stationary that it doesn't matter much
3) Be able to be mathematically modeled in a way that worst case
min-entropy bounds can be derived.
4) Be manufacturable (presuming you're making more than one of them.
5) Not need user intervention to make it work, since that yields an attack
vector.
6) Not have pathological modes (like ring oscillators do).

In my corner of the universe, we like the noise driven resolution of
metastable circuits. Metastability always resolves and with proper design,
always happens.

If you can identify the failure modes of your circuit with another
circuit, then you are in business, see below..

> Correct me if Iâm wrong, but it looks like without a health check, both
> approaches (2b and 2c) will produce random output even if the transistor
> fails and outputs all 0s. One thing I liked about my (flawed) algorithm is
> that it would pass statistical tests if the noise source was healthy, and
> produce poor, detectable results if the noise source failed.
>

Online health testing is the means to salvation. You cannot look at data,
test it and know whether or not it is random. In random data, all
sequences are equally likely. So the answer of such a test is always 'yes,
it looks like random data could look'.

However you should know what the output of a physical source should look
like since it will not be perfectly random. It will have some property,
like a gaussian distribution, or serial correlation, or suppression of
certain sequences relative to others. When it is not working in some way
you can expect it to do something else that is clearly distinguishable. So
online testing is useful in that it can be used to identify that data is
statistically likely to be due to a failure. If you're doing a circuit,
SPOF and DPOF analysis is useful to find out what the non trivial modes
might be.

So you can understand your circuit and have mathematical and simulation
models that tell you that if you construct the circuit right, it will be
entropic. You can have online tests that tell you whether or not you
constructed it right.

Your test, whatever it is, is going to be spitting the universe of bit
strings from your device into 'good' strings and 'bad' strings. Assuming
the output to be random, the bad strings are not in fact bad, just more
likely in a broken instantiation.

So you should create your tests to be pessimistic, with a high false
positive rate, tagging data that looks bad as being bad - to suppress the
false-negative rate. But use a long term count of the bad/good ratio to
tell you if the source is broken and use the bad/good tagging to tell you
whether or not to trust the current blob of data that has been tested. By
'trust' I mean count its entropy in the input to the extractor. If it's
bad, still throw it in, but have the extractor not complete a seed until
it has had enough good samples in. This way you don't throw away the
entropy or diminish the entropy into the extractor, you modulate your
level of extraction conservatism based on the short term data and use the
long term data to decide if it's broke. This yields the right behavior in
a system - no numbers out when it's broke, numbers out when it's working,
an instantaneous response to an instantaneous failure and no user
detectable response to transient false positives when it is working.

This is the fashion in which the design of entropy source, entropy
extractor and online health test all go together to make a working system.
They are not independent things. They need to do the right thing when
bolted together. Other people may have better ideas for composing the
parts of an RNG, but that's how I do it.

> With the above architectures, it seems some kind of internal health check
> would be needed.

Yes, see above.

>Running the stream through a Von Neumann filter might
> achieve this.

Nooooo!! A Von Neumann unbiaser is not fit for use as an entropy extractor.
Algorithmically cheap extractors require multiple independent sources :
http://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/BIW04/BIW.pdf

Von Neumann Extractors. Just kill them already. Yuval Peres whiteners too.

Single input extractors require cryptographic functions (hashes, MACs
etc.) which tend to require more computation resources.

Would a simple monobit test be a good (read: cheap) litmus
> test for the noise sourceâs health, or is something more sophisticated
> needed?

No. There's no reason to think a monobit test can detect all the failure
modes of the source. I've been arguing for model-equivalence checking as
being the sane way to do online testing for a long time. This recent paper
is saying something similar but goes on to describe offline tests, rather
than online: http://www.nist.gov/customcf/get_pdf.cfm?pub_id=917957 and
based on discussions at the NIST lightweight crypto workshop last week, I
fully expect this to become the basis for the entropy estimation
algorithms in SP800-90B, replacing or adding to the non-IID set that is in
the current draft.

Something less sophisticated is needed for online testing. I'm giving a
talk that will cover this at ICMC 2015. In the RNG circuits I design, we
count the frequency of patterns in a sliding window and compare to
expected statistical bounds. This is unreasonably effective, and cheap in
hardware. E.G.
https://www.cryptography.com/public/pdf/Intel_TRNG_Report_20120312.pdf.

>Bill mentioned doing health checks on a host machine. Would this
> involve switching modes on the device? That is, stream unwhitened noise
> periodically to the host to check for health, and then switch back to
> encrypted output?

That's not how I would do it. Test and tag the data going into the
extractor so you know whether or not to count its entropy.

>
> 2e)
> Iâve diagrammed two approaches <http://i.imgur.com/1HdwjYr.png> here to
> help me understand.
>

Your diagram on the right is closer to correct. However what are you doing
with the result of the test? If you are dropping data on the floor because
it failed a test, then you have a data changing process and it will reduce
the entropy. This is bad. The behavior of the extractor should change
based on the result of the test in the way I described above, by mixing in
more data into the next seed when the data isn't good.

> 3) Next Steps
>
> Rhys Weatherly has implemented a lot of these cryptographic algorithms for
> Arduino <http://rweather.github.io/arduinolibs/crypto.html> and measured
> their performance. Iâm going to try to implement one of the above
> architectures. Hopefully I can find something that the Arduino can handle
> that can be abstracted to such a degree that the code is easy to read.
>

Good luck! Easy to read is good. I see it that library has SHA256 in HMAC
mode. That would make a useable extractor if coupled to a quantified
source and a fit-for-use online health test.

DJ