[Cryptography] Construction of cryptographic software.

Tue Dec 2 16:44:05 EST 2014

It occurs to me that the craft of designing and writing cryptographic
software so that it doesn't leak sensitive information is, at best, a
black art, and most of the necessary techniques aren't widely known,
documented, or shared.

I bet there are a thousand tips and tricks that most of us have never
written down because they are grotty details that aren't very
interesting mathematically.  I would love to hear the coding techniques
that others have developed to control side channels and write software
that is both reliable and doesn't leak information.

Here are a few of mine.

First, I write unit tests for everything.  Often even before writing
the things.  It's astonishing how often crypto code can look like it's
doing the right thing while it's actually doing something very subtly
different that you'd probably never notice was different. Like having
the wrong period for your implementation of some LFSG-based thing like
the Mersenne Twister - get it wrong by one bit, and the output still
looks good, but it'll be insecure and have a period way shorter than you
expected, giving your software  an exploitable bug.  Or checking for
certificate revocation, then accepting a certificate that's been revoked
anyway.

Or getting encrypt-decrypt that reproduces the plaintext, but doesn't
produce the same output on test vectors as the standard implementation
you're trying to be compatible with (and if it's not producing the same
output, very likely that's the standard implementation that's more
secure than yours....)  Or finding the key server unavailable and then
falling back to accepting a certificate without displaying the error it
was supposed to display about not knowing whether the certificate is
good.  Anyway, write lots of unit tests.  Not only will they make sure
you know what your routines are doing, they'll also tell you when you
break something.

The debug build calls the unit tests, and they  write a file of test
results. The test results file, rather than the executable, is the usual
makefile target, and the makefile instructions for building test results
end with 'cat results.txt' which appends the test result output (ie, a
blank line followed by listings of any unit test errors) directly to the
compile output, just like any other errors that need fixing.

If I get any choice about it, I try to do all the crypto in a
single-threaded, dedicated executable that does very little else.  It's
much easier to analyze and debug a single-threaded application.

Know your compiler options.  Use the diagnostic and standard-enforcing
ones heavily.  Use those that enable security extensions peculiar to
that particular compiler if it helps (such as stack canaries or an
option to zero all memory allocated to your program on exit)  but do
your best to write code that does not rely for its security solely on
those extensions, because sooner or later someone will need to compile
it on a different compiler.  Eschew any standard-breaking extensions
that won't work (especially if they will also cause errors) in different
environments.

I write in C using the standard libraries and the GMP bignum library.
GMP has some not-very-widely known primitives that are specifically for
cryptographic use and leave nothing on the stack  or in buffers.
They're a little slower than the "normal" set of calls, but that's okay.
The C standard libraries can mostly be trusted to do exactly what they
say and nothing more.  This is kind of a shame because other languages
have nice facilities, less "undefined" behavior, and considerably more
protection from programmer mistakes, but there is no
way to get around it because AFAIK no other language allows me to
absolutely control when and whether copies are made, when and whether
writes to variables actually happen, etc, as well. Which isn't high
praise for C, because it's still hard and grotty and error prone.  But
at least it's possible.

The runtimes, templates, and libraries that come with most languages
(and here I include C++) can be trusted to do what they say, but without
going over a million lines of difficult, heavily #ifdef'd template code
with a fine toothed comb I can't trust that they do nothing more.
Therefore I don't know how to write secure software in those languages
and be certain that it won't leak information. They accomplish their
semantic goals well, but do so while leaving copies, and fragments of
copies, of everything they touch in their objects, in unused areas of
their allocated buffers until those buffers are actually used for
something else, and on the stack.

A lot of crypto software makes extensive use of global variables for
sensitive values.  They are fast, never get deallocated or
(accidentally) copied during runtime, new values always overwrite
previous values instead of landing at new places in memory possibly
leaving copies of the old values somewhere, and they avoid indirections
that might leave pointers to them lying around.  The only thing even a
little bit subtle is making sure that they get erased as soon as the
program doesn't need them anymore, and again before the program exits,
and that's not hard. A pattern emerges with a dedicated 'eraser' routine
that sets them all to bytes read from /dev/random before program exit.
The read from /dev/random can't be elided by the compiler because it is
a system-level side effect.  But if you read from /dev/random into a
buffer and then copy from the buffer to the sensitive variables, the
compiler can elide that because those are writes to dead variables.
What's necessary is to read bytes from /dev/random *directly* into the
global variables, one at a time if necessary.  It helps if you define a
singleton record type that holds them all; that way you can just
overwrite the record instead of doing one at a time.

That pattern is fine for globals that you clear once or twice per
program run and again on program exit, but I/O, even from /dev/random,
is too slow for using on return from every subroutine that handles
sensitive variables. I kind of don't like using global variables. I
don't like the idea that every part of the program has access to them.
But I have certainly used them.

I use the 'volatile' keyword a lot, to designate local variables so that
writes to those variables will never be skipped, even if writing to them
is the last thing that happens before the procedure they're allocated in
returns. I know of no other language besides C that allows that.
'volatile' allows me to easily use local variables and avoid leaving
anything sensitive on the stack, but do not use it for anything that's
not sensitive.  For example if you use a volatile variable to index a
loop, the loop will run slow because the code has to write that variable
to cache every iteration rather than just mapping it to a register. It's
better to just have a regular auto variable that you use for that.

A very good solution to the problem is to allocate a large 'security'
buffer as a local variable in main(), and have the program manage its
own stack for sensitive variables.  The way that works is that when
main() calls anything, it gives it a pointer into the security buffer,
and the size of the buffer. A called routine checks that the buffer is
large enough and uses as much of the buffer as it needs for its own
sensitive locals by casting the pointer at the buffer into a pointer at
a record type that contains its sensitive locals.  If it calls anything
else that has sensitive local variables, it does so with a pointer just
past the end of its record, and the size it got for the security buffer
minus the size of its own sensitive-locals record.   As with globals,
before program exit you call an 'eraser' that overwrites the entire
buffer with bytes from /dev/random.

The benefit of this is that main() retains control of the buffer.  It
doesn't make the system slower the way 'volatile' does.  And if multiple
routines both read and write in the buffer, the compiler can never elide
writes into it - so the routines can easily and reliably clear any
sensitive vars that they *aren't* passing back to their caller before
they return.  It's probably safer than using 'volatile' local variables,
because it's possible to forget to clear a sensitive 'volatile' before
returning but it is NEVER possible to forget to clear the security
buffer before exiting - that would definitely break your unit tests.
The downside of this is that handling the buffer tends to be a pain in
the @$$, which is why I tend to use 'volatile' instead.  I do use a
designated buffer for VERY sensitive variables, such as passwords.
Absolutely no routine, anywhere, gets to make its own copy of a password
that lives outside the buffer, and main() writes over that as soon as
the key is set up.

I've seen crypto software where sensitive locals were allocated using
the 'static' keyword to ensure that local variables with sensitive
information are not deallocated when the function returns.  This
prevents leaks during runtime, and with static variables, the compiler
can't USUALLY elide writes, so the called subroutine can usually
clear the values of its sensitive locals before returning. But it's  not
a technique I trust, because compilers are ALLOWED to elide final writes
to static variables if it can prove that the initial values of the
static variables don't matter to the routine whose scope they're in, and
static locals are strictly worse than globals for leak prevention
because main() has no way to overwrite all those before it exits. Every
one of them gets released when the program exits, and you just don't
know what's going to be the next program to allocate the block of memory
where they were contained.

I always use unsigned integers.  In fact this is something I learned
rather recently, due to an issue raised on this list.  The basic
mathematical operators (addition, subtraction, multiplication) can
overflow, and overflow on signed integers (with the usual wraparound
semantics that can give a negative result of adding two positive
numbers) is undefined behavior.  If you must use signed integers and
check afterward for overflow/wraparound, you must add them as though
they were unsigned integers, like this:

(unsigned int)z = (unsigned int)x + (unsigned int)y;

because overflow on unsigned integers *is* defined behavior.

You can't even check to see if undefined behavior has happened, because
the compiler will go, "oh, that couldn't happen except for undefined
behavior, and I can do whatever I want with undefined behavior.  I want
to ignore it."

It will then cut the checks and everything that depends on them out of
your program as 'dead code'.  So you can have an integer that is in fact
negative because of a wraparound that occurred while adding two positive
numbers, and the program will jump straight past a check for a negative
value without triggering it.

Crypto coding has taught me to use a few other unusual bits of coding
style. In crypto, we tend to allocate buffers, permutations, etc that
are 'round' numbers like 0x100 bytes or 0x10000 16-bit values.  Because
we're using unsigned integers anyway, indexing into these buffers using
uint8_t or uint16_t variables gives us an automatic range safety check.
 But it is hard to write 'for' loops that exit if we're iterating over
the whole buffer, so instead of 'for' loops I tend to
use do/while loops.  If I want to do something like initializing a
permutation with every value of a uint16_t for example, my usual idiom
is to write something like

count = 0; do {
  permutation[count] = count;
}while (++count != 0);

This is also an example of a habit that suits me when in full-on
paranoia mode to never leave any local variable (such as count) with a
nonzero value if I can help it, whether *I* think that variable is
sensitive or not. If I leave something with a nonzero value on routine
exit, I try to leave the same constant value in it on every exit.

So that's a few bits about writing non-leaking crypto code.  I've been
more concerned with preventing memory-based data leaks, obviously, than
controlling other side channels like timing or power use.

Would anybody else here like to share some of the techniques they use?

Bear

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20141202/5261509d/attachment.sig>