[Cryptography] How programming language design can help us write secure crypto code (was: Re: Other obvious issues being ignored?)

Michael Kjörling michael at kjorling.se
Thu Oct 22 04:27:39 EDT 2015


On 21 Oct 2015 10:56 -0700, from jsd at av8n.com (John Denker):
> On 10/20/2015 08:40 PM, Jerry Leichter wrote:
>> So ... I'd turn this around.  The issue isn't with C, or Java, or
>> any other general purpose language.
> 
> I disagree.
> 
> It's not the only issue, but it *is* an issue when the language, 
> by definition, is not secure and not securable.  One of my favorite
> sayings is,
>   There is no benefit in getting the wrong answer quickly.
> 
> In other words, "optimization" for speed at the expense of security
> is not really an optimization, and a language that permits this is
> a bad language.  If the machine is not secure, you cannot assume
> that the results of /any/ calculation are correct.
> 
>> You make this sound like a criticism of C.
> 
> It's a criticism of C, and also of modern hardware.  A processor
> nowadays is essentially a compiler unto itself;  it takes the
> machine-language opcodes as a rather loose hint as to what you want
> to do, and compiles that into a sequence of operations that
> actually get performed.  Lots of pipelining, branch prediction,
> conditional execution, et cetera.
> 
> As for the C language, let's consider a specific example, namely 
> logical shift, according to section 5.8 of the language specification:
>   http://sites.cs.queensu.ca/gradresources/stuff/cpp98.pdf#page=111

That's C++, not C. I respectfully point you towards
<https://programmers.stackexchange.com/q/298665/6384>.


> ]] The behavior is undefined if the right operand is negative, or 
> ]] greater than or equal to the length in bits of the promoted left
> ]] operand
> 
> The problem is that "undefined" is waaaaay too vague.  According
> to this specification, 1<<-1 could print an ascii-art portrait of
> Alfred E. Neuman on /dev/console ... and then publish all of your
> private keys on facebook.
> 
> At the very least, this could be improved by saying that the 
> result will be set to an undefined value.  This limits the scope
> of the damage.

In your mind, particularly in practice, what is the difference between
"the behavior is undefined if..." and "the result will be an undefined
value if..."? Do you have an example of any compiler the behavior of
which doesn't fall into the latter category already when faced with a
situation like this?

Yes, _technically_ "the behavior of Operation X is undefined in case
of ABC" means the compiler is free to do absolutely whatever it
pleases in that situation. _In practice_, there are only a few
possible outcomes, largely depending on what exactly Operation X and
the condition ABC are.

And of course, saying that (the operation or the result of) x<<y where
y<0 is undefined allows compilers to change this into x>>|y|, or
allows for hardware architectures that transparently do x>>|y| when
asked to do x<<y where y<0. It also allows the standard to avoid the
question of the binary storage format of values, which really seems
out of scope for the language specification.

Lots of "undefined behavior" comes down to differences between various
hardware architectures, either current or historical. As I recall,
dereferencing a pointer after the memory has been deallocated (using
free(), delete or a similar mechanism) is also undefined behavior. On
some systems it works fine (DOS applications were notorious for doing
this, and it worked because generally nothing else was around to touch
that memory; obviously, that caused all sorts of problems when
multi-tasking environments started becoming commonplace), on others it
gives a segmentation fault (most modern operating systems probably
fall into this category), on yet others (for example if the memory is
cleared early, immediately when returned to the allocator) it might
give you zeroized data. All of these behaviors are fully valid
according to the standard when the standard only states "the behavior
is undefined".


> It would be even better to specify that if the right argument is 
> out of range, the result will be set to zero.  This gets rid of
> the idea of "undefined".  The behavior is the same across all 
> hardware platforms and across all versions of the compiler.

I can think of situations in which a specified result is just as bad
as an undefined result, when the specified result is not what you
would normally expect. _Especially_ in cryptography-related code.


> It would be much, much better to specify that if the right argument
> is out of range, the result will be set to zero and an exception
> will be thrown.

C doesn't have exceptions. Or at least, didn't the last time I looked.
C++ does, but C++ is a different language from C. They are similar in
many ways, but it's also possible to write code in one that doesn't
compile in the other, or worse, compiles cleanly in both but has
different behavior in one compared to the other. Again, even though
they have many similarities, the two are different languages.


> Now it could be argued that range-checking the arguments introduces
> run-time inefficiency.  Two responses:
>  -- This could be handled in hardware at essentially zero cost.
>   The original C compiler reflected the hardware of its day, but
>   the tables have long since turned:  Hardware is designed to do
>   what the language needs.
>  -- A decent optimizing compiler should be able to optimize
>   away the checks in speed-critical situations.  Non-critical
>   situations are not worth worrying about.

The second point brings us right back to the original question: how on
Earth is the compiler supposed to know where it is allowed to optimize
and where it isn't? We'd have to tell it somehow. Since _most_
applications benefit from the optimizations, and even applications
that need security benefit from optimizations in many other places in
the code, the reasonable strategy would be to have some way of telling
the compiler which parts we _don't_ want it to optimize (at all, or
just not optimize away). We don't want to build all of Mozilla without
optimizations just because a small portion of the cryptography-related
code needs specific guarantees about which operations get done.


>> General-purpose languages will always lean toward satisfying the vast
>> majority of programming needs, which will likely provide a semantics
>> that is not consistent with the needs of security-related code.
> 
> I disagree.
> 
> Again I say, there is no advantage in getting the wrong answer quickly.
> If the calculation can't be done securely, there is no point in doing
> the calculation at all.

If, as you propose above as the "even better" solution, an operation
that normally returns a value suddenly returns 0 where 0 is not the
desired value, how does that lead toward getting the right answer any
better than the standard staying silent does? (The standard not saying
what should be done is _not_ the same thing as a compiler vendor not
having to pick some behavior.)


>> we really don't have a suitable way to express security-sensitive code. 
> 
> That ought to be a fixable problem.

Like I said in my other post in this thread, add something like
"#pragma nooptimize" to the language. That would tell the compiler
that a block so marked is off limits to optimizations, ensuring
that what the programmer expresses is what actually gets done.

From the source code we can't do much of anything about what the CPU
does with the machine language instructions that the build produces,
but in the source code and the language standard we can do a lot about
the transformation from source code to machine language.


> I don't have all the answers, but it seems like the list of what
> we need is not very long:
> 
>   *) No UB!  No Undefined Behavior!  None!

Even the C++ standard you linked to mentions "implementation-defined"
behavior in a number of places (see for example § 7.1.5.2 page 109, §
7.2 section 5 page 111, § 7.4 page 123, § 7.5 section 2 page 124, §
8.5.3 page 148, ...; a simple search for "implementation-defined" will
give you plenty of relevant hits).

If you ban undefined behavior from the standard because different
systems might do it differently, then by the same logic you must also
implementation-defined behavior, because from the point of view of the
language standard, those are very close to the same thing.

Which of course means that the people who _are_ legitimately assuming
that the behavior defined by their particular compiler won't suddenly
change, will likely see _subtle_ breaking changes the next time they
update their compiler, to a version which follows such a revised
standard.

"Undefined behavior" is "we aren't saying what you should do" whereas
"implementation-defined behavior" is "we are saying that this part is
up to you to decide and possibly document". In both cases, the
standard defers the decision of the specific source code construct to
machine language transformation to someone else.

>   *) Constant time execution in some situations.  As mentioned
>    above, this affects the hardware, not just the language 
>    specification.

The language can help a lot here, by allowing the programmer to
specify that a function should not be passed through the optimizer
during the compilation phase. That would leave only optimizations done
in hardware, which seem like a much tougher nut to crack from the
language design perspective.

It isn't terribly difficult (just very different from what most
programmers are used to) to write code that executes in constant time,
as long as the compiler doesn't try to pull the rug out from under
your feet by rewriting the code.

>   *) Rigorous control of copied data, including zeroization
>    when needed.

Assuming that you can tell the compiler which data should be zeroised.
Just because data is copied somewhere and left there doesn't mean it
should be wiped. Case in point: writes to video memory.

>   *) Minimization of TEMPEST and other side effects.

Forgive my ignorance, but how on Earth can programming language design
help against attacks like TEMPEST?

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
OpenPGP B501AC6429EF4514 https://michael.kjorling.se/public-keys/pgp
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)

