[Cryptography] Secure erasure in C.

Sun Sep 11 11:08:11 EDT 2016

As many people have pointed out, the problem with erasure is that it deals with a concept not captured by normal programming languages, operating systems or computing hardware.

Programming languages deal with *returned values* and *variable values*, and so long as these values achieve the desired bit patterns, everyone is cool.  No one cares about the values of (supposedly) "dead" variables which can "never" be referenced.

There is a formal notion from the lambda calculus about all of this: so-called "lazy evaluation".  Lazy evaluation never attempts to evaluate an expression until it is *absolutely necessary for the ultimate output value*, out of fear that this expression will loop or err out.  Lazy evaluation is the ultimate "leave sleeping dogs lie" approach to computation.

A *lazy architecture* would *never* actually clear a variable value unless that variable value participated in a computation leading to a final result.  So the whole concept of "secure erasure" is meaningless in this computing model.

This problem is precisely why "linear types" are a necessary addition to the mathematical modeling of computer systems -- particularly operating systems and hardware.

Operating systems have always attempted to handled "resources" like time (CPU cycles), space (main memory and secondary memory), I/O channels, etc.  But prior to "linear logic", there was never a mathematical model of what a "resource" was.

With "linear types", we finally have a notion of a *resource* which can be temporarily allocated, returned, and re-allocated.

Consider the following trivial Lisp program for computing factorial:

(defun fact (n)
  (if (zerop n) 1
    (* n (fact (1- n)))))

Such a program only talks about *values* and not *resources*.

Let us first rewrite our factorial program to enable it to return *multiple values* instead of the single result value.  So far, this is identically the same fact program as before, except that it allows for multiple values to be returned.

(defun fact (n)
  (if (zerop n) (values 1)
     (multiple-value-bind
        (fact1-n) (fact (1- n))
        (values (* n fact1-n)))))

Now let us rewrite "fact" again to include a *resource*: the free storage list which I'll denote *freelist*.

(defun fact (n *freelist*)
  (if (zerop n) (values 1 *freelist*)
     (multiple-value-bind
        (fact1-n *freelist*) (fact (1- n) *freelist*)
        (values (* n fact1-n) *freelist*))))

What has happened here is that we have made explicit the fact that the freelist is both an argument and a returned value for our "fact" function.

But who is actually using this freelist?

The multiplication function "*" is the function that actually uses storage from the freelist in composing its arbitrary-precision ("bignum") result.

So now we have to rewrite our program once again to provide "*" with the freelist as both an argument and a result:

(defun fact (n *freelist*)
  (if (zerop n) (values 1 *freelist*)
     (multiple-value-bind
        (fact1-n *freelist*) (fact (1- n) *freelist*)
        (multiple-value-bind
           (factn *freelist*) (* n fact1-n *freelist*)
           (factn *freelist*)))))

(Lisp-ers and Scheme-ers will quickly notice that this program can be slightly optimized by removing the final "tail-recursion"; I won't do that here because it doesn't change the meaning of the program.)

The argument (and value) *freelist* is a *linear type*, and a proper language compiler will ensure that its value cannot be either inadvertently copied or inadvertently destroyed; this is the definition of "linear".

This last version of "fact" is also much closer to what is actually happening within the Lisp implementation: the freelist *is* an extra argument and an extra returned value for every Lisp function (at least those Lisp functions that actually need to "CONS").

The fact that the freelist is typically passed around either as a reserved register variable or as a hidden global variable is merely an implementation detail; the code above captures the *meaning* of the computation.

But we aren't done yet.  So far we've talked about *space*, but not *time*.

We now need to add yet another *linear* variable to be added as another parameter and returned value: "*cpu-ticks*".  The meaning of *cpu-ticks* is the *resource* of CPU time allotted to this program.

(defun fact (n *freelist* *cpu-ticks*)
  (if (zerop n) (values 1 *freelist* *cpu-ticks*)
     (multiple-value-bind
        (fact1-n *freelist* *cpu-ticks*) (fact (1- n) *freelist* *cpu-ticks*)
        (multiple-value-bind
           (factn *freelist* *cpu-ticks*) (* n fact1-n *freelist* *cpu-ticks*)
           (factn *freelist* *cpu-ticks*)))))

This "fact" program has now fully fleshed out the *resources* utilized in this computation -- both time and space.  If the *freelist* becomes exhausted, the program will fail.  Likewise, if the *cpu-ticks* become exhausted, the program will fail.

It is *this* level of computation that operating systems deal with: resources.

Modeling at this level of computation will be required in order to start dealing with the resource allocation and de-allocation issues faced by crypto computations.