A risk with using MD5 for software package fingerprinting

John Gilmore gnu at toad.com
Mon Jan 28 05:27:43 EST 2002


> I would like to learn about *code* review practices in whatever
> is considered a 'sophisticated' software company.

When I was working at Cygnus, I was FSF's official maintainer of GDB.
Whenever I cut a GDB release, I would diff it against the previous
release, and read the diff by eye.  I encouraged others at Cygnus, who
maintained the GNU assembler, linker, compilers, etc., to do the same.

While it's hard to bring a fresh eye to a piece of code that you've
spent years working on, it's a lot easier to look over small scattered
differences, remember why each was made, and notice whether they have
any security implications.
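
In script form, the practice is roughly this (a minimal Python sketch;
the release-tree names are only examples, and plain "diff -ru" does
the same job):

    import difflib, filecmp, os, sys

    def show_diffs(old, new):
        """Recursively diff two unpacked release trees, like diff -ru."""
        dc = filecmp.dircmp(old, new)
        for name in dc.left_only:
            print("only in %s: %s" % (old, name))
        for name in dc.right_only:
            print("only in %s: %s" % (new, name))
        for name in dc.diff_files:
            a, b = os.path.join(old, name), os.path.join(new, name)
            with open(a, errors="replace") as fa, \
                 open(b, errors="replace") as fb:
                sys.stdout.writelines(
                    difflib.unified_diff(fa.readlines(), fb.readlines(),
                                         a, b))
        for sub in dc.common_dirs:
            show_diffs(os.path.join(old, sub), os.path.join(new, sub))

    # e.g. show_diffs("gdb-4.17", "gdb-4.18")  -- the version numbers
    # here are just examples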

This quality-assurance technique frequently caught minor problems,
but mostly I was doing it because the GNU development tools would have
been a great place to slip in a Trojan Horse (recall Ken Thompson's
Turing Award lecture, "Reflections on Trusting Trust").  Think about
how many times you've run GDB as root, or at all, and about how many
systems have these tools installed.

This is also part of the reason that we did three-stage builds (build
the tools with other tools; use the resulting tools to build
themselves; use the resulting tools to build themselves again).  The
last two versions will be identical, down to the bit -- modulo bugs,
Trojan horses, embedded timestamps, etc.  Our regression testing would
make sure they matched, and we'd investigate every difference.  And if
a Thompson-style binary-only Trojan had been introduced, compiling the
compiler with somebody else's compiler first would defeat that attack
(unless both compilers had been not only compromised, but also taught
to compromise the other compiler).
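
The stage-2/stage-3 check itself is mechanically simple; a sketch of
the comparison (Python, with hypothetical build-directory names):

    import filecmp, os

    def compare_stages(stage2, stage3):
        """Compare every stage-2 file against its stage-3 counterpart,
        byte for byte; anything that differs gets investigated."""
        mismatches = []
        for root, _dirs, files in os.walk(stage2):
            for name in files:
                f2 = os.path.join(root, name)
                rel = os.path.relpath(f2, stage2)
                f3 = os.path.join(stage3, rel)
                # shallow=False forces a full content comparison,
                # not just a stat() signature check
                if not os.path.exists(f3) or \
                   not filecmp.cmp(f2, f3, shallow=False):
                    mismatches.append(rel)
        return mismatches

    # e.g. compare_stages("build/stage2", "build/stage3")
    # A real harness would mask embedded timestamps first, then treat
    # every surviving mismatch as something to explain by hand.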

Cygnus even had the ability to check that cross-compilers built and
running on a variety of different machines (Suns, DECs, PCs, HPs, etc)
would produce exactly the same output files, down to the bit, from the
same input files.  We had all the different machines sitting there for
testing, and we wrote and used the infrastructure to automate that
testing.  Not only did it find lots of obscure little bugs for us
(including some floating-point representation problems); it also
discouraged platform-specific security breaches.
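
Fingerprints are what make that check practical: each build host
hashes its own output files, and you compare small tables of digests
instead of shipping whole trees around.  A sketch (Python; MD5 as in
the subject line, file layout hypothetical):

    import hashlib, os

    def fingerprint_tree(top):
        """Map each file under 'top' (by relative path) to the MD5
        of its contents."""
        table = {}
        for root, _dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                table[os.path.relpath(path, top)] = digest
        return table

    # Run on each build host, then diff the tables; any path whose
    # digest varies across hosts marks output that wasn't bit-for-bit
    # identical.  e.g.
    #   sun = fingerprint_tree("out-sun4")
    #   pc  = fingerprint_tree("out-i386")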

I'm probably not your average code jockey, but there are at least
some people in positions of trust (e.g. at the top of big distribution 
pipelines) who *do* care enough to read the code, look for holes, and
work to automate the finding of 'em.

	John

PS: I didn't start my release-comparing practice with GDB.  I don't
know who taught it to me; perhaps Bill Shannon of Sun.  I taught it to
Hugh Daniel, and we used it to build product-quality PostScript-based
window system releases.

PPS: I never did audit the "diff" program though...  :-) As Thompson
so eloquently points out, complex systems involve a lot of reliance on
every piece of code that contributes to the system.  And if after
years of inspections, all your software is clean, how do you know
nobody paid Tim May to slip a security hole into the 8086 circuit
design?  I have done enough years of chip testing AND architectural
validation to know how few of the infinitely many combinations of
instructions or bus cycles are actually tested to make sure that
somebody didn't intentionally make *one* combination do something
interesting.  Even if you trust your processor, didn't the NSA pay the
Taiwanese designer of your RAM chips to replace particular stored code
sequences with other interesting ones, one time out of a hundred, when
fetched?



