Certainty

Thu Aug 20 04:26:29 EDT 2009

> Getting back towards topic, the hash function employed by Git is showing 
> signs of bitrot, which, given people's desire to introduce malware 
> backdoors and legal backdoors into Linux, could well become a problem in 
> the very near future.
>
> "James A. Donald" <jamesd at echeque.com>

> I believe attacks on Git's use of SHA-1 would require second pre-image
> attacks, and I don't think anyone has demonstrated such a thing for
> SHA-1 at this point. Nonetheless, I agree that it would be better if
> Git eventually used better hash functions. Attacks only get better with
> time, and SHA-1 is certainly creaking.
> 
> Emphasis on "eventually", however. This is an "as soon as convenient, not
> as soon as possible" sort of situation -- more like within a year than
> within a week.
> 
> Yet another reason why you always should make the crypto algorithms you
> use pluggable in any system -- you *will* have to replace them some day.
> --
> Perry E. Metzger		perry at piermont.com

> Of course, I still believe in hash algorithm agility: regardless of how preimage attacks will be found, we need to be able to deal with them immediately.
> 
> --Paul Hoffman, Director

I tried telling this to Linus within a few weeks of the design, while
he was still writing git.  He rejected the advice.  Perhaps a
delegation of cryptographers should approach him -- before it's too
late.

His biggest argument was that the important git trees would be "off-net"
and would not depend on public trees.  I think git is getting enough
use (e.g. by thousands of development projects other than the Linux
kernel) that those assumptions are probably no longer valid.

His secondary argument was that git only uses the hash as a
collision-free oracle, not a cryptographic hash.  But that's exactly
the problem.  If malicious people can make his oracle produce
collisions, other parts of the git code will make false assumptions
that can be exploited.

His final argument is the same one I heard NSA make to Diffie and
Hellman about DES in 1976: "the crypto will never be the weakest link
in the system, so it doesn't really have to be that strong".  That
argument was wrong then and it's wrong now.  The cost of using a
strong cryptosystem isn't significantly greater than the cost of using
a weak cryptosystem; and cracking the crypto HAS become the weakest
link in the overall security of many systems (CSS is an obvious one).
See:

  http://www.toad.com/des-stanford-meeting.html

	John

To: torvalds at osdl.org, gnu at toad.com
Subject: SHA1 is broken; be sure to parameterize your hash function
Date: Sat, 23 Apr 2005 15:21:07 -0700
From: John Gilmore <gnu at new.toad.com>

It's interesting watching git evolve.  I have one comment, which is
that the code and the contributors are throwing around the term "SHA1
hash" a lot.  They shouldn't.  SHA1 has been broken; it's possible to
generate two different blobs that hash to the same SHA1 hash.  (MD5
has totally failed; there's a one-machine one-day crack.  SHA1 is
still *hard* to crack.)  But as Jon Callas and Bruce Schneier said:
"Attacks always get better; they never get worse.  It's time to walk,
but not run, to the fire exits.  You don't see smoke, but the fire
alarms have gone off.  It's time for us all to migrate away from
SHA-1."  See the summary with bibliography at:

  http://www.schneier.com/crypto-gram-0503.html

Since we don't have a reliable long-term hash function today, you'll
have to change hash functions a few years out.  Some foresight now
will save much later pain in keeping big trees like the kernel secure.
Either that, or you'll want to re-examine git's security assumptions
now: what are the implications if multiple different blobs can be
intentionally generated that have the same hash?  My initial guess is
that changing hash functions will be easier than making git work in
the presence of unreliable hashing.

In the git sources, you'll need to install a better hash function when
one is invented.  For now, just make sure the code and the
repositories are modular -- they don't care what hash function is in
use.  Whether that means making a single git repository able to use
several hash functions, or merely making it possible to have one
repository that uses SHA1 and another that uses some future
WonderHash, is a system design decision for you and the git
contributors to make.  The simplest case -- copying a repository with
one hash function into a new repository using a different hash
function -- will change not only all the hashes, but also the contents
of objects that use hash values to point to other objects.  If any of
those objects are signed (e.g. by PGP keys) then those signatures will
not be valid in the new copy.
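
[Illustration added in editing, not part of the original message.  This
is a minimal Python sketch of the point above, assuming a toy object
format rather than git's real one; object_id and make_tree are
hypothetical helpers.  Because the tree embeds the id of the blob it
points to, switching hash functions changes the tree's bytes as well as
its id, which is why a signature made over the old tree would not
verify against the converted one.]

```python
import hashlib


def object_id(data: bytes, algo: str) -> str:
    """Name an object by the hex digest of its bytes under the chosen algorithm."""
    return hashlib.new(algo, data).hexdigest()


def make_tree(filename: str, blob: bytes, algo: str) -> bytes:
    """Toy tree object (hypothetical format): one line pointing at a blob by id."""
    return f"{filename} {object_id(blob, algo)}\n".encode()


blob = b"int main(void) { return 0; }\n"
tree_sha1 = make_tree("main.c", blob, "sha1")
tree_sha256 = make_tree("main.c", blob, "sha256")

# The blob bytes are identical in both repositories, but the tree bytes
# differ, so anything signed over the SHA-1 tree does not match the copy.
print(tree_sha1 != tree_sha256)
print(len(object_id(tree_sha1, "sha1")), len(object_id(tree_sha256, "sha256")))
```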

Adding support now for SHA256 as well as SHA1 would make it likely
that at least git has no wired-in dependencies on the *names* or
*lengths* of hashes, and let you explore the system level issues.  (I
wouldn't build in the assumption that each different hash function
produces a different length output, either, though these two happen
to.)
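
[Illustration added in editing, not part of the original message.  One
way to avoid wiring in hash names or lengths is to make each object id
carry the name of the function that produced it.  The "algo:hexdigest"
form below is an assumption made for this sketch, not a format git
uses, and tagged_id and verify are hypothetical helpers.  SHA-256 and
SHA3-256 are used because they produce digests of the same length, so
length alone could not tell them apart.]

```python
import hashlib


def tagged_id(data: bytes, algo: str) -> str:
    """Return a self-describing id of the (assumed) form 'algo:hexdigest'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def verify(data: bytes, tagged: str) -> bool:
    """Recompute the digest with whichever algorithm the id itself names."""
    algo, _, digest = tagged.partition(":")
    return hashlib.new(algo, data).hexdigest() == digest


data = b"some object contents\n"
id_a = tagged_id(data, "sha256")
id_b = tagged_id(data, "sha3_256")

# Both digests are 64 hex characters long, yet the ids stay unambiguous
# because the algorithm name travels with the digest.
print(len(id_a.split(":")[1]) == len(id_b.split(":")[1]))
print(verify(data, id_a), verify(data, id_b))
```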

Enjoy,

	John Gilmore

Date: Mon, 25 Apr 2005 13:38:40 -0700 (PDT)
From: Linus Torvalds <torvalds at osdl.org>
To: Seth David Schoen <schoen at eff.org>
cc: John Gilmore <gnu at toad.com>, Kees Cook <kees at osdl.org>
Subject: Re: John Gilmore on SHA-1 [gnu at toad.com: Pls forward to Linus: SHA1
 is broken]

...
As to your SHA1 concerns:

> It's interesting watching git evolve.  I have one comment, which is
> that the code and the contributors are throwing around the term "SHA1
> hash" a lot.  They shouldn't.  SHA1 has been broken; it's possible to
> generate two different blobs that hash to the same SHA1 hash.

Actually, even the theoretical breaking has not been proven for a 
pre-existing SHA1 hash (ie you need to control both the starting point for 
it), and more importantly, git really uses the SHA1 as a _hash_, not 
necessarily as a cryptographically secure one.

IOW, security doesn't actually depend on the hash being cryptographic, and
all git really wants is to avoid collisions, ie it wants it to hash the
contents well. That, sha1 definitely does, and even an md5sum would
suffice (but having 160 bits instead of "just" 128 obviously adds to the
space, so that's always a bonus).

Of course, the fact that sha1 is also very expensive to try to fool is a 
big bonus, since it means that it's just another layer on the real 
security model. But the _real_ security comes from the fact that git is 
distributed, which means that a developer should never actually use a 
public tree for his development.

For example, I've got two separate firewall layers (and a NAT) in between 
me and the internet, and my personal tree is on that machine. I never 
actually trust or use the external trees - I just push the result to them. 

This is something you cannot do with a centralized SCM server like SVN or
other traditional crud. A centralized one obviously has to be accessible
to all the developers, which means that it's forced to be open enough to
be much more easily attackable, and also means that there is a single 
point of failure also from a security standpoint. 

In contrast, even if somebody were to compromise my machine, that does 
_not_ automatically compromise the trees of other developers. They'd still 
have all the pristine objects, and never even fetch an object from me that 
has the same name (ie sha1 hash) as one they already have.
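
[Illustration added in editing, not part of the original message.  A
minimal Python sketch of the behaviour described above, assuming a toy
in-memory store; ObjectStore and its methods are hypothetical and are
not git's implementation.  A forged object is simulated by a peer
offering different bytes under a name the store already holds; a real
attack would additionally need those bytes to hash to that name.]

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store in which the first copy of a name wins."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store locally created data under its SHA-1 name."""
        name = hashlib.sha1(data).hexdigest()
        self._objects.setdefault(name, data)
        return name

    def receive(self, name: str, data: bytes) -> bool:
        """Accept a peer's object only if that name is not already held."""
        if name in self._objects:
            return False  # never fetched, so the pristine copy survives
        self._objects[name] = data
        return True

    def get(self, name: str) -> bytes:
        return self._objects[name]


store = ObjectStore()
name = store.add(b"pristine source\n")

# A compromised peer offers other bytes under the same name.
accepted = store.receive(name, b"backdoored source\n")
print(accepted, store.get(name))
```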

In other words, to really break a git archive, you need to

 - be able to replace an existing SHA1 hash'ed object with one that hashes
   to the same thing (_not_ the breakage that has been shown to be 
   possible already)
 - the replacement has to still honor all the other git consistency checks 
   (even "blob" objects have them: they need to have a valid header with a
   valid length, so it's not sufficient to just find another object that 
   hashes to the right thing, you have to find an object with a valid 
   header that hashes to the right thing)
 - you have to break in to _all_ archives that already have that object 
   and replace it quietly enough that nobody notices.
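
[Illustration added in editing, not part of the original message.  A
minimal Python sketch of the header check in the second item, assuming
the "blob <length>\0" header that current git places in front of blob
contents before hashing; git_blob_id and parse_object are hypothetical
helper names.  A replacement object must hash to the same name and also
pass a check like parse_object.]

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <length>\\0' followed by the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def parse_object(raw: bytes) -> tuple[str, bytes]:
    """Split a raw object into (type, body), rejecting a bad header or length."""
    header, sep, body = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if not sep or not size.isdigit() or int(size) != len(body):
        raise ValueError("invalid object header")
    return kind.decode(), body


print(git_blob_id(b"hello\n"))
print(parse_object(b"blob 6\0hello\n"))

# An object whose declared length does not match its body is rejected,
# whatever it hashes to.
try:
    parse_object(b"blob 9\0hello\n")
except ValueError as err:
    print("rejected:", err)
```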

Quite frankly, it's not worth worrying about. It's a hell of a lot easier
to just break a source archive with other means (ie pay a developer ten
million dollars to just insert the back door you want inserted).

		Linus

---------------------------------------------------------------------
The Cryptography Mailing List