"Approximate" hashes

Jon Callas jon at callas.org
Thu Sep 2 04:26:45 EDT 2004

> Hence my question: is there some "approximate" hash function (which I 
> could
> use instead of SHA-1) which can verify that a text hashes "very close" 
> to a
> value? So that if I change, say, tabs into spaces, I won't get exactly 
> the
> same value, but I would get a "good enough"?

What you want is what's called a canonicalization function. You want to 
hash not T, but F(T), and F can de-tabify, and so on.

As has been mentioned, OpenPGP has a canonicalization for text and 
cleartext signatures (and we debate what it should be, even). XML 
Digital Signatures has one.

The major other issue you have to deal with is whether the 
canonicalization is an interpretation, a conversion, or an assurance.

If it's an interpretation, then you are hashing F(T), and there's 
always some slim chance that there's a collision between two texts that 
canonicalize to different values, but hash to the same. I wouldn't 
worry about it, myself. Much. (Assuming a decent canonicalizer, of 

If it's a conversion, then you're replacing the text with the 
canonicalized text. This puts the canonicalization in the face of the 
users, but removes the problems handwaved at above.

If it's an assurance, then if the text is not canonicalized, then it's 
not a valid message if it doesn't meet the requirements. A message with 
a tab, for example, just isn't a valid message. Don't even hash it, 
return an error. Or alert that it was an invalid message.


The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com

More information about the cryptography mailing list