"Approximate" hashes
Jon Callas
jon at callas.org
Thu Sep 2 04:26:45 EDT 2004
> Hence my question: is there some "approximate" hash function (which I
> could
> use instead of SHA-1) which can verify that a text hashes "very close"
> to a
> value? So that if I change, say, tabs into spaces, I won't get exactly
> the
> same value, but I would get a "good enough"?
>
What you want is what's called a canonicalization function. You want to
hash not T, but F(T), and F can de-tabify, and so on.
As has been mentioned, OpenPGP has a canonicalization for text and
cleartext signatures (and we debate what it should be, even). XML
Digital Signatures has one.
The major other issue you have to deal with is whether the
canonicalization is an interpretation, a conversion, or an assurance.
If it's an interpretation, then you are hashing F(T), and there's
always some slim chance that there's a collision between two texts that
canonicalize to different values, but hash to the same. I wouldn't
worry about it, myself. Much. (Assuming a decent canonicalizer, of
course.)
If it's a conversion, then you're replacing the text with the
canonicalized text. This puts the canonicalization in the face of the
users, but removes the problems handwaved at above.
If it's an assurance, then if the text is not canonicalized, then it's
not a valid message if it doesn't meet the requirements. A message with
a tab, for example, just isn't a valid message. Don't even hash it,
return an error. Or alert that it was an invalid message.
Jon
---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com
More information about the cryptography
mailing list