[Cryptography] Creating a Parallelizeable Cryptographic Hash Function
leichter at lrw.com
Wed Oct 8 10:33:17 EDT 2014
On Oct 7, 2014, at 5:34 PM, Bill Cox <waywardgeek at gmail.com> wrote:
> > Digest = H(1 || B1) * H(2 || B2) * ... * H(n | Bn) mod p
> This falls immediately to a prefix attack...
> Hi, Jerr. Thanks for taking a look at this. However, I am confused. Why do I care that an attacker can in constant time compute the correct hash of a different set of data?
Urk, I kind of applied the wrong tool. In fact, the iterative structure of the commonly used hash functions all have this prefix property - which is what makes the obvious techniques of pre- or post-fixing a key to make a keyed MAC from a hash fail.
However, this has always been seen as a weakness. You'd like a hash function to "look like a random function". Anything that makes it look less like a random function leaves potential traps for the unwary - many, many people have made the mistake of using H(key || X) as a keyed MAC. It's good to have robust primitives that are hard to break. Every algebraic property of a hash is also a potential trap for the unwary. In fact, the newer generation of hash functions, I believe, do *not* have this prefix property - nor do they have any other algebraic properties anyone is aware of.
In the case you propose, there's another issue: Digest(X) == 0 if and only if one of the constituent hashes is 0. That breaks one of the basic requirements for a secure hash function: Given H(X), it's difficult to find a Y != X such that H(Y) == H(X). When H(X) == 0, it's trivial to find as many examples as you like.
> By multiplying in H(n+1 || Bn+1), the data is different, and so is the hash. That seems to be the way we want this function to work. In constant time any new block can be added or subtracted from the hash. That is the goal, I believe.
I'm not sure what the goal is. This is not a property of a random function, so is not a property expected of a secure hash function. You're proposing a new primitive, with its own set of security properties (which you haven't fully written down), and which may or may not be useful.
> It seems to me that the security model is that if we have D = H(B1, B2, ... , Bn), then an attacker should not be able to find find D = H(C1, C2, ... , Cm), unless n == m and Bi == Ci for all i in 1 .. n.
See above; this is false when D == 0.
> By appending H(n+1 || Bn+1), you've changed the message, and derived the correct hash of it, in constant time. Isn't that useful?
I have no idea. You would need to propose a use, define the security properties needed for that use, then show (under appropriate assumptions) that you've attained them.
Note that the cost of computing your digest is at least double that of simply doing a hash over the original data (as you run the hash on twice as much data - not to mention the cost of all those modular multiplications). You'd need to justify that cost.
> You can also prepend, or even replaced any Bi you want in constant time.
> I agree that this should certainly not be used for anything for now, but I would like to talk about it more. Do you agree with my assertion that it is secure for the purpose of verifying data integrity of a large data set, while allowing constant time update, including appending new data blocks or replacing old ones?
I have no clue.
I also see no obvious advantage to your scheme over simply adding the constituent hashes together (effectively mod 2^n) - a much, much cheaper operation which doesn't have problems with 0. (You can make it even cheaper by considering each H() value as a vector of 32- or 64-bit values and adding them as vectors.) Can you describe an attack on the cheaper approach that fails for yours?
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4813 bytes
Desc: not available
More information about the cryptography