[Cryptography] Hierarchical bulk digest modes

Adam P. Goucher apgoucher at gmx.com
Sat Aug 21 10:40:34 EDT 2021


> > We can now detect a single bit of corruption in the data file. Great! But if the data is corrupted, that tells us nothing beyond the bare fact: we don't know where the corruption occurred. That is fine for little files, but what about when we are dealing with 1TB archives?
> >
> > A better approach would be to split the data up into chunks and construct a digest (Merkle tree??) over the chunks. But how big should the chunks actually be? There is clearly a tradeoff here which we traditionally just skate over entirely because we have this 'any fault will destroy our crypto' approach.
>
> Likely what you want is what's called a rolling hash (or rolling checksum), like a Rabin hash or Rabin-Karp, or, as Jerry mentioned, the rolling Adler-32 hash that rsync uses. Fletcher hashes are likely usable here, too; Adler-32 is based on Fletcher hashes. (Years ago, PGP Disk used Fletcher hashes to calculate IVs of logical blocks because in those days tweakable modes like XTS did not exist.)

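For concreteness, here is a rough Python sketch of the rolling-checksum
idea mentioned above, in the style of rsync's Fletcher/Adler-type
construction (two running sums, reduced here modulo 2^16). The window
size and the modulus are illustrative choices, and this is not
byte-for-byte identical to zlib's Adler-32:

    MOD = 1 << 16  # rsync reduces mod 2^16; true Adler-32 uses the prime 65521

    class RollingChecksum:
        """rsync-style rolling checksum over a fixed-size window."""

        def __init__(self, window):
            self.n = len(window)
            # a = plain sum of bytes, b = position-weighted sum of bytes
            self.a = sum(window) % MOD
            self.b = sum((self.n - i) * x for i, x in enumerate(window)) % MOD

        def digest(self):
            return self.a | (self.b << 16)

        def roll(self, out_byte, in_byte):
            """Slide the window one byte: drop out_byte, take in in_byte, in O(1)."""
            self.a = (self.a - out_byte + in_byte) % MOD
            self.b = (self.b - self.n * out_byte + self.a) % MOD
            return self.digest()

Rolling the window across a buffer gives the same value as recomputing
from scratch at each offset, which is the O(1)-per-byte property that
makes rsync's block matching cheap:

    data = b"The quick brown fox jumps over the lazy dog"
    W = 16
    rc = RollingChecksum(data[:W])
    for i in range(1, len(data) - W + 1):
        rolled = rc.roll(data[i - 1], data[i - 1 + W])
        assert rolled == RollingChecksum(data[i:i + W]).digest()
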
You might also be interested in the BLAKE3 hash function, which splits
its input into 1024-byte chunks and treats them as the leaves of a binary
tree:

https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3
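
If you want to experiment with the chunk-tree idea without pulling in a
BLAKE3 implementation, here is a rough Python sketch that hashes
1024-byte chunks with SHA-256 (a stand-in for illustration; neither the
leaf hashing nor the node combining here matches BLAKE3's actual
construction) and pairs the results up into a binary tree. A single root
hash still detects any corruption, while comparing the per-chunk leaf
hashes of two copies pins the damage down to a chunk:

    import hashlib

    CHUNK = 1024  # chunk size in bytes; BLAKE3 also happens to use 1024

    def leaf_hashes(data):
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
        return [hashlib.sha256(c).digest() for c in chunks]

    def merkle_root(level):
        # Pair nodes up until one root remains; an odd node is promoted as-is.
        while len(level) > 1:
            nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])
            level = nxt
        return level[0]

    def corrupted_chunks(original, received):
        """Indices of the chunks whose leaf hashes disagree."""
        return [i for i, (x, y) in
                enumerate(zip(leaf_hashes(original), leaf_hashes(received)))
                if x != y]

For example, flipping one bit in a 4 KiB buffer changes the root, and
the leaf-level comparison points at the third chunk:

    good = bytes(range(256)) * 16      # 4096 bytes = 4 chunks of 1024
    bad = bytearray(good)
    bad[2500] ^= 0x01                  # flip one bit inside chunk index 2
    assert merkle_root(leaf_hashes(good)) != merkle_root(leaf_hashes(bad))
    assert corrupted_chunks(good, bytes(bad)) == [2]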


Best wishes,


Adam P. Goucher

