[Cryptography] Bloom filter question
Jerry Leichter
leichter at lrw.com
Mon Aug 4 01:06:52 EDT 2025
> [I] make an MD5 hash of an address, break that up into 16 bytes,
> and use the 7 low bits of each byte as the bit index. Is it reasonable to use
> the 16 bytes of an MD5 hash as 16 different mini-hashes [for a Bloom filter]?
Let’s ask the question another way: Suppose I had 15 of the bytes of the MD5 hash of an unknown value. Would I have any significant advantage in predicting any bit of the 16th? As far as I know, no one has ever demonstrated that - the breaks are pre-image breaks of specialized sorts. And since you are discarding 15 bits you’re making the prediction problem harder anyway. (You’re making the pre-image problem easier of course.)
In any case, the basic properties of a Bloom index shouldn’t depend too strongly on powerful notions of independence of the hashes. I don’t recall ever seeing anything beyond some hand-wavy statements about using “different” hashes.
Absent a deliberate attack - made way harder given the syntactic constraints on email addresses and their shortness - this seems like a valid approach to me. If you can apply this after some normalization of the addresses - e.g., fixing the case after the “@“ - so much the better in making attacks even less plausible.
-- Jerry
More information about the cryptography
mailing list