[Cryptography] Bloom filter question
Jerry Leichter
leichter at lrw.com
Fri Aug 8 06:45:16 EDT 2025
>>> I suppose if it were an issue, we could add dummy items that don't
>>> have @ signs so they can't collide with real addresses.
>>
>> That's not how a Bloom filter works...
>
> Sure it is. If I have two addresses foo at example.com and bar at example.net,
> I hash them, and use the hashes to set bits in the bit array. Then I
> make up 8 random strings not containing @ signs, hash each of them,
> and set those bits. Now you can guess there were 10 strings, but you
> can't tell how many were real addresses.
There’s some lack of clarity here about the referent in the comment.
It is certainly true that a string with no “@“ in it could produce the same hash value - hence set the same bits - as a validly formed email address. Well, that’s true unless you rig the hash value - e.g., reserve one bit in it to designate whether an “@“ appeared in the input. But unless you do something silly like that, the hash is designed to look like a random function so distinguishing between “@“-containing and noncontaining strings would be impossible.
But it really makes no difference. In fact, if your concern is about someone who gets hold of the Bloom filter’s bit map having an advantage in determining how many values have been inserted in it, the obvious protection measure is to simply pre-set x% of the bits, chosen at random, to 1. x would be chosen as a tradeoff between false positives introduced and noise in an attacker’s estimates, given estimates on the range of expected numbers of legitimate entries.
Frankly, it’s hard for me to see an application where linking the number of entries in the filter was a significant issue but I’m sure it’s possible to construct one.
BTW, this all overlaps a long-expired patent David Wittenberg and I got while at DEC to use a Bloom filter to reject duplicated passwords. We used cryptographically-secure hash functions to avoid an attacker reverse-engineering the passwords from the Bloom filter’s bit map.
-- Jerry
More information about the cryptography
mailing list