[Cryptography] Bloom filter question

Christian Huitema huitema at huitema.net
Fri Aug 8 22:57:01 EDT 2025


On 8/8/2025 3:45 AM, Jerry Leichter wrote:

> But it really makes no difference. In fact, if your concern is about someone who gets hold of the Bloom filter’s bit map having an advantage in determining how many values have been inserted in it, the obvious protection measure is to simply pre-set x% of the bits, chosen at random, to 1.  x would be chosen as a tradeoff between false positives introduced and noise in an attacker’s estimates, given estimates on the range of expected numbers of legitimate entries.

Take a Bloom filter as a piece of code that answers the question "is X 
in the filter" as "Maybe" or "No". It is designed to answer "maybe", 
because it is a statistical process: maybe means that the bits are set 
for each N hashes of X. On the other hand, No is unambiguous. Of course, 
the application can only tolerate some amount of ambiguity, or false 
positive if you prefer. So it will dimension the filter to achieve that 
amount in normal usage. "Maybe" means yes in 99% of the cases, or 99.9%, 
or 90%, or maybe 50%, But a Bloom filter with 99% false positive would 
not be terribly useful.

If the content of the bloom filter leaks, the adversary will be able to 
run "is X in the filter" for a number of interesting values of X. Yes, 
there will be some false positive, but the adversary is still going to 
get valuable insights.

Setting some of the bits at random is not going to help all that much. 
It will increase the rate of false positive for the adversary, but it 
will also increase them for the application. See previous argument: the 
application has to work, so it will dimension the filter to have an 
acceptable rate of false positives.

If I was really worried about the bit map leaking information, I would 
add encryption. Maybe use a keyed hash, so that the question becomes, 
"given Key K, is X in the filter"? The adversary would need to capture 
both the bitmap of the filter and the value of the key before gaining 
information, which is probably harder than just capturing the filter. 
Probably, but careless implementations could reveal both.

-- Christian Huitema



More information about the cryptography mailing list