[Cryptography] Bloom filter question
Christian Huitema
huitema at huitema.net
Fri Aug 8 22:57:01 EDT 2025
On 8/8/2025 3:45 AM, Jerry Leichter wrote:
> But it really makes no difference. In fact, if your concern is about someone who gets hold of the Bloom filter’s bit map having an advantage in determining how many values have been inserted in it, the obvious protection measure is to simply pre-set x% of the bits, chosen at random, to 1. x would be chosen as a tradeoff between false positives introduced and noise in an attacker’s estimates, given estimates on the range of expected numbers of legitimate entries.
Take a Bloom filter as a piece of code that answers the question "is X
in the filter" as "Maybe" or "No". It is designed to answer "maybe",
because it is a statistical process: maybe means that the bits are set
for each N hashes of X. On the other hand, No is unambiguous. Of course,
the application can only tolerate some amount of ambiguity, or false
positive if you prefer. So it will dimension the filter to achieve that
amount in normal usage. "Maybe" means yes in 99% of the cases, or 99.9%,
or 90%, or maybe 50%, But a Bloom filter with 99% false positive would
not be terribly useful.
If the content of the bloom filter leaks, the adversary will be able to
run "is X in the filter" for a number of interesting values of X. Yes,
there will be some false positive, but the adversary is still going to
get valuable insights.
Setting some of the bits at random is not going to help all that much.
It will increase the rate of false positive for the adversary, but it
will also increase them for the application. See previous argument: the
application has to work, so it will dimension the filter to have an
acceptable rate of false positives.
If I was really worried about the bit map leaking information, I would
add encryption. Maybe use a keyed hash, so that the question becomes,
"given Key K, is X in the filter"? The adversary would need to capture
both the bitmap of the filter and the value of the key before gaining
information, which is probably harder than just capturing the filter.
Probably, but careless implementations could reveal both.
-- Christian Huitema
More information about the cryptography
mailing list