de-identification

dan at geer.org
Fri Jun 17 10:48:36 EDT 2005


"Steven M. Bellovin" writes:
 | >
 | >Ladies and Gentlemen,
 | >
 | >I'd like to come up to speed on the state of the
 | >art in de-identification (~=anonymization) of data
 | >especially monitoring data (firewall/hids logs, say).
 | >A little googling suggests that this is an academic
 | >subspeciality as well as a word with many interpretations.
 | >If someone here can point me at the mother lode of 
 | >insight, I would be most grateful.
 | >
 | 
 | What's your threat model?  It's proved to be a very hard problem to 
 | solve, since there are all sorts of other channels -- application data, 
 | timing data (the remote fingerprinting paper mentioned this one), etc.

Steve, et al.,

My threat model is this: how can I get a convincing
technical solution that, in turn, gets your average
corporate general counsel to permit sharing various
kinds of logs with similar firms?  The Patriot Act
(2001, Bush), PDD 63 (1998, Clinton), and various other
intervening bits of legislation say that threat and
vulnerability information shared between like private
sector firms is (1) exempt from antitrust law (even
where security is a competitive feature) and (2)
exempt from FOIA (even where such sharing is under
government aegis).  Nevertheless, no corporate general
counsel will permit such sharing.  From where a GC
sits, the risk is clear, near-term, and direct to the
firm, while any benefit is diffuse and distant.  No GC
believes any law's words until the courts, as
unacknowledged legislators, get a whack at them, and
that being so, no GC wants to be the test case.

Ipso facto, I (we) need a way to ensure that log
data can be shared between firms in ways that do
not identify the source firm so that, in turn, 
I can stand up and say that the risk as seen from
the GC's point of view has been technically put
to bed.  I don't imagine for a minute that even
that argument will be trivial, but a technical
solution is necessary even if insufficient.
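
As a minimal sketch of one piece of such a technical
measure, assuming firewall logs are plain text lines
with IPv4 addresses, a firm could replace its own
addresses with keyed-hash pseudonyms before sharing.
The function names, the log format, and the choice of
HMAC-SHA256 are illustrative assumptions, not an
established scheme, and this does nothing about the
application-data and timing channels Steve mentions.

```python
import hashlib
import hmac
import ipaddress
import re

# Candidate dotted-quad strings; validity is checked separately below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_ip(addr: str, key: bytes) -> str:
    """Map an address to a stable pseudonym under a secret key (hypothetical helper)."""
    digest = hmac.new(key, addr.encode("ascii"), hashlib.sha256).hexdigest()
    return "ip-" + digest[:12]


def scrub_line(line: str, key: bytes, internal_nets) -> str:
    """Replace only addresses inside the firm's own networks.

    External (attacker) addresses are left alone, since they are
    the signal the firms want to compare.
    """
    def replace(match):
        text = match.group(0)
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            return text  # looked like an address but is not one
        if any(addr in net for net in internal_nets):
            return pseudonymize_ip(text, key)
        return text

    return IPV4_RE.sub(replace, line)


if __name__ == "__main__":
    key = b"per-firm secret, never shared"  # assumption: kept private by the firm
    nets = [ipaddress.ip_network("192.0.2.0/24")]  # documentation range as a stand-in
    print(scrub_line("DENY tcp 203.0.113.7:4431 -> 192.0.2.10:22", key, nets))
```

The same address always maps to the same pseudonym
under one key, so per-host patterns survive, while a
recipient without the key cannot simply hash guesses
to recover the original.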

My real aim is, of course, the characterization 
of macro-scale risk to critical infrastructure.
In the hypothesis-generation stage of such an
effort I need to take field observations that
could easily go any of three ways:

 (1) All the players see the same scans, the same
     automated attacks, the same over-pressure;

 (2) All the players see entirely different scans,
     entirely different automated attacks, entirely
     different over-pressures; or

 (3) One of the players stands apart from the others:
     whereas the bulk of that industrial sector
     sees the same scans, the same automated
     attacks, the same over-pressure, there is one
     player whose experience is different.
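
The three outcomes above could be told apart, in a
rough first pass, by comparing the sets of attack
signatures each firm reports.  The sketch below
assumes each firm's observations have already been
reduced to a set of comparable labels (itself a hard
problem) and uses Jaccard similarity with an
arbitrary threshold; the names and the threshold are
illustrative assumptions only.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 1.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(observations: dict, threshold: float = 0.5):
    """Return (case, outlier) for a mapping of firm -> set of signature labels.

    case 1: every pair of firms looks alike
    case 2: no pair of firms looks alike
    case 3: all firms but one look alike, and that one matches nobody
    case 0: none of the above (a mixed picture)
    """
    firms = sorted(observations)
    similar = {
        frozenset(pair): jaccard(observations[pair[0]], observations[pair[1]]) >= threshold
        for pair in combinations(firms, 2)
    }
    if all(similar.values()):
        return 1, None
    if not any(similar.values()):
        return 2, None
    for firm in firms:
        own = [ok for pair, ok in similar.items() if firm in pair]
        rest = [ok for pair, ok in similar.items() if firm not in pair]
        if rest and all(rest) and not any(own):
            return 3, firm
    return 0, None


if __name__ == "__main__":
    # Hypothetical pseudonymous firms and signature labels.
    sample = {
        "firm-a": {"scan-445", "scan-1433", "worm-x"},
        "firm-b": {"scan-445", "scan-1433", "worm-x"},
        "firm-c": {"probe-custom-1", "probe-custom-2"},
    }
    print(classify(sample))
```

A real comparison would also have to weigh volume and
timing, not just which signatures appear; set overlap
is only the simplest statistic that separates the
three cases.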

This is information that no firm can get on its
own, so uniqueness of value is a given and, amongst
rational players, unarguable.  What I need is to
break the logjam over being the first to share.

The only alternative is to take the biased samples
that are available inside managed security providers
and confidential consulting firms and pool that data,
thus anonymizing it, within a single corporate shell.
This is second best and tends to have little motive
power of its own, though I/we proved it can be done[1]
as has Qualys[2], inter alia.

Clear enough?

--dan



[1]
http://www.atstake.com/research/reports/acrobat/atstake_app_reloaded.pdf

[2]
http://www.qualys.com/company/newsroom/newsreleases/usa/pr.php/2004-07-28

