de-identification

Matt Crawford crawdad at fnal.gov
Thu Jun 9 09:21:50 EDT 2005


On Jun 8, 2005, at 15:19, dan at geer.org wrote:
> I'd like to come up to speed on the state of the
> art in de-identification (~=anonymization) of data
> especially monitoring data (firewall/hids logs, say).

I don't know the state of the art, but I can tell you the state of the 
artless.  I had a request to share our border router traffic logs 
(Cisco netflow) with a university, so they could try out some anomaly 
detection schemes they were working on.

(Background: We don't consider our network topology sensitive. Our traffic 
logs are subject to a general respect for privacy.)

Since they could send us packets of their choosing, I deemed it useless 
to obfuscate our own IP addresses.  I chose to anonymize all the 
external addresses.  My design note is below.

But then, as fate would have it, the university said they needed the 
true external addresses.  That left me a bit stumped.  Perhaps a less 
chaotic mapping, like one that is bijective between classful network 
numbers, would do.
============================

obfuscation filter program

   Parameters
     Blocks of IP addresses deemed internal.  Internal includes multicast
     addresses and RFC 1918 "private use" addresses.

   Working data preserved across runs
     For each date, a database of (true address, substituted address)
     pairings.

   Algorithms
     Substituted addresses are pseudo-random, formed by MD5-hashing a
     string (S | D | A | N) and taking the first 32 bits.
       S = fixed secret hash seed, long term
       D = date of data, in YYYYMMDD format
       A = true (external) address being obfuscated
       N = integer, starting at 0 and incremented if resulting address
           is an internal one or a collision.
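
     A minimal Python sketch of the hashing step above.  The note does not
     say how the four fields are joined or encoded, so the literal "|"
     separator, the text encoding of A and N, and the function name
     candidate_address are all assumptions made for illustration.

```python
import hashlib
import ipaddress


def candidate_address(secret: bytes, date: str, addr: str, n: int) -> str:
    """Return the first 32 bits of MD5(S | D | A | N) as a dotted quad."""
    # Assumption: fields are joined with a literal "|" and N is decimal text.
    material = b"|".join(
        [secret, date.encode("ascii"), addr.encode("ascii"), str(n).encode("ascii")]
    )
    digest = hashlib.md5(material).digest()
    # The first four bytes of the digest are the 32-bit substitute address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

     The caller is still responsible for rejecting a candidate that is
     internal or already in use, and retrying with N + 1.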

     to obfuscate an IP address: {
       if it's internal, return it unchanged.  otherwise
        is a substitute already assigned?  If so, return it.  otherwise
         for ( done = N = 0; !done; ) {
           generate substitute address by hashing as above
           if ( !internal && !collision ) done = 1
           else N++
         }
         save forward & reverse mappings
         return the substitute
     }

     for each netflow record {
       i = 0
       if ( src is external ) {
         obfuscate src; i++
       }
       if ( dst is external ) {
         obfuscate dst; i++
       }
       if ( i != 1 ) log an unusual condition
       write output
     }
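
     The two routines above could be sketched in Python roughly as
     follows.  This is an illustration under assumptions, not the program
     actually used: the class and method names are hypothetical, the
     hashed fields are joined with "|", a "collision" is taken to mean a
     substitute that is already assigned, and the mappings live in memory
     rather than in the per-date database.

```python
import hashlib
import ipaddress


class Obfuscator:
    def __init__(self, secret: bytes, date: str, internal_blocks):
        self.secret = secret
        self.date = date
        self.internal = [ipaddress.ip_network(b) for b in internal_blocks]
        self.forward = {}  # true address -> substitute
        self.reverse = {}  # substitute -> true address

    def is_internal(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self.internal)

    def _candidate(self, addr: str, n: int) -> str:
        # Assumed encoding of (S | D | A | N); the note does not specify one.
        material = self.secret + f"|{self.date}|{addr}|{n}".encode("ascii")
        return str(ipaddress.IPv4Address(hashlib.md5(material).digest()[:4]))

    def obfuscate(self, addr: str) -> str:
        if self.is_internal(addr):
            return addr
        if addr in self.forward:
            return self.forward[addr]
        n = 0
        while True:
            sub = self._candidate(addr, n)
            # Retry if the candidate is internal or already handed out.
            if not self.is_internal(sub) and sub not in self.reverse:
                break
            n += 1
        self.forward[addr] = sub
        self.reverse[sub] = addr
        return sub

    def process(self, src: str, dst: str):
        """Return (new_src, new_dst, unusual) for one flow record."""
        externals = sum(not self.is_internal(a) for a in (src, dst))
        # Exactly one external endpoint is the expected case at a border.
        return self.obfuscate(src), self.obfuscate(dst), externals != 1
```

     For example, Obfuscator(b"seed", "20050609", ["10.0.0.0/8",
     "224.0.0.0/4"]).process("10.0.0.1", "203.0.113.7") leaves the source
     unchanged and replaces the destination.  Because N depends on which
     substitutes were already taken, the result for a given address can
     depend on the order in which addresses are first seen.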

Scripts:

   generator loops over input files, applying obfuscator, writing temp-named
   output file, then renaming completed output file to permanent name.
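
   A rough Python sketch of the generator's write-then-rename pattern.
   The function name generate, the ".tmp" suffix, and the line-by-line
   transform callback are assumptions for illustration; real netflow
   exports are binary and would need a proper record reader.

```python
import os
from pathlib import Path


def generate(in_dir, out_dir, transform):
    """Apply transform to each input file; publish output by renaming."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).iterdir()):
        if not src.is_file():
            continue
        final = out_dir / src.name
        tmp = out_dir / (src.name + ".tmp")  # assumed temp-name convention
        with open(src) as fin, open(tmp, "w") as fout:
            for line in fin:
                fout.write(transform(line))
        # Rename only after the file is complete, so a reader never
        # sees a partially written file under the permanent name.
        os.replace(tmp, final)
        written.append(final)
    return written
```

   Keeping the temp file in the same directory as the final name keeps
   the rename on one filesystem, where it is atomic on POSIX systems.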

   mover looks for completed output files, copies them to destination, then
   looks for more, sleeping and retrying if there are none.
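
   The mover might look something like the Python sketch below.  It
   assumes the destination is a local directory, that unfinished files
   carry a ".tmp" suffix, and that already-copied names are remembered
   in memory; the function names, the polling interval, and the
   max_idle_polls stop condition are hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_completed(out_dir, dest_dir, seen):
    """One pass: copy completed files not yet seen; return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(Path(out_dir).iterdir()):
        # Skip temp-named files that the generator is still writing.
        if f.is_file() and not f.name.endswith(".tmp") and f.name not in seen:
            shutil.copy2(f, dest / f.name)
            seen.add(f.name)
            copied.append(f.name)
    return copied


def run_mover(out_dir, dest_dir, poll_seconds=60, max_idle_polls=None):
    """Loop forever (or until max_idle_polls empty passes in a row)."""
    seen = set()
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        if move_completed(out_dir, dest_dir, seen):
            idle = 0
        else:
            idle += 1
            time.sleep(poll_seconds)  # nothing new: sleep, then retry
    return seen
```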

Other notes:

   The obfuscated mappings can be regenerated at will if exactly the same data
   is processed in the same sequence, and the secret hash seed is known.


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com


