[SIMSOFT] Protecting Privacy with Translucent Databases

Sat Aug 3 10:19:43 EDT 2002

--- begin forwarded text

Status: RO
From: "Simson L. Garfinkel" <slg at ex.com>
To: <simsoft at simson.net>
Subject: [SIMSOFT] Protecting Privacy with Translucent Databases
Sender: simsoft-admin at nitroba.com
Date: Sat, 3 Aug 2002 08:14:02 -0400

<http://www.oreillynet.com/pub/a/network/2002/08/02/simson.html>

Protecting Privacy with Translucent Databases

by <http://www.oreillynet.com/pub/au/355>Simson Garfinkel, author of
<http://www.oreilly.com/catalog/websec2/>Web Security, Privacy & Commerce,
2nd Edition
08/02/2002

Last week, officials at <http://www.yale.edu/>Yale University complained to
the FBI that admissions officers from
<http://www.princeton.edu/index.shtml>Princeton University had broken into
a Yale Web site and downloaded admission decisions on 11 students who had
applied to both schools. Princeton responded by suspending its associate
dean of admissions and launching an investigation. That's a good start, but
both colleges should go further, and redesign the way that their databases
treat personal information.

As details surrounding the incident have emerged, it's clear that there's a
lot of blame to go around. Both Yale and Princeton compete vigorously for
the nation's top high school students, and in recent years the competition
has become increasingly aggressive. The schools shower the best students
not just with phone calls and letters, but even with tuition discounts. As
part of that competition, this year Yale unveiled a new Web site designed
to let applicants find out if they had been admitted -- no more waiting for
either that thin rejection letter or the thick admissions packet.

Unfortunately, the security on the Yale Web site was atrocious: all anybody
needed to look up a student's record was that student's name, social
security number (SSN), and date of birth. And it just so happened that the
officials at Princeton had this same information for the most
highly-contested applicants. So in April, when the Web site went live,
Princeton's admissions office sprang to action as well, allegedly
downloading admissions decisions from the Yale Web site on at least 18
separate occasions. The most highly sought-after applicant? President
Bush's niece Lauren Bush, according to an article that appeared in The
Washington Post. (Read about it at
<http://www.washingtonpost.com/wp-dyn/articles/A2983-2002Jul25.html> and
<http://www.washingtonpost.com/wp-dyn/articles/A7815-2002Jul26.html>.)

Who's To Blame

Most of the cyber-security professionals I've spoken with have taken a
decidedly "blame-the-victim" approach with this latest story of Web site
hackery. Assuming that the allegations are true, it's terrible that an
administrator at Princeton would engage in such patently illegal
activities. But what's even worse, they say, is that Yale would deploy a
Web application so poorly conceived and implemented.

To be sure, Yale is not alone in deploying systems with poor security for
personal information. Many banks and credit card companies continue to
treat widely-circulated personal information, like SSNs and birthdays, as
if this information is secret, available only to the bank account holder or
credit card applicant. Clearly it is not, as evidenced by the national
epidemic in identity fraud. But financial organizations have been stymied
in their attempts to find a better means for verifying the identity of
account applicants -- people with whom, by definition, the banks have no
current relationship.

Poor Design Principles At Play

Yale could have designed a better system: it could have asked each
applicant to supply a PIN or a password as part of their application. An
even more secure solution would have been for the university to assign a
password to every applicant and send it back to the high school students
with their confirmation cards. Such an approach would have protected the
process against students who would otherwise use the same password for both
Yale and Princeton.

To provide even better security, Yale and Princeton could have used what's
called a translucent database, a term coined by author and cryptographer
Peter Wayner in his new book by the same title.

A translucent database uses cryptographic methods like hash functions and
public key cryptography to mathematically protect information so that it
cannot be wrongly divulged -- not even to a crooked database administrator.
Translucent databases provide for unparalleled protection of sensitive
information, be that information personal, corporate, or academic. Yet,
with one notable exception, translucent databases are practically unknown
and unused in IT today.

The Unix password file is the one translucent database that is in wide use
today. When you log into a Unix computer, you're asked to provide a
username and a password. If you type the correct information, you're logged
in.

Before Unix, most computers had a "password file" that simply listed valid
accounts and their corresponding passwords. But there is a big problem with
this approach: if an attacker gets access to the file, then everybody's
password needs to be changed.

So Robert Morris and Ken Thompson adopted a different approach when they
designed the Unix password system. Instead of storing the actual passwords,
Unix stores passwords that have been processed with a one-way hash
function. Many people call this a one-way encryption function, but it's
really not encryption, because there's no way to "decrypt" the password
once it is hashed. Instead, when you attempt to log into a Unix system, the
computer takes the password you provide, hashes it, and sees if your hashed
password is the same as the hashed password that is stored in the password
file. If they are, you're allowed to log in. (If you have access to a
library, you can read the original article: Morris, R.H., and Thompson, K.,
"UNIX Password Security", Communications of the ACM, Volume 22, Number
11, November 1979, pp. 594-597. Unfortunately, the article is not available
online without a subscription to the ACM's online library.)
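
The hash-and-compare login described above can be sketched in a few lines of Python. This is a minimal illustration, not the historical Unix crypt() routine: the salt, the PBKDF2-HMAC-SHA256 function, the iteration count, and every name in the code (hash_password, make_entry, check_login, password_file) are assumptions made for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not taken from the article


def hash_password(password: str, salt: bytes) -> bytes:
    """One-way hash of a password; there is no function to reverse it."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_entry(password: str) -> tuple:
    """Build what the password file stores: (salt, hash), never the password."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def check_login(password_file: dict, username: str, attempt: str) -> bool:
    """Hash the attempted password and compare it with the stored hash."""
    entry = password_file.get(username)
    if entry is None:
        return False
    salt, stored = entry
    return hmac.compare_digest(hash_password(attempt, salt), stored)


# Hypothetical password file with a single account.
password_file = {"alice": make_entry("correct horse")}
```

Someone who steals password_file gets salts and hashes only; to log in as "alice" they still have to guess a password whose hash matches.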

Benefits of Using Translucent Databases

In Translucent Databases, Wayner extends this concept of hashing in new and
important ways. For example, what if a police department needs to build a
database of sexual-assault victims that lets them identify trends but hides
personal information? You could use a translucent database where the first
column is the hash of the victim's name, and the second column is a hash of
their full address, and the third column is a hash of their block and
street. You can now group incidents together by grouping entries with
identical block hashes, and you can tell whether two incidents involve the
same person by checking whether their name hashes match.
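
As a rough sketch of that three-column table, the Python below stores only hashes and still answers the trend questions. It uses MD5 because that is the hash function the article goes on to name; the sample names, addresses, block labels, and the helpers h and make_row are invented for illustration.

```python
import hashlib
from collections import defaultdict


def h(text: str) -> str:
    """MD5 hex digest of a normalized string."""
    return hashlib.md5(text.strip().lower().encode("utf-8")).hexdigest()


def make_row(name: str, address: str, block: str) -> tuple:
    """Store only hashes: (name hash, full-address hash, block hash)."""
    return (h(name), h(address), h(block))


# Made-up incident reports; the plaintext never reaches the table.
rows = [
    make_row("J. Smith", "12 Elm St Apt 3", "0-99 Elm St"),
    make_row("A. Jones", "48 Elm St", "0-99 Elm St"),
    make_row("J. Smith", "12 Elm St Apt 3", "0-99 Elm St"),
    make_row("B. Lee", "700 Oak Ave", "700-799 Oak Ave"),
]

# Group incidents by block hash to spot clusters.
by_block = defaultdict(list)
for name_hash, _address_hash, block_hash in rows:
    by_block[block_hash].append(name_hash)

incidents_per_block = sorted(len(v) for v in by_block.values())

# Equal name hashes within a block mean the same person was involved twice.
distinct_people_on_elm = len(set(by_block[h("0-99 Elm St")]))
```

An analyst browsing rows sees three incidents on one block involving two distinct people, but no names or addresses.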

Wayner's approach makes it possible to let victims update their records
without giving anybody else the ability to search by a person's name. You
do this by adding a password to the victim's name -- a password known to
the victim and nobody else.

For example, if you were to use the MD5 hash function, you could key a
victim's report with the value of MD5 ("J. Smith/color4") where "color4" is
Smith's password. If Smith remembers that her password is "color4", then
she will be able to update her database entry in the future -- perhaps to
tell the database administrators that her attacker has been caught. If
there is a concern that victims might forget their passwords, the database
can have additional columns that are protected with other passwords, known
to other people. For example, a second column could be keyed with a
password known only to the intake officer. By creating multiple keys using different
combinations of data, it's possible to protect a translucent database
against browsing while simultaneously providing for people's natural
tendency to forget critical pieces of information.
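
Here is one way the password-keyed record might look in Python. The key format MD5("name/password") follows the article's example; the in-memory ReportStore class, its method names, and the officer password "badge-7731" are hypothetical stand-ins for a real table with several key columns.

```python
import hashlib


def record_key(name: str, password: str) -> str:
    """Key a record with MD5("name/password"), as in the article's example."""
    return hashlib.md5(f"{name}/{password}".encode("utf-8")).hexdigest()


class ReportStore:
    """Toy in-memory table; each report is reachable under several keys."""

    def __init__(self):
        self._reports = []   # report dicts, holding no names
        self._index = {}     # key hash -> position in _reports

    def add(self, name, victim_password, officer_password, details):
        self._reports.append({"details": details, "status": "open"})
        pos = len(self._reports) - 1
        # Two independent keys: one for the victim, one for the intake officer.
        self._index[record_key(name, victim_password)] = pos
        self._index[record_key(name, officer_password)] = pos

    def update_status(self, name, password, status) -> bool:
        pos = self._index.get(record_key(name, password))
        if pos is None:
            return False     # wrong name or password: nothing is found
        self._reports[pos]["status"] = status
        return True

    def status(self, name, password):
        pos = self._index.get(record_key(name, password))
        return None if pos is None else self._reports[pos]["status"]


store = ReportStore()
store.add("J. Smith", "color4", "badge-7731", "incident details")
```

Smith can call store.update_status("J. Smith", "color4", ...) to change her entry, and the intake officer can reach the same entry with the second key, while anyone holding only the name finds nothing.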

Had either Yale or Princeton adopted Wayner's principles, this nasty little
episode might never have happened. We've already seen that Yale could have
used a PIN or password to prevent the Princeton admissions office from
being able to access the Yale Web site. But if Princeton had used a
translucent database for its applications, then the admissions officials
accused of browsing wouldn't have had access to the students' SSNs, either.

Although it's terrible that colleges like Yale and Princeton use a social
security number as a universal identifier, they do so for a reason: there
are occasionally cases where two students who apply have the same name. By
using the SSN as a single identifier, it's possible to match up the
student's application with their letters of recommendation, their SAT
scores, and other information.

But once the match is done, there is no reason for the colleges to retain
the number. Keeping around large databases of student names, birthdays, and
SSNs merely opens these students up to the threat of identity fraud at some
point in the future. It would be far better for the college databases to
store the MD5 hash of the SSN, rather than the SSN itself.
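
A small Python sketch of matching records by hashed SSN follows. The sample numbers, the applicant name, and the ssn_hash helper are made up for the example; only the idea of storing MD5(SSN) in place of the SSN comes from the article.

```python
import hashlib


def ssn_hash(ssn: str) -> str:
    """MD5 of the SSN digits, so '123-45-6789' and '123456789' hash alike."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return hashlib.md5(digits.encode("ascii")).hexdigest()


# The admissions database is keyed by the hash; the raw SSN is not kept.
applications = {ssn_hash("078-05-1120"): {"name": "Pat Doe"}}

# Incoming score reports still carry SSNs; hash each one and look it up.
sat_scores = [("078051120", 1480), ("219-09-9999", 1390)]

for ssn, score in sat_scores:
    record = applications.get(ssn_hash(ssn))
    if record is not None:
        record["sat"] = score
```

Because there are only about a billion possible SSNs, a bare hash can be reversed by trying them all. A keyed hash such as HMAC, with the key held outside the database, is one possible hardening; that variation is an assumption added here, not something the article proposes.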

There are a lot of other examples and clever tricks in Wayner's book.
Together, they make this volume good reading for anybody interested in
techniques for making privacy an inherent property of information systems
-- rather than simply relying on policies, procedures, and access controls.
His best example involves the creation of a database system for a community
baby-sitter reservation system. Clearly, there's a lot of damage that
somebody could do with a database of parents who are away from home,
teenage baby sitters, and vulnerable children. But Wayner shows how you can
use a combination of hash functions and digital signatures to store all of
that information in a database, so that it's simply not possible for anyone
other than authorized users to get it out.

You can find out more about translucent databases at
<http://www.wayner.org/books/td/>Wayner's Web site. And if you want to
apply to Yale, you can find out more information at
<http://www.yale.edu/admit/>http://www.yale.edu/admit/.

<http://www.oreillynet.com/pub/au/355>Simson Garfinkel is a developer with
24 years of programming experience, the author or coauthor of 12 books, an
entrepreneur, and a journalist. He is the founder and Chief Technology
Officer of Sandstorm Enterprises, a Boston-based firm that develops
state-of-the-art computer security tools.

--- end forwarded text

-- 
-----------------
R. A. Hettinga <mailto: rah at ibuc.com>
The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'
