[Cryptography] Removal of spaces in NIST Draft SP-800-63B

Mon Apr 10 18:00:51 EDT 2017

> On Apr 9, 2017, at 1:45 AM, Kevin W. Wall <kevin.w.wall at gmail.com> wrote:
> 
> Arnold, apologies for the lapse in time in my responding to your last reply.
> 
> On Tue, Apr 4, 2017 at 12:42 PM, Arnold Reinhold <agr at me.com> wrote:
>> 
>>> On Apr 3, 2017, at 9:00 PM, Kevin W. Wall <kevin.w.wall at gmail.com> wrote:
>>> 
>>> On Mon, Apr 3, 2017 at 11:10 AM, Arnold Reinhold <agr at me.com> wrote:
>>>> …
> <big snip>
>> 
>>>> ...
>>> 
>>> True in general, but where I think it helps is in terms
>>> of customized (i.e., user created) questions.  I always
>>> tell people that if you have the opportunity to create
>>> a custom question, use that option instead and then
>>> pick a topic that only you know about. I personally
>>> recommend something that an individual might find quite
>>> embarrassing, because those are details that they have
>>> generally NOT widely shared. An example might be (note,
>>> I am making this completely up):
>>> Q: What did your father do to you when he found you
>>>    with his Playboy magazine?
>>> A: He made me run up and down our driveway, naked.
>>> 
>>> When a company allows user-created questions, then I
>>> recommend that they encrypt the questions and hash the
>>> answers. The reason is that way, it makes it harder for
>>> insiders to read the questions and much, much more
>>> difficult for them to discover the answers. (Also, requiring
>>> user created questions shifts some of the liability back to
>>> the user. They can no longer complain that you only had
>>> lame questions to choose from that only had a small set
>>> of possible answers.)
>>> 
>>> But for the ordinary lame q's: "What's your favorite sports
>>> team?" or "What's your favorite flavor of ice cream?",
>>> etc., you are spot-on. Hashing does no good. (Of
>>> course, that's the type of questions that the OWASP
>>> "Choosing and Using Security Questions Cheat Sheet" is
>>> meant to prevent in the first place.)
>> 
>> Hashing does no good for simple answers and it isn’t suitable for complex answers.
>> 
>>   Initial answer: “He made me run up and down our driveway, naked.”
>> 
>>   Challenge answer: “He made me run naked up and down our driveway.”
>> 
>> Hashes would not match. Encryption would allow easy human intervention, and I suspect current language understanding software could match up the two answers. Even a simple algorithm such as sorting the words alphabetically and calculating a correlation might work well enough in many cases. The goal would be to avoid human intervention in most reset requests. You can’t rely on humans to remember the exact way they answered a complex question.  “I had to exercise in front of the house with no clothes on” might still take a human to verify.
> 
> In reality, the only people that are that conscientious are those who
> write their security Q&A into a password manager. (And those who do
> that are also the least likely to have to use the "Forgot Password"
> flow in the first place.) So in practice, this seems to work, even
> though it doesn't in theory.
> 
>> 
>> In every case, it seems to me, hashing is NEVER right for security Q/A. Maybe you could update your cheat sheet?
> 
> Nope; not going to update the cheat sheet and here's why. You're
> thinking about this from purely a cryptographic or UX perspective.
> This approach evolved after extensive discussions with those in our
> legal department in my old company and was approved by them.
> 
> Soon after legal had corporate security update the company security
> policy on "forgot password" processing to suggest that customers be
> offered the ability to create at least one custom security question,
> some of the development teams started doing this by storing both
> security questions and answers in plaintext. We (corporate security)
> were made aware and investigated.
> 
> A while later, an observation was made that some users were defining
> questions like
> 
>    What is my social security number?
> or
>    What is my bank account number?
> 
> whose answers involved confidential customer data, but which probably
> seemed to our customers like pretty secure answers that they would
> remember.
> 
> At first, corporate security just suggested encrypting both Qs and As,
> but soon afterwards we realized that someone with access to the
> DB and the encryption key could decrypt said answers.
> 
> So we made our legal department aware of this situation. (At
> the time, corporate security was under the legal department, so
> reporting to them was a rather natural and expected occurrence.) One
> of the reasons that legal endorsed customized questions rather than
> only canned questions in the first place was to shift liability back
> over to the users. They didn't want a user to sue on grounds that all
> of the questions used for password reset were lame and that the
> possible answers either had a limited answer space (I mean, how many
> sports teams are there, really?) or were relatively easy to research
> (e.g., what street someone grew up on or where someone was born).
> That was true of many of the canned questions the development teams
> were using.
> 
> When legal was made aware that customers were using personal questions
> with answers involving confidential data, they also had developers
> give "help" in the form of advice and examples of bad questions to
> avoid. However, they also wanted to make it impossible for any rogue
> insiders to sneak a peek at the security questions and answers
> by decrypting them. They were also obviously concerned about an
> external data breach via SQLi, etc.
> 
> Thus legal wanted development teams to follow due diligence and best
> security practice and treat customized security questions and answers
> as restricted data, just like passwords.
> 
> Obviously we could not hash the (custom) security questions, so those
> were encrypted, but the answers to the custom security questions had
> to be hashed with a salt, just like passwords. (IIRC, the canned
> answers didn't _have_ to be hashed, but all the dev teams just hashed
> and salted them as well just so they could use uniform code to handle
> them all.)
> 
> Anyhow, long story short, the UX problems that you mentioned about
> hashing certainly still exist, but in practice don't seem to occur
> very often, and it made legal happy. Sure, for Qs like "What's my
> SSN?", an attacker could hash (for example) all conceivable SSNs along
> with the salt, but at least it made legal happy.
> 
> Now IANAL, but the legal department that I worked for at least thought
> hashing the answers to _custom_ security questions would
> limit liability. Maybe it does, maybe it doesn’t.

Thanks for the detailed response. I think there is a very big problem here. As far as I can tell, there are no published standards for storing responses to security questions, nor for storing user-generated questions. As we discussed, the methods used for passwords are not necessarily applicable because the space of likely answers to most canned questions is far smaller than the space of passwords. From what you say about there being few problems with hashing the answers to user-generated questions, my guess is that people mostly choose questions with short answers. Perhaps they have been trained by unforgiving password reset systems that demand exactly identical answers. 

It seems to me there are several ways to improve the system. There are two problem areas: security and user-friendliness. 

Security

For security, it is clear that a simple salted hash is worthless for protecting most security answers, no matter what your lawyers believe.
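A minimal Python sketch of why the salt does not help here. It assumes a hypothetical storage format (SHA-256 over the salt followed by the lowercased answer) and an illustrative candidate list, not real breach data. Anyone holding a leaked row, salt plus digest, simply hashes each plausible answer with that salt:

```python
import hashlib
import os


def salted_hash(salt: bytes, answer: str) -> str:
    # Hypothetical stored format: SHA-256(salt || normalized answer)
    normalized = answer.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


def crack(salt: bytes, digest: str, candidates):
    # Dictionary attack: try each plausible answer with this user's salt
    for guess in candidates:
        if salted_hash(salt, guess) == digest:
            return guess
    return None


# A leaked row for "What's your favorite flavor of ice cream?"
salt = os.urandom(16)
digest = salted_hash(salt, "Chocolate")

# Illustrative guess list; a real attacker would use a longer one
candidates = ["vanilla", "chocolate", "strawberry", "mint chocolate chip",
              "cookies and cream", "rocky road", "pistachio", "butter pecan"]

print(crack(salt, digest, candidates))  # prints chocolate
```

The salt only forces the attacker to repeat this loop once per user; it does nothing to enlarge the answer space, which is the real weakness.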

1. One improvement is to use a resource-consuming key derivation function to slow down attacks. NIST draft SP-800-63B recommends such an approach, suggesting PBKDF2 with a minimum of 10,000 iterations. More would be needed to protect security answers. I would prefer a memory-intensive algorithm, such as scrypt or, better, Argon2, the winner of the recent Password Hashing Competition. Account creation and reset are much less frequent events than logins, so consuming, say, a second of server time to protect security answers does not seem unreasonable.

2. Incorporate a corporate-wide secret key (sometimes called a "pepper"), along with an individual salt, when hashing the security answers. Protecting that secret is still a problem, but at least it would add some security.

3. Encrypt the hashed answer, along with a prepended random pad, using a public key, and store only the encrypted hash value. Employ a hardware security module to decrypt the hash when needed, ideally attached to a dedicated server that does nothing but password resets, with limited connections to the corporate intranet. This method might be strong enough to usefully encrypt answers to multiple-choice questions.

4. Use a similar approach as 3 to protect user-supplied questions. The questions are about as sensitive as the answers, since knowing the question is likely to limit the space of answers to be searched.
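Items 1 and 2 can be combined. Below is a minimal sketch, assuming Python's standard-library `hashlib.scrypt` (Argon2 is not in the standard library) and a hypothetical `PEPPER` standing in for the corporate-wide secret; the cost parameters are illustrative, not a recommendation:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical corporate-wide secret (item 2). In a real deployment it would
# be kept outside the answer database, e.g. in configuration or an HSM.
PEPPER = os.urandom(32)

# Illustrative scrypt cost: N=2**14, r=8 uses about 16 MiB per hash.
# Raise N until one hash consumes the chosen time budget on the server.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def protect_answer(answer: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    if salt is None:
        salt = os.urandom(16)  # per-user salt, stored alongside the tag
    normalized = answer.strip().lower().encode("utf-8")
    # Slow, memory-hard step (item 1)
    slow = hashlib.scrypt(normalized, salt=salt, n=SCRYPT_N, r=SCRYPT_R,
                          p=SCRYPT_P, maxmem=64 * 1024 * 1024, dklen=32)
    # Keyed step with the secret (item 2): without PEPPER, a stolen
    # database row cannot be checked against guesses at all
    tag = hmac.new(PEPPER, slow, hashlib.sha256).digest()
    return salt, tag


def verify_answer(answer: str, salt: bytes, tag: bytes) -> bool:
    _, candidate = protect_answer(answer, salt)
    return hmac.compare_digest(candidate, tag)  # constant-time comparison


salt, tag = protect_answer("Chocolate")
print(verify_answer(" chocolate ", salt, tag))  # True
print(verify_answer("vanilla", salt, tag))      # False
```

For scale: if each guess costs about one second of one core, exhausting all 10^9 nine-digit strings would take about 10^9 seconds, roughly 32 years of single-core time per salt, while a canned question with a few hundred plausible answers would still fall in minutes.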

Usability

5. Canonicalization, as we discussed, could help a little with complex answers.

6. Use a similar approach as 3 to protect user-supplied answers. I understand your lawyers do not want to take responsibility for protecting users’ answers, but other organizations’ lawyers might see the risk in discouraging more secure (i.e., lengthy) answers.
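For items 5 and 6, here is a minimal sketch of canonicalization plus the word-sorting idea from earlier in the thread. It assumes the server can see the plaintext answer (for example inside the HSM-backed reset server of item 3). Cosine similarity over word counts is one possible stand-in for "a correlation", and the 0.8 threshold is a hypothetical value, not a tested one:

```python
import math
import re
from collections import Counter


def canonical_words(answer: str):
    # Canonicalize: lowercase, then keep only runs of letters and digits
    return re.findall(r"[a-z0-9]+", answer.lower())


def sorted_form(answer: str) -> str:
    # Words sorted alphabetically; an exact string, so it could be hashed
    return " ".join(sorted(canonical_words(answer)))


def similarity(a: str, b: str) -> float:
    # Cosine similarity of word counts (1.0 = same words, 0.0 = none shared)
    ca, cb = Counter(canonical_words(a)), Counter(canonical_words(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def answers_match(stored: str, attempt: str, threshold: float = 0.8) -> bool:
    # Hypothetical threshold; below it, route the request to a human
    return similarity(stored, attempt) >= threshold


original = "He made me run up and down our driveway, naked."
reworded = "He made me run naked up and down our driveway."
paraphrase = "I had to exercise in front of the house with no clothes on"

print(sorted_form(original) == sorted_form(reworded))  # True
print(answers_match(original, reworded))    # True
print(answers_match(original, paraphrase))  # False
```

The sorted form handles reordering and could be hashed before storage, but only the similarity score, which needs the plaintext, tolerates partial rewording. The paraphrase shares no words with the original and would still go to a person.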

Arnold Reinhold