password-cracking by journalists...

Sun Jan 20 21:58:17 EST 2002

At 7:38 PM -0500 1/19/02, Steven M. Bellovin wrote:
>In message 
><Pine.SOL.4.30.0201200101340.17593-100000 at kruuna.Helsinki.FI>, Sampo
> Syreeni writes:
>>On Thu, 17 Jan 2002, Steven M. Bellovin wrote:
>>
>>>For one thing, in Hebrew (and, I think, Arabic) vowels are not normally
>>>written.
>>
>>If something, this would lead me to believe there is less redundancy in
>>what *is* written, and so less possibility for a dictionary attack.
>>
>>>Also, there are a few Hebrew letters which have different forms when
>>>they're the final letter in a word -- my understanding is that there are
>>>more Arabic letters that have a different final form, and that some have
>>>up to four forms: one initial, two middle, and one final.
>>
>>At least Unicode codes these as the same codepoint, and treats the
>>different forms as glyph variants. Normalizing for these before the attack
>>shouldn't be a big deal.

Arabic Unicode is based on ISO 8859/6 so this was presumably the case 
before Unicode as well.
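
As a rough illustration of the normalization mentioned in the quoted text: Unicode also carries separate "presentation form" code points for the contextual shapes of Arabic letters, kept for compatibility with older encodings, and compatibility normalization (NFKC) maps them back to the base letter. The sketch below uses Python's standard unicodedata module; the choice of the letter beh as the example is mine.

```python
import unicodedata

# Arabic letter beh: the base code point plus its four compatibility
# "presentation form" code points from Arabic Presentation Forms-B.
BEH = "\u0628"
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def normalize_candidate(text):
    """Fold compatibility glyph variants back to their base letters."""
    return unicodedata.normalize("NFKC", text)

for name, form in BEH_FORMS.items():
    folded = normalize_candidate(form)
    print(f"{name:8s} U+{ord(form):04X} -> U+{ord(folded):04X}")
```

An attacker (or a password checker) would run every candidate through something like normalize_candidate before hashing, so the contextual forms add nothing to the search space.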

>>
>>>Finally, Hebrew (and, as someone else mentioned, Arabic) verbs have a
>>>three-letter root form; many nouns are derived from this root.
>>
>>This would facilitate the attack, especially if the root form is all that
>>is written -- it would lead us expect shorter passwords and a densely
>>populated search space, with less possibility for easy variations like
>>punctuation.
>>
>

I'm not sure why someone would only write the root. I don't think 
it's any more natural for speakers of those languages than writing 
Latin roots would be for English speakers.

>Right -- there are factors pushing in both directions, and I don't know
>how it balances.

A few more factors:

1. Neither Hebrew nor Arabic has capitalization the way the Latin 
alphabet does. This reduces opportunities for variation. The Hebrew 
final forms make up for that to a small degree.  They are treated as 
different code points in all encodings*, by the way.

2. Almost all Hebrew encodings* include the Latin letters as well. 
In 7-bit ASCII Hebrew, the Hebrew alphabet replaces the lowercase 
Latin letters. In IBM-PC and ISO 8859/8  encodings, the Hebrew 
alphabet is in the upper 128 characters, with the lower 128 printable 
characters being standard ASCII. So a Hebrew user could mix Latin and 
Hebrew characters if they wished.  I suspect most Arabic computer 
users have easy access to Latin characters too.

3. Arabic and Hebrew users might be counseled to selectively use 
vowels or diacritical marks in their passwords.

4. People outside the U.S. are less likely to be monolingual. 
Someone from Israel, for example, might be expected to know several 
languages among Hebrew, Arabic, Aramaic, English, Russian, Yiddish 
and Ladino.

5. Unicode includes an extended Arabic encoding with 96 additional 
letter/diacritic forms used in non-Arabic languages that use the 
Arabic alphabet, including 9 for Pashto. I don't know if these are 
available on consumer PCs yet.

6. Finally, users of these or other non-Latin alphabet languages might 
well choose to transliterate their passwords into Latin characters to 
make them easy to enter on any computer.
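
To make factors 1 and 3 concrete, here is a minimal sketch of how a dictionary attacker might canonicalize Hebrew candidates: strip the optional vowel points and fold the five final forms onto their regular letters. The function names are hypothetical, and treating every combining mark (Unicode category Mn) as a strippable point is a simplifying assumption of mine.

```python
import unicodedata

# Hebrew final forms are separate code points from the regular letters.
FINAL_TO_REGULAR = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}
_FOLD_TABLE = str.maketrans(FINAL_TO_REGULAR)

def strip_points(text):
    """Drop combining marks (category Mn), such as Hebrew vowel points."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def fold_finals(text):
    """Replace each final-form letter with its regular form."""
    return text.translate(_FOLD_TABLE)

def canonical(text):
    """Candidate as an attacker's wordlist might store it."""
    return fold_finals(strip_points(text))

# "shalom" written with vowel points and a final mem.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
print([f"U+{ord(ch):04X}" for ch in canonical(pointed)])
```

A user who deliberately keeps a point or uses a final form in mid-word forces the attacker to search the un-folded variants as well, which is where the small gain in factors 1 and 3 comes from.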

>
>Your mention of Unicode, though, brings up another point:  the encoding
>that's used can matter, too.  If UCS-2 or UCS-4 (16 and 31-bit
>encodings) are used, I believe that there are many constant bits per
>character.  Even UTF-8 would have that effect.
>

I think the analysis depends on the type of password system employed. 
In a properly designed system that places no restriction on password 
length and applies a cryptographic hash to the password input + ample 
salt, the existence of constant bits per character in some encodings 
has no effect. The entropy of the password is determined by the 
symbol space the user is employing, not the internal encoding.
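
A minimal sketch of the point above, assuming PBKDF2 with SHA-256 as the salted hash (my choice for illustration, not a description of any particular system). It shows that Hebrew text does carry constant bytes in UTF-8 and UTF-16, and that the whole encoded password still goes into the hash regardless of its length.

```python
import hashlib
import os

def hash_password(password, salt, iterations=10_000):
    """Salted, iterated hash over the entire UTF-8 encoded password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# Hebrew letters alef..tav, including the five final forms.
hebrew_letters = [chr(cp) for cp in range(0x05D0, 0x05EB)]

# Every letter shares the same lead byte in UTF-8 and the same high byte
# in big-endian UTF-16: these are the "constant bits per character".
utf8_lead_bytes = {ch.encode("utf-8")[0] for ch in hebrew_letters}
utf16_high_bytes = {ch.encode("utf-16-be")[0] for ch in hebrew_letters}

salt = os.urandom(16)
digest = hash_password("\u05E9\u05DC\u05D5\u05DD", salt)

print("UTF-8 lead bytes: ", [hex(b) for b in utf8_lead_bytes])
print("UTF-16 high bytes:", [hex(b) for b in utf16_high_bytes])
print("digest length:", len(digest), "bytes")
```

The constant bytes make the encoded input longer but do not reduce the number of distinct passwords the user could have chosen, so the attacker's work is unchanged.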

Systems like these are probably best attacked by trying long lists of 
likely passwords, preferably guided by whatever personal information 
is known about the password creator.

If the password bit length is limited to a low number, e.g. the Unix 
56-bit limit, switching to a 16-bit or 32-bit per character encoding 
would be disastrous. As far as I know, no one does this. I don't know 
if any implementations attempt to accept UTF-8 encoding. There are 
clearly some pitfalls there.
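
A rough sketch of why wide encodings are disastrous under a length limit. It models the limit as a plain 8-byte truncation, which is a simplification of the real Unix scheme (8 characters of 7 bits each), and the sample password is just eight Hebrew letters of my choosing.

```python
# Eight Hebrew letters (no space): shin lamed vav final-mem ayin vav lamed final-mem
PASSWORD = "\u05E9\u05DC\u05D5\u05DD\u05E2\u05D5\u05DC\u05DD"
LIMIT_BYTES = 8  # assumed model of the password length limit

def chars_kept(password, encoding, limit=LIMIT_BYTES):
    """Number of whole characters that survive truncation to `limit` bytes."""
    truncated = password.encode(encoding)[:limit]
    return len(truncated.decode(encoding, errors="ignore"))

kept = {
    enc: chars_kept(PASSWORD, enc)
    for enc in ("iso-8859-8", "utf-8", "utf-16-be", "utf-32-be")
}

for enc, n in kept.items():
    print(f"{enc:11s} keeps {n} of {len(PASSWORD)} characters")
```

Under this model the single-byte encoding keeps all eight letters, UTF-8 and UTF-16 keep four, and a 32-bit encoding keeps only two.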

On the other hand, the Unix password system, particularly on systems 
where the hashed password can be obtained by an attacker, is so broken 
that any natural-language password is going to be weak.  A random 
8-character password from a 26-letter alphabet has only 38 bits 
of entropy.  A dictionary attack is quite feasible at that size. A 
random password with 6 letters, one digit and one special character 
(typical of what users are counseled to choose) has 42 bits.  A 
random password using the full 96-character printable ASCII set only 
gets you to 53 bits of entropy. Stamping out the 8-character Unix 
password limit would be a good use of Homeland Defense money.
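
The figures above can be reproduced with a few lines of arithmetic. For the middle case I am assuming 32 possible special characters and that the digit and the special character may sit in any two of the eight positions; those assumptions are mine, chosen because they give the 42 bits quoted.

```python
import math

LENGTH = 8  # Unix password length limit, in characters

def uniform_bits(alphabet_size, length=LENGTH):
    """Entropy in bits of a uniformly random string over the alphabet."""
    return length * math.log2(alphabet_size)

# Case 1: eight random letters from a 26-letter alphabet.
letters_only = uniform_bits(26)

# Case 2: six letters, one digit, one special character.
# Assumed: 32 special characters; digit and special placed anywhere (8 * 7 ways).
mixed_count = (26 ** 6) * 10 * 32 * (8 * 7)
mixed = math.log2(mixed_count)

# Case 3: eight random characters from the 96-character printable set.
full_ascii = uniform_bits(96)

for label, bits in [("26 letters", letters_only),
                    ("6 letters + digit + special", mixed),
                    ("96 printable", full_ascii)]:
    print(f"{label:28s} {bits:5.1f} bits")
```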

Arnold Reinhold

*At least all those listed in Narshon and Rosenschein, "The Many 
Faces of Hebrew," Kivun Ltd. (a developer of multilingual software), 
Jerusalem, 1989.

---------------------------------------------------------------------
The Cryptography Mailing List