password-cracking by journalists...
Arnold G. Reinhold
reinhold at world.std.com
Sun Jan 20 21:58:17 EST 2002
At 7:38 PM -0500 1/19/02, Steven M. Bellovin wrote:
>In message
><Pine.SOL.4.30.0201200101340.17593-100000 at kruuna.Helsinki.FI>, Sampo
> Syreeni writes:
>>On Thu, 17 Jan 2002, Steven M. Bellovin wrote:
>>
>>>For one thing, in Hebrew (and, I think, Arabic) vowels are not normally
>>>written.
>>
>>If anything, this would lead me to believe there is less redundancy in
>>what *is* written, and so less possibility for a dictionary attack.
>>
>>>Also, there are a few Hebrew letters which have different forms when
>>>they're the final letter in a word -- my understanding is that there are
>>>more Arabic letters that have a different final form, and that some have
>>>up to four forms: one initial, two middle, and one final.
>>
>>At least Unicode codes these as the same codepoint, and treats the
>>different forms as glyph variants. Normalizing for these before the attack
>>shouldn't be a big deal.
Arabic Unicode is based on ISO 8859/6 so this was presumably the case
before Unicode as well.
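
To illustrate the normalization step mentioned above, here is a
minimal sketch, assuming the attacker's word list might contain the
Unicode compatibility "presentation form" code points rather than the
base letters. The helper name fold_forms is hypothetical; the choice
of NFKC and of the letter BEH as the example is mine, not something
from the thread.

```python
import unicodedata

# U+0628 is the base ARABIC LETTER BEH. U+FE8F..U+FE92 are its
# isolated, final, initial and medial compatibility presentation forms.
BEH = "\u0628"
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def fold_forms(text: str) -> str:
    """Map compatibility presentation forms back to their base letters."""
    return unicodedata.normalize("NFKC", text)


folded = [fold_forms(ch) for ch in BEH_FORMS]
# True when every positional form collapses to the single base code point.
print(all(ch == BEH for ch in folded))
```

Text typed on an ordinary keyboard would normally already use the
base code points, so this folding mostly matters for word lists taken
from older or display-oriented sources.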
>>
>>>Finally, Hebrew (and, as someone else mentioned, Arabic) verbs have a
>>>three-letter root form; many nouns are derived from this root.
>>
>>This would facilitate the attack, especially if the root form is all that
>>is written -- it would lead us to expect shorter passwords and a densely
>>populated search space, with less possibility for easy variations like
>>punctuation.
>>
>
I'm not sure why someone would write only the root. I don't think
it's any more natural for speakers of those languages than writing
Latin roots would be for English speakers.
>Right -- there are factors pushing in both directions, and I don't know
>how it balances.
A few more factors:
1. Neither Hebrew nor Arabic has capitalization the way the Latin
alphabet does. This reduces opportunities for variation. The Hebrew
final forms make up for that to a small degree. They are treated as
different code points in all encodings*, by the way.
2. Almost all Hebrew encodings* include the Latin letters as well.
In 7-bit ASCII Hebrew, the Hebrew alphabet replaces the lowercase
Latin letters. In IBM-PC and ISO 8859/8 encodings, the Hebrew
alphabet is in the upper 128 characters, with the lower 128 printable
characters being standard ASCII. So a Hebrew user could mix Latin and
Hebrew characters if they wished. I suspect most Arabic computer
users have easy access to Latin characters too.
3. Arabic and Hebrew users might be counseled to selectively use
vowels or diacritical marks in their passwords.
4. People outside the U.S. are less likely to be mono-lingual.
Someone from Israel for example might be expected to know several
languages among Hebrew, Arabic, Aramaic, English, Russian, Yiddish
and Ladino.
5. Unicode includes an extended Arabic encoding with 96 additional
letter/diacritic forms used in non-Arabic languages that use the
Arabic alphabet, including 9 for Pashto. I don't know if these are
available on consumer PCs yet.
6. Finally, users of these or other non-Latin-alphabet languages
might well choose to transliterate their passwords into Latin
characters to make them easy to enter on any computer.
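
Factors 1 and 3 can be made concrete with a small sketch. It assumes
Unicode input and shows that a pointed spelling with a final form and
an unpointed spelling without one are distinct strings (so a
dictionary would need both), and that both fold to the same root
entry. The names FINAL_TO_REGULAR, strip_marks and canonical_candidate
are hypothetical, chosen only for illustration.

```python
import unicodedata

# Hebrew final forms mapped to their ordinary forms. Unlike the Arabic
# positional forms, these are separate letters in the encoding.
FINAL_TO_REGULAR = str.maketrans({
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
})


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (vowel points and other diacritics)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


def canonical_candidate(text: str) -> str:
    """Fold a spelling to an unpointed form with no final letters."""
    return strip_marks(text).translate(FINAL_TO_REGULAR)


# "shalom" with vowel points and a final mem, and the same word
# unpointed with an ordinary mem.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
plain = "\u05E9\u05DC\u05D5\u05DE"

print(pointed != plain)                       # distinct password inputs
print(canonical_candidate(pointed) == plain)  # same underlying entry
```

Since a hashed password cannot be normalized after the fact, an
attacker would have to run this in reverse and generate the variants,
which is where the extra work for the attacker comes from.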
>
>Your mention of Unicode, though, brings up another point: the encoding
>that's used can matter, too. If UCS-2 or UCS-4 (16 and 31-bit
>encodings) are used, I believe that there are many constant bits per
>character. Even UTF-8 would have that effect.
>
I think the analysis depends on the type of password system employed.
In a properly designed system that places no restriction on password
length and applies a cryptographic hash to the password input + ample
salt, the existence of constant bits per character in some encodings
has no effect. The entropy of the password is determined by the
symbol space the user is employing, not the internal encoding.
Systems like these are probably best attacked by trying long lists of
likely passwords, preferably guided by whatever personal information
is known about the password creator.
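
A minimal sketch of the point about encodings, assuming a salted,
iterated hash over the whole encoded password. PBKDF2-HMAC-SHA256,
the iteration count and the candidate list are my assumptions for
illustration, not a description of any particular system. The hashes
differ between encodings, but the number of guesses an attacker must
try is the number of candidates either way.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, encoding: str) -> bytes:
    """Salted, iterated hash over the full encoded password (no truncation)."""
    data = password.encode(encoding)
    return hashlib.pbkdf2_hmac("sha256", data, salt, 1000)


# Hypothetical guess list mixing Hebrew, Arabic and Latin spellings.
candidates = [
    "\u05E9\u05DC\u05D5\u05DD",
    "\u0633\u0644\u0627\u0645",
    "shalom",
    "salaam",
]
salt = os.urandom(16)


def distinct_hashes(encoding: str) -> int:
    """Number of different hash values the candidate list produces."""
    return len({hash_password(p, salt, encoding) for p in candidates})


for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, distinct_hashes(enc))
```

Each encoding yields one distinct hash per candidate, so the constant
bits added by the wider encodings neither help nor hurt.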
If the password bit length is limited to a low number, e.g. the Unix
56-bit limit, switching to a 16-bit or 32-bit per-character encoding
would be disastrous, because far fewer characters would fit within
the limit. As far as I know, no one does this. I don't know if any
implementations attempt to accept UTF-8 encoding. There are clearly
some pitfalls there.
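
To show why a wide encoding would be disastrous under a length limit,
here is a sketch of a hypothetical system that keeps only the first 8
bytes of the encoded password. The 8-byte cutoff stands in for the
Unix limit; the function name and the sample passwords are
assumptions for illustration.

```python
LIMIT_BYTES = 8  # stand-in for the 8-character Unix password limit


def chars_surviving(password: str, encoding: str,
                    limit: int = LIMIT_BYTES) -> int:
    """Count the whole characters that fit in the first `limit` bytes."""
    used = 0
    count = 0
    for ch in password:
        size = len(ch.encode(encoding))
        if used + size > limit:
            break
        used += size
        count += 1
    return count


latin = "password"       # 8 Latin letters
hebrew = "\u05D0" * 8    # 8 Hebrew letters (alef repeated, for size only)

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, chars_surviving(latin, enc), chars_surviving(hebrew, enc))
```

Under these assumptions the Latin password keeps 8, 4 and 2
characters respectively, and the Hebrew one keeps only 4 characters
even in UTF-8, since each Hebrew letter takes two bytes there.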
On the other hand, the Unix password system, particularly in
installations where the hashed password can be obtained by an
attacker, is so broken that any natural language password is going to
be weak. A random 8-character password from a 26-letter alphabet has
only 38 bits of entropy. A dictionary attack is quite feasible at
that size. A random password with 6 letters, one digit and one
special character (typical of what users are counseled to choose) has
42 bits. A random password using the full 96 printable ASCII
character set only gets you to 53 bits of entropy. Stamping out the
8-character Unix password limit would be a good use of Homeland
Defense money.
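
The entropy figures above can be checked with a few lines of
arithmetic. The first and third follow directly from
length * log2(alphabet size). For the mixed case I am assuming 32
special characters and that the digit and the special character may
sit in any two of the 8 positions; that assumption reproduces the 42
bits quoted, but it is only one plausible reading.

```python
import math


def bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string."""
    return length * math.log2(alphabet_size)


letters_only = bits(26, 8)   # 8 random letters
full_ascii = bits(96, 8)     # 8 random characters from a 96-symbol set

# Assumed model: 6 letters, 1 of 10 digits, 1 of 32 specials, with
# 8 * 7 ways to place the digit and the special character.
mixed = math.log2(26**6 * 10 * 32 * 8 * 7)

print(round(letters_only), round(mixed), round(full_ascii))
```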
Arnold Reinhold
*At least all those listed in Narshon and Rosenschein, "The Many
Faces of Hebrew," Kivun Ltd. (a developer of multilingual software),
Jerusalem, 1989.
---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com