e-gold and e-go1d

Sat Nov 29 15:51:18 EST 2008

On Nov 29, 2008, at 9:18 AM, James A. Donald wrote:
> The algorithm is to map all lookalike glyphs to
> canonical glyphs

The definition of lookalike glyphs depends on the choice of font and  
variant, and Unicode wraps the whole problem in a lovely layer of  
hell. If I had to do this, I'd investigate rendering both strings in  
the (same) target font and then quantifying the amount of overlap in  
the bitmaps, as e.g. SWORD does for TLDs:

     <http://icann.sword-group.com/icann-algorithm/Default.aspx>

The above is proprietary; NIST's Paul Black has Python code available  
for a slightly enhanced Levenshtein distance:

     <http://hissa.nist.gov/~black/GTLD/>
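
For illustration only, here is a minimal Python sketch of both kinds of
score mentioned above. It is not SWORD's algorithm and not Paul Black's
code. The 5x7 bitmaps in GLYPHS are hand-drawn assumptions standing in
for glyphs rendered in a real target font, the function names are
hypothetical, and levenshtein() is the plain textbook distance without
any of the NIST enhancements.

```python
# Hand-drawn 5x7 bitmaps standing in for glyphs rendered in one target font.
# These shapes are illustrative assumptions, not taken from any real font.
GLYPHS = {
    "l": ["..#..", "..#..", "..#..", "..#..", "..#..", "..#..", "..#.."],
    "1": ["..#..", ".##..", "..#..", "..#..", "..#..", "..#..", "..#.."],
    "o": [".....", ".....", ".###.", "#...#", "#...#", "#...#", ".###."],
}

CELL_WIDTH = 6  # 5 pixel columns per glyph plus 1 column of spacing


def render(text):
    """Return the set of (row, column) pixels lit when text is drawn left to right."""
    pixels = set()
    for index, char in enumerate(text):
        for row, line in enumerate(GLYPHS[char]):
            for col, cell in enumerate(line):
                if cell == "#":
                    pixels.add((row, index * CELL_WIDTH + col))
    return pixels


def bitmap_overlap(a, b):
    """Jaccard overlap of the two rendered bitmaps; 1.0 means identical pixels."""
    pa, pb = render(a), render(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


def levenshtein(a, b):
    """Plain edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(bitmap_overlap("l", "1"))         # visually close pair
print(bitmap_overlap("l", "o"))         # visually distant pair
print(levenshtein("e-gold", "e-go1d"))  # one substitution apart
```

Note that the edit distance scores "l" to "1" the same as "l" to "o",
one substitution each, while the bitmap overlap separates the two cases.
That difference is the reason to render in the target font at all.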

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org

---------------------------------------------------------------------
The Cryptography Mailing List