DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Soundex

09.18.2005
| 3491 views |
  • submit to reddit
        From Greg Jorgensen's <a href=http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213>recipe</a>.
def soundex(name, len=4):

    # digits holds the soundex values for the alphabet
    digits = '01230120022455012623010202'
    sndx = ''
    fc = ''

    # translate alpha chars in name to soundex digits
    for c in name.upper():
        if c.isalpha():
            if not fc: fc = c   # remember first letter
            d = digits[ord(c)-ord('A')]
            # duplicate consecutive soundex digits are skipped
            if not sndx or (d != sndx[-1]):
                sndx += d
    
    sndx = fc + sndx[1:]   # replace first digit with first char
    sndx = sndx.replace('0','')       # remove all 0s
    return (sndx + (len * '0'))[:len] # padded to len characters
Similar words will return the same soundex
>>> soundex('hello')
'H400'
>>> soundex('hola')
'H400'