Understanding Classic SoundEx Algorithms

Posted on 07 January 2004 by Demian Turner

Search Names & Phrases Based on Phonetic Similarity:

Terms that are often misspelled can be a problem for database designers. Names, for example, vary in length, can have unusual spellings, and are not unique. American names have a diversity of ethnic origins, which gives us names that are pronounced the same way but spelled differently, and vice versa.

Words can be misspelled or have multiple spellings, especially across different cultures or national sources.

To solve this problem, we need phonetic algorithms that can find similar-sounding terms and names. Just such a family of algorithms exists; they are called Soundexes, after the first patented version.

A Soundex search algorithm takes a word, such as a person’s name, as input and produces a character string which identifies a set of words that are (roughly) phonetically alike. It is very handy for searching large databases when the user has incomplete data.

The original Soundex algorithm was patented by Margaret K. Odell and Robert C. Russell in 1918. The method is based on the six phonetic classifications of human speech sounds (bilabial, labiodental, dental, alveolar, velar, and glottal), which in turn are based on where you put your lips and tongue to make the sounds.

The algorithm is fairly straightforward to code and requires no backtracking or multiple passes over the input word. In fact, it is so straightforward that I will start by presenting it as an outline. I will then go on to give C, JavaScript, and Perl code.
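To make the single-pass claim concrete, here is a minimal Python sketch. The excerpt above does not include the letter-to-digit table, so the six groups and the H/W rule below are assumptions taken from the commonly published American Soundex rules, and the names `soundex`, `_GROUPS`, `_CODES` and `CODE_LENGTH` are my own, not the original author's.

```python
# Assumed letter groups (commonly published American Soundex table).
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}

CODE_LENGTH = 4  # one letter followed by three digits


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no A-Z letters."""
    code = ""      # the code built so far
    previous = ""  # digit of the last letter that can block a repeat
    for ch in name.upper():
        if not ("A" <= ch <= "Z"):
            continue  # skip spaces, apostrophes, hyphens, digits
        digit = _CODES.get(ch, "")  # vowels, H, W and Y have no digit
        if not code:
            code = ch  # the first letter is kept as-is
        elif digit and digit != previous:
            code += digit
            if len(code) == CODE_LENGTH:
                break
        # H and W do not separate letters of the same group; vowels do.
        if ch not in "HW":
            previous = digit
    return code.ljust(CODE_LENGTH, "0") if code else ""


if __name__ == "__main__":
    for word in ("Robert", "Rupert", "Smith", "Smyth", "Ashcraft"):
        print(word, soundex(word))
```

Under these assumed rules, Robert and Rupert both come out as R163, and Smith and Smyth both come out as S530, which is exactly the "sounds alike, spelled differently" matching described above. Each input character is looked at once, left to right, with no lookahead.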

Great article; learn more about Soundex here.

I got a basic spider built the other day. Along with Stargeek’s keyword tools, mentioned earlier this week, Soundex is a good way to make your searches smarter.

I now work with a small army of search engine experts and had a shock when a colleague told me the percentage of users who use Google in the UK: only 45%! In plain English, that means the majority of web users in this country are lost in the backwash of paid results offered by the likes of MSN, Overture, etc. The top search term on MSN is ‘www.hotmail.com’ 😉
