Soundex vs. Phonetic

When you define the parameter value for the CUS DeDup application parameter, you have two options: Soundex and Phonetic. These options are the methods used to establish duplicate criteria when you run the CUS590 process. Soundex is less restrictive and produces more potential duplicates, whereas Phonetic is more restrictive and produces fewer potential duplicates.

The de-duping process (CUS590) uses Soundex or Phonetic to retrieve groups of customers who have similar names. The process then compares other fields (state, gender, SSN, etc.) across the customers in each group to see whether they are potential matches. Soundex and Phonetic do not determine duplicates by themselves; they group the records with similarly spelled names together so that they can be further compared.

Some sample groups include:

Name          Soundex    Phonetic
Washington    W252       XNKTN9999999
Smith         S530       SM0999999999
Smyth         S530       SM0999999999
Jones         J520       JNS999999999
Lee           L000       L99999999999
Hernandez     H655       HRNNTZ999999
Hernandaz     H655       HRNNTZ999999
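
The two-stage approach described above (group by key, then compare other fields) can be sketched in Python as follows. This is only an illustration: the record layout, the field names, the function name potential_duplicates, and the rule that every compared field must be equal are all assumptions, since the actual CUS590 match criteria are not listed here. The Soundex keys are copied from the sample groups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer records; field names and values are illustrative only.
# The "key" values are the Soundex codes from the sample groups.
customers = [
    {"id": 1, "last_name": "Smith", "key": "S530", "state": "VA", "gender": "F", "ssn": "000-00-0001"},
    {"id": 2, "last_name": "Smyth", "key": "S530", "state": "VA", "gender": "F", "ssn": "000-00-0001"},
    {"id": 3, "last_name": "Smith", "key": "S530", "state": "MD", "gender": "M", "ssn": "000-00-0002"},
    {"id": 4, "last_name": "Jones", "key": "J520", "state": "VA", "gender": "F", "ssn": "000-00-0001"},
]


def potential_duplicates(records, compare_fields=("state", "gender", "ssn")):
    """Return pairs of record ids that share a key and agree on every compared field."""
    # Stage 1: group records by their name key (Soundex or Phonetic).
    groups = defaultdict(list)
    for record in records:
        groups[record["key"]].append(record)

    # Stage 2: compare records only within the same group.
    # Assumption: a pair is a potential match when all compared fields are equal.
    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            if all(a[field] == b[field] for field in compare_fields):
                pairs.append((a["id"], b["id"]))
    return pairs


print(potential_duplicates(customers))  # [(1, 2)]
```

Customer 4 agrees with customers 1 and 2 on every compared field, but it is never compared with them because its key places it in a different group.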

Soundex

Soundex is a phonetic coding scheme used to group similar-sounding English names. A Soundex code consists of a letter followed by three digits. The letter is the first letter of the name. Each digit represents a group of letters that are pronounced similarly. Vowels and the letters H, W, and Y are not coded, and double letters are coded only once. The digits are assigned as follows:

1 - B, F, P, V

2 - C, G, J, K, Q, S, X, Z

3 - D, T

4 - L

5 - M, N

6 - R

If the name runs out of letters before producing three digits, the remaining positions are filled with zeros. For example, Lee is coded L000.
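
The coding rules above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by Oracle, SQL Server, or CUS590, and the function name soundex is hypothetical. Two details are assumptions that the description does not spell out: H and W are skipped without breaking a run of same-coded letters, while vowels and Y do break such a run.

```python
# Letter groups from the list above; Z is grouped with S, as in standard Soundex.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" if the name has no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # Start from the first letter's own code so that an immediately
    # following letter from the same group is not coded again.
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # assumption: H and W do not separate same-coded letters
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Vowels and Y have no code, so they reset prev and let the
        # same digit appear again afterwards (as in Hernandez -> H655).
        prev = code

    # Pad with zeros, then keep the first letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


for name in ("Washington", "Smith", "Smyth", "Jones", "Lee", "Hernandez", "Hernandaz"):
    print(name, soundex(name))
```

Run against the sample names, this sketch reproduces the Soundex column shown earlier (W252, S530, S530, J520, L000, H655, H655). Because the first letter always leads the code, names that start with different letters never share a code: Robert gives R163 and Bob gives B100.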

Both Oracle and SQL Server provide a Soundex function, and you can go to http://www.highprogrammer.com/cgi-bin/uniqueid/c_soundex to get the Soundex of a name. Soundex should not be used with first names: the first character is so important in the Soundex of a name, and nicknames often start with a different letter than the first name they replace (such as Bob and Robert).

The following websites can help you when you find people with similar last names:

· http://www.census.gov/genealogy/www/namesearch.html – Useful when you have two spellings of a name and are unsure which is correct. You can search the 1990 Census for last names here to see which name is more common.

· http://www.census.gov/genealogy/names/names_files.html – Lists all last names by frequency. If you have a very large customer database, you can de-dup one particular last name at a time (e.g., Smith, Johnson, Williams, Jones, Brown, Davis, or Miller). Because every customer with the same last name has the same Soundex code, those customers form your group.

· http://www.anywho.com – Look up people based upon address, or perform a reverse lookup on a phone number.

Phonetic

Phonetic is a variation of Soundex that is more restrictive. Personify created this method as an alternative to Soundex because of the four-character key limitation of Soundex. The GetPhoneticKey function returns a 12-character key.

See also:

· Duplicate Customers Overview

· Preventing Duplicate Customers

· What is Merged

· What is Not Merged

· Identifying Potential Duplicate Customers

· Duplicate Customers on the Web