15.00 - SOUNDEX - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)

SOUNDEX

Purpose  

Returns a character string that represents the Soundex code for string_expression.

Syntax  

where:

 

Syntax element …

Specifies …

string_expression

a character string or expression that contains a surname to be evaluated in simple Latin characters.

Soundex is case insensitive.

Embedded or trailing pad characters within character_string return an error to the requestor.

ANSI Compliance

This is a Teradata extension to the ANSI SQL:2011 standard.

Argument Types

Use SOUNDEX on character strings or character string expressions that use the LATIN or UNICODE server character set.

SOUNDEX does not accept CLOB types.

By default, Teradata Database performs implicit type conversion on UDT arguments that have implicit casts to predefined character types.

To define an implicit cast for a UDT, use the CREATE CAST statement and specify the AS ASSIGNMENT clause. For more information on CREATE CAST, see SQL Data Definition Language.

Implicit type conversion of UDTs for system operators and functions, including SOUNDEX, is a Teradata extension to the ANSI SQL standard. To disable this extension, set the DisableUDTImplCastForSysFuncOp field of the DBS Control Record to TRUE. For details, see Utilities: Volume 1 (A-K).

For more information on implicit type conversion of UDTs, see Chapter 13: “Data Type Conversions.”

Definition: Simple Latin Characters

A simple Latin character is one that does not have diacritical marks such as tilde (~) or acute accent (´).

There are 26 uppercase simple Latin characters and 26 lowercase simple Latin characters.

Definition: Soundex

Soundex is a system that codes surnames having the same or similar sounds, but variant spellings. The Soundex system was first used by the National Archives in 1880 to index the United States census.

Soundex codes begin with the first letter of the surname followed by a three‑digit code. Zeros are added to names that do not have enough letters.

Soundex Coding Guide

The following process outlines the Soundex coding guide:

1 Retain the first letter of the name.

2 Drop all occurrences of the following letters:

A, E, I, O, U, Y, H, W

in other positions.

3 Assign the following number to the remaining letters after the first letter:

1 = B, F, P, V

2 = C, G, J, K, Q, S X, Z

3 = D, T

4 = L

5 = M, N

6 = R

4 If two or more letters with the same code are adjacent in the original name or adjacent except for any intervening H or W, omit all but the first.

5 Convert the form “letter, digit, digit, digit,” by adding trailing zeros if less than three digits.

6 Drop the rightmost digits if more than three digits.

7 Names with adjacent letters having the same equivalent number are coded as one letter with a single number

Surname prefixes are generally not used.

Example  

The following SELECT statement returns the result that follows.

   SELECT SOUNDEX ('ashcraft');
   
   Soundex('ashcraft')
   -------------------
   a261

The surname “ashcraft” initially evaluates to “a2h2613,” but the following Soundex rules convert the result to a261.

  • “h” is dropped because it occurs in the third position. Soundex drops all occurrences of the following characters in any position other than the first.
  • A, E, I, O, U, Y, H, W

  • “2” is dropped because it represents the second occurrence of one of the following characters:
  • C, G, J, K, Q, S X, Z

    If two or more characters with the same code are adjacent in the original name, or adjacent except for any intervening H or W, Soundex omits all but the code for the first occurrence of the character in the returned code.

  • “3” is dropped because Soundex drops the rightmost digits if character_string evaluates to more than three digits following the initial simple Latin character.
  • Example  

    “Example 2” and “Example 3” on page 1241 use the following table data:

       SELECT family_name FROM family;
       
       family_name
       -----------
       John
       Joan
       Joey
       joanne
       michael
       Bob

    Here are the results of the SOUNDEX function on the data in the family_name column:

       SELECT SOUNDEX(TRIM(family.family_name));
       
       Soundex(TRIM(BOTH FROM family_name))
       ------------------------------------
       J500
       J500
       B100
       J000
       m240
       j500

    Example  

    Find all family names in Family that sound like “Joan”.

       SELECT family_name
       FROM family
       WHERE SOUNDEX(TRIM(family.family_name)) = SOUNDEX('Joan');
       
       family_name
       -----------
       John
       Joan
       Joanne

    Examples of Non Valid Usage

    The following SOUNDEX examples are not valid for the reasons given in the table.

     

    Statement

    Why the Statement is Not Valid

    SELECT SOUNDEX(12345);

    12345 is a numeric string, not a character string.

    SELECT SOUNDEX('ábç');

    The characters á and ç are not simple Latin characters.