15.00 - Case Study 3 - Teradata Database

Teradata Database Design

prodname: Teradata Database
vrm_release: 15.00
category: User Guide
featnum: B035-1094-015K

Case Study 3

The following table shows how many rows contain each distinct City column value, together with the base-10 logarithm of that count.

 

City Value        Frequency of Value    Log Frequency of Value
New York                       4,000                     3.602
Los Angeles                    4,100                     3.613
Chicago                        3,800                     3.580
Denver                         4,200                     3.623
Phoenix                        3,900                     3.591
Atlanta                        4,000                     3.602
Dallas                         3,800                     3.580
Boston                         4,150                     3.618
Paris                             30                     1.477
London                            30                     1.477
Tokyo                             30                     1.477
Rio de Janeiro                    30                     1.477
Moscow                            30                     1.477
Mexico City                       30                     1.477
Kuala Lumpur                      30                     1.477
Sydney                            30                     1.477
Brussels                          30                     1.477
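The Log Frequency of Value column can be reproduced from the Frequency of Value column. The following Python sketch is an illustration only and is not part of any Teradata tooling. The names city_frequencies and log_frequencies are hypothetical, and the sketch assumes the logarithm is base 10 rounded to three decimal places, which matches the figures in the table.

```python
import math

# Row counts per City value, copied from the table above.
city_frequencies = {
    "New York": 4000, "Los Angeles": 4100, "Chicago": 3800, "Denver": 4200,
    "Phoenix": 3900, "Atlanta": 4000, "Dallas": 3800, "Boston": 4150,
    "Paris": 30, "London": 30, "Tokyo": 30, "Rio de Janeiro": 30,
    "Moscow": 30, "Mexico City": 30, "Kuala Lumpur": 30, "Sydney": 30,
    "Brussels": 30,
}


def log_frequencies(frequencies):
    """Return the base-10 logarithm of each count, rounded to 3 decimals."""
    return {city: round(math.log10(count), 3)
            for city, count in frequencies.items()}


logs = log_frequencies(city_frequencies)

# Print one line per city: name, row count, log10 of the row count.
for city, log_value in logs.items():
    print(f"{city:<16}{city_frequencies[city]:>6,}{log_value:>8.3f}")
```

Working on the logarithmic scale makes the two groups easy to tell apart: the eight large counts all fall between 3.580 and 3.623, while the nine small counts all sit at 1.477.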

The following histogram graphs the logarithm of the number of rows as a function of row values:

The maximum value for this set is 4,200, but what is the typical value?

It is impossible to determine an accurate “typical” value for the scenario in this case study. As in “Case Study 2,” the distribution of values has two peaks at widely separated points. Unlike “Case Study 2,” however, the values clustered around 4,000 are not constant.

When you encounter a situation like this, the best approach is to use the value around which the largest frequencies cluster as your typical value. In this case, that value is 4,000.

 

Maximum Value    Typical Value
        4,200            4,000
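The selection of these two values can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure defined by this manual: the text says only to use a value around which the largest values cluster. The sketch assumes the two peaks can be separated at the widest gap on the log scale, and it assumes the median of the upper cluster is an acceptable stand-in for "the value around which the largest values cluster." The function name split_at_largest_log_gap is hypothetical.

```python
import math
import statistics

# Frequencies from the case study: eight large counts and nine counts of 30.
frequencies = [4000, 4100, 3800, 4200, 3900, 4000, 3800, 4150] + [30] * 9


def split_at_largest_log_gap(counts):
    """Split counts into (low, high) groups at the widest gap in log10 space."""
    ordered = sorted(counts)
    gaps = [math.log10(b) - math.log10(a)
            for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1  # first index of the upper group
    return ordered[:cut], ordered[cut:]


low_cluster, high_cluster = split_at_largest_log_gap(frequencies)

maximum_value = max(frequencies)
# Assumption: the median summarizes the upper cluster.
typical_value = statistics.median(high_cluster)

print(f"Maximum value: {maximum_value:,}")
print(f"Typical value: {typical_value:,.0f}")
```

For this data the upper cluster holds the eight counts from 3,800 to 4,200, and its median agrees with the typical value of 4,000 given above. A different summary, such as the mean, gives a slightly different figure (3,993.75), so treat the choice of summary as a judgment call.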