1.1 - 8.10 - ConfusionMatrix Calculated Quantities - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1 / 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)

The following input table has seven observations (n = 7) and three classes (k = 3). The model correctly predicted observations 1, 2, 6, and 7 (number_of_correct_predictions = 4).

id observed_class predicted_class
1 red red
2 red red
3 red blue
4 red green
5 blue red
6 blue blue
7 green green

Kappa (in StatTable)

To calculate Kappa, the function uses these formulas:
  • Kappa = ( observed_accuracy - random_accuracy ) / (1 - random_accuracy)
  • observed_accuracy = number_of_correct_predictions / n
  • random_accuracy = ( Σ over the k classes of ( c obs * c pred ) ) / (n * n), where, for each class c, c obs is the number of observations of class c and c pred is the number of predictions of class c
For the preceding input table:
  • observed_accuracy = 4/7 = 0.5714
  • random_accuracy = ( (4*3) + (2*2) + (1*2) ) / (7*7) = 18/49 = 0.3673
  • Kappa = ( (4/7) - (18/49) ) / ( 1 - (18/49) ) = 0.3226
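The Kappa arithmetic above can be reproduced with a short Python sketch. The two lists below are an illustrative encoding of the seven example rows (in id order); they are not part of the function's interface:

```python
from collections import Counter

# The seven example rows: observed and predicted classes, in id order.
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

correct = sum(o == p for o, p in zip(observed, predicted))   # 4
obs_counts, pred_counts = Counter(observed), Counter(predicted)

observed_accuracy = correct / n                              # 4/7
random_accuracy = sum(obs_counts[c] * pred_counts[c]
                      for c in obs_counts) / (n * n)         # 18/49
kappa = (observed_accuracy - random_accuracy) / (1 - random_accuracy)
print(round(kappa, 4))  # 0.3226
```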

Null Error Rate (in StatTable)

The Null Error Rate is the fraction of observations that would be incorrectly predicted by predicting the most common observed class for all observations.

Formula: Null Error Rate = 1 - ( max (class observation counts) / n )

Example: In the preceding input table, red is observed four times, blue twice, and green once; therefore:

Null Error Rate = 1 - ( max (4,2,1) / 7 ) = 1 - (4/7) = 0.4286
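The same arithmetic in a Python sketch (the observed list is an illustrative encoding of the example table):

```python
from collections import Counter

# Observed classes from the example table: red x4, blue x2, green x1.
observed = ["red", "red", "red", "red", "blue", "blue", "green"]
n = len(observed)

# Always predicting the most common class gets max(...) right;
# the remaining fraction is the null error rate.
null_error_rate = 1 - max(Counter(observed).values()) / n
print(round(null_error_rate, 4))  # 0.4286
```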

Values in AccuracyTable Formulas

The formulas for the sensitivity, specificity, prevalence, detection rate, and detection prevalence of class c use these terms:
  • c corr — Number of correct predictions of class c.
  • c obs — Number of observations of class c.
  • c pred — Number of predictions of class c.

Sensitivity (in AccuracyTable)

Formula for sensitivity of class c:

sensitivity (c) = c corr / c obs

Example: sensitivity (red) = 2/4 = 0.5
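A minimal Python sketch of this formula, using an illustrative encoding of the example rows:

```python
# Example rows from the input table, in id order (illustrative encoding).
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]

def sensitivity(c):
    # c corr / c obs: correct predictions of class c over observations of c.
    corr = sum(o == p == c for o, p in zip(observed, predicted))
    return corr / observed.count(c)

print(sensitivity("red"))  # 0.5
```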

Specificity (in AccuracyTable)

Formula for specificity of class c:

specificity (c) = ( n + c corr - c obs - c pred ) / ( n - c obs )

Example: specificity (red) = (7 + 2 - 4 -3 ) / (7 - 4) = 0.6667
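The specificity formula can be checked the same way (the lists are an illustrative encoding of the example rows):

```python
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

def specificity(c):
    # (n + c corr - c obs - c pred) / (n - c obs)
    corr = sum(o == p == c for o, p in zip(observed, predicted))
    obs, pred = observed.count(c), predicted.count(c)
    return (n + corr - obs - pred) / (n - obs)

print(round(specificity("red"), 4))  # 0.6667
```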

Prevalence (in AccuracyTable)

If you specify Prevalence, the function uses the prevalence specified for each class; otherwise, the function calculates the prevalence of class c with this formula:

prevalence (c) = c obs / n

Example: prevalence (red) = 4/7 = 0.5714
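A sketch of the default (calculated) prevalence, again over an illustrative encoding of the example observations:

```python
observed = ["red", "red", "red", "red", "blue", "blue", "green"]
n = len(observed)

def prevalence(c):
    # c obs / n, used when no Prevalence argument is specified.
    return observed.count(c) / n

print(round(prevalence("red"), 4))  # 0.5714
```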

Pos Pred Value (in AccuracyTable)

Formula for Pos Pred Value (positive predictive value, or PPV) of class c:

PPV (c) =

( sensitivity (c) * prevalence (c) ) /

( ( sensitivity (c) * prevalence (c) ) + (1 - specificity (c) ) * (1 - prevalence (c) ) )

Example: PPV (red) = (0.5 * 0.5714) / ( (0.5 * 0.5714) + (1 - 0.6667) * (1 - 0.5714) ) = 0.6667
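The PPV arithmetic can be reproduced end to end from the example rows (illustrative encoding; the intermediate quantities follow the formulas given earlier):

```python
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

c = "red"
corr = sum(o == p == c for o, p in zip(observed, predicted))  # 2
obs, pred = observed.count(c), predicted.count(c)             # 4, 3
sens = corr / obs                                             # 0.5
spec = (n + corr - obs - pred) / (n - obs)                    # 2/3
prev = obs / n                                                # 4/7

ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
print(round(ppv, 4))  # 0.6667
```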

Neg Pred Value (in AccuracyTable)

Formula for Neg Pred Value (negative predictive value, or NPV) of class c:

NPV (c) =

( specificity (c) * (1 - prevalence (c) ) ) /

( specificity (c) * (1 - prevalence (c) ) + (1 - sensitivity (c) ) * prevalence (c) )

Example: NPV (red) = ( 0.6667 * (1 - 0.5714) ) / ( 0.6667 * (1 - 0.5714) + (1 - 0.5) * 0.5714 ) = 0.5
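The NPV worked example can be checked numerically; note that the final denominator term multiplies by prevalence (c), matching the arithmetic in the example (lists are an illustrative encoding of the example rows):

```python
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

c = "red"
corr = sum(o == p == c for o, p in zip(observed, predicted))
obs, pred = observed.count(c), predicted.count(c)
sens = corr / obs                           # 0.5
spec = (n + corr - obs - pred) / (n - obs)  # 2/3
prev = obs / n                              # 4/7

npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
print(round(npv, 4))  # 0.5
```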

Detection Rate (in AccuracyTable)

Formula for Detection Rate of class c:

Detection Rate (c) = c corr / n

Example: Detection Rate (red) = 2/7 = 0.2857

Detection Prevalence (in AccuracyTable)

Formula for Detection Prevalence of class c:

Detection Prevalence (c) = c pred / n

Example: Detection Prevalence (red) = 3/7 = 0.4286
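Both detection quantities can be checked together (illustrative encoding of the example rows):

```python
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

c = "red"
corr = sum(o == p == c for o, p in zip(observed, predicted))
detection_rate = corr / n                       # c corr / n
detection_prevalence = predicted.count(c) / n   # c pred / n
print(round(detection_rate, 4), round(detection_prevalence, 4))  # 0.2857 0.4286
```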

Balanced Accuracy (in AccuracyTable)

Formula for Balanced Accuracy of class c:

Balanced Accuracy (c) = ( sensitivity (c) + specificity (c) ) / 2

Example: Balanced Accuracy (red) = (0.5 + 0.6667) / 2 = 0.5833
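Finally, balanced accuracy follows directly from the sensitivity and specificity formulas above (illustrative encoding of the example rows):

```python
observed  = ["red", "red", "red", "red", "blue", "blue", "green"]
predicted = ["red", "red", "blue", "green", "red", "blue", "green"]
n = len(observed)

c = "red"
corr = sum(o == p == c for o, p in zip(observed, predicted))
obs, pred = observed.count(c), predicted.count(c)
sens = corr / obs                           # 0.5
spec = (n + corr - obs - pred) / (n - obs)  # 2/3

balanced_accuracy = (sens + spec) / 2
print(round(balanced_accuracy, 4))  # 0.5833
```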