ConfusionMatrix Calculated Quantities

Machine Learning Engine Analytic Function Reference


The following input table has seven observations (n = 7) and three classes (k = 3). The model correctly predicted observations 1, 2, 6, and 7 (number_of_correct_predictions = 4).

id observed_class predicted_class
1 red red
2 red red
3 red blue
4 red green
5 blue red
6 blue blue
7 green green

Kappa (in StatTable)

To calculate Kappa, the function uses these formulas:
  • Kappa = ( observed_accuracy - random_accuracy) / (1 - random_accuracy)

  • observed_accuracy = number_of_correct_predictions / n
  • random_accuracy = ( Σk ( cobs * cpred ) ) / (n * n), where the sum is over the k classes, cobs is the number of observations of a class, and cpred is the number of predictions of that class
For the preceding input table:
  • observed_accuracy = 4/7 = 0.5714
  • random_accuracy = ( (4*3) + (2*2) + (1*2) ) / (7*7) = 18/49 = 0.3673
  • Kappa = ( (4/7) - (18/49) ) / ( 1 - (18/49) ) = 0.3226
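
The following Python sketch (illustration only, not part of the ConfusionMatrix function) recomputes observed_accuracy, random_accuracy, and Kappa; the pair list simply restates the seven observations from the example table.

```python
from collections import Counter

# Example input table from this page: (observed_class, predicted_class) pairs.
pairs = [("red", "red"), ("red", "red"), ("red", "blue"), ("red", "green"),
         ("blue", "red"), ("blue", "blue"), ("green", "green")]

n = len(pairs)                                     # 7
correct = sum(obs == pred for obs, pred in pairs)  # 4
cobs = Counter(obs for obs, _ in pairs)            # red: 4, blue: 2, green: 1
cpred = Counter(pred for _, pred in pairs)         # red: 3, blue: 2, green: 2

observed_accuracy = correct / n                                    # 4/7 = 0.5714...
random_accuracy = sum(cobs[c] * cpred[c] for c in cobs) / (n * n)  # 18/49 = 0.3673...
kappa = (observed_accuracy - random_accuracy) / (1 - random_accuracy)
print(round(kappa, 4))  # 0.3226
```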

Null Error Rate (in StatTable)

The Null Error Rate is the fraction of observations that would be incorrectly predicted by predicting the most common observed class for all observations.

Formula: Null Error Rate = 1 - ( max (number of observations of each class) / n )

Example: In the preceding input table, red is observed four times, blue twice, and green once; therefore:

Null Error Rate = 1 - ( max (4,2,1) / 7 ) = 1 - (4/7) = 0.4286
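
A short Python check of the same arithmetic (illustration only; the list restates the observed_class column of the example table):

```python
from collections import Counter

observed = ["red", "red", "red", "red", "blue", "blue", "green"]
n = len(observed)
null_error_rate = 1 - max(Counter(observed).values()) / n  # 1 - 4/7
print(round(null_error_rate, 4))  # 0.4286
```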

Values in AccuracyTable Formulas

The formulas for the sensitivity, specificity, prevalence, detection rate, and detection prevalence of class c use these terms:
Term    Definition
ccorr   Number of correct predictions of class c.
cobs    Number of observations of class c.
cpred   Number of predictions of class c.
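
The following Python sketch (illustration only) derives ccorr, cobs, and cpred for each class from the example table; these are the counts used in the examples that follow.

```python
from collections import Counter

pairs = [("red", "red"), ("red", "red"), ("red", "blue"), ("red", "green"),
         ("blue", "red"), ("blue", "blue"), ("green", "green")]

cobs = Counter(obs for obs, _ in pairs)                     # observations per class
cpred = Counter(pred for _, pred in pairs)                  # predictions per class
ccorr = Counter(obs for obs, pred in pairs if obs == pred)  # correct predictions per class

for c in ("red", "blue", "green"):
    print(c, ccorr[c], cobs[c], cpred[c])
# red 2 4 3
# blue 1 2 2
# green 1 1 2
```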

Sensitivity (in AccuracyTable)

Formula for sensitivity of class c:

sensitivity (c) = ccorr / cobs

Example: sensitivity (red) = 2/4 = 0.5

Specificity (in AccuracyTable)

Formula for specificity of class c:

specificity (c) = ( n + ccorr - cobs - cpred ) / ( n - cobs )

Example: specificity (red) = (7 + 2 - 4 - 3) / (7 - 4) = 0.6667
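
A small Python check of the sensitivity and specificity examples above, using the class red counts from the example table (illustration only):

```python
# Counts for class "red" from the example table.
n, ccorr, cobs, cpred = 7, 2, 4, 3

sensitivity = ccorr / cobs                             # 2/4 = 0.5
specificity = (n + ccorr - cobs - cpred) / (n - cobs)  # 2/3 = 0.6667...
print(round(sensitivity, 4), round(specificity, 4))    # 0.5 0.6667
```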

Prevalence (in AccuracyTable)

If you specify Prevalence, the function uses the prevalence specified for each class; otherwise, the function calculates the prevalence of class c with this formula:

prevalence (c) = cobs / n

Example: prevalence (red) = 4/7 = 0.5714

Pos Pred Value (in AccuracyTable)

Formula for Pos Pred Value (positive predictive value, PPV) of class c:

PPV (c) = ( sensitivity (c) * prevalence (c) ) / ( ( sensitivity (c) * prevalence (c) ) + ( (1 - specificity (c) ) * (1 - prevalence (c) ) ) )

Example: PPV (red) = (0.5 * 0.5714) / ( (0.5 * 0.5714) + (1 - 0.6667) * (1 - 0.5714) ) = 0.6667
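
A Python check of the PPV example (illustration only; the sensitivity, specificity, and prevalence values for class red are taken from the preceding examples):

```python
# Values for class "red" from the preceding examples.
sensitivity, specificity, prevalence = 0.5, 2 / 3, 4 / 7

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
print(round(ppv, 4))  # 0.6667
```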

Neg Pred Value (in AccuracyTable)

Formula for Neg Pred Value (negative predictive value, NPV) of class c:

NPV (c) = ( specificity (c) * (1 - prevalence (c) ) ) / ( ( specificity (c) * (1 - prevalence (c) ) ) + ( (1 - sensitivity (c) ) * prevalence (c) ) )

Example: NPV (red) = ( 0.6667 * (1 - 0.5714) ) / ( ( 0.6667 * (1 - 0.5714) ) + ( (1 - 0.5) * 0.5714 ) ) = 0.5
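
A Python check of the NPV example, using the same class red values (illustration only):

```python
# Values for class "red" from the preceding examples.
sensitivity, specificity, prevalence = 0.5, 2 / 3, 4 / 7

npv = (specificity * (1 - prevalence)) / (
    specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
)
print(round(npv, 4))  # 0.5
```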

Detection Rate (in AccuracyTable)

Formula for Detection Rate of class c:

Detection Rate (c) = ccorr / n

Example: Detection Rate (red) = 2/7 = 0.2857

Detection Prevalence (in AccuracyTable)

Formula for Detection Prevalence of class c:

Detection Prevalence (c) = cpred / n

Example: Detection Prevalence (red) = 3/7 = 0.4286

Balanced Accuracy (in AccuracyTable)

Formula for Balanced Accuracy of class c:

Balanced Accuracy (c) = ( sensitivity (c) + specificity (c) ) / 2

Example: Balanced Accuracy (red) = (0.5 + 0.6667) / 2 = 0.5833
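
The following Python sketch (illustration only) checks the Detection Rate, Detection Prevalence, and Balanced Accuracy examples for class red in one pass, using the counts from the example table:

```python
# Counts for class "red" from the example table.
n, ccorr, cobs, cpred = 7, 2, 4, 3

detection_rate = ccorr / n                             # 2/7 = 0.2857...
detection_prevalence = cpred / n                       # 3/7 = 0.4286...
sensitivity = ccorr / cobs                             # 0.5
specificity = (n + ccorr - cobs - cpred) / (n - cobs)  # 0.6667...
balanced_accuracy = (sensitivity + specificity) / 2    # 0.5833...
print(round(detection_rate, 4), round(detection_prevalence, 4), round(balanced_accuracy, 4))
# 0.2857 0.4286 0.5833
```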