1.0 - 8.00 - CoxPH Example - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

Input

The input table, lungcancer, contains data from a randomized trial of two treatment regimens for lung cancer. This example uses the data to fit a survival model. The table has three categorical predictors and three numerical predictors:

Predictors
Predictor  Description                                                    Possible Values
trt        Treatment plan (categorical)                                   standard, test
celltype   Cancerous cell type (categorical)                              squamous, smallcell, adeno, large
prior      Whether the patient has undergone prior therapy (categorical)  yes, no
karno      Karnofsky score assigned to patient (numerical)                [0, 100], where 100 is perfect health and 0 is death
diagtime   Months from diagnosis to randomization (numerical)             Nonnegative number
age        Patient age, in years (numerical)                              Nonnegative number

In addition to a column for each predictor, the input table has these columns:

Column    Description                         Possible Values
id        Patient identifier                  Positive integer
status    Censoring status or survival event  0 (survival/right censorship), 1 (event occurred)
time_int  Survival time in months             Nonnegative number

lungcancer
id   trt       celltype  time_int  status  karno  diagtime  age  prior
1    standard  squamous  72        1       60     7         69   no
2    standard  squamous  411       1       70     5         64   yes
3    standard  squamous  228       1       60     3         38   no
4    standard  squamous  126       1       60     9         63   yes
5    standard  squamous  118       1       70     11        65   yes
6    standard  squamous  10        1       20     5         49   no
7    standard  squamous  82        1       40     10        69   yes
8    standard  squamous  110       1       80     29        68   no
9    standard  squamous  314       1       50     18        43   no
10   standard  squamous  100       0       70     6         70   no
...  ...       ...       ...       ...     ...    ...       ...  ...
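
Before fitting the model, it can help to check how many rows are right-censored, because censored rows (status = 0) and event rows (status = 1) contribute differently to a Cox model. The following is a minimal sketch that assumes only the lungcancer table and status column described above; the alias n_patients is illustrative and not part of the original example.

```sql
-- Count censored (status = 0) and event (status = 1) rows in the input table.
SELECT status,
       COUNT(*) AS n_patients  -- illustrative alias (assumption)
FROM lungcancer
GROUP BY status
ORDER BY status;
```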

SQL Call

SELECT * FROM CoxPH (
  ON lungcancer AS InputTable
  OUT TABLE CoefficientTable (lungcancer_coef)
  OUT TABLE LinearPredictorTable (lungcancer_lp)
  USING
  FeatureColumns ('trt', 'celltype', 'karno', 'diagtime', 'age', 'prior')
  CategoricalColumns ('trt','celltype','prior')
  TimeIntervalColumn ('time_int')
  EventColumn ('status')
) AS dt;

Output

Coefficients are estimated with a 95% confidence interval. The coefficients for karno and for the squamous and large categories of celltype are statistically significant.

Summary Table
predictor              category   coefficient  exp_coef  std_error  z_score    p_value   significance
karno                             -0.032815    0.967717  0.005508   -5.95802   0         ***
diagtime                          8.1e-05      1.000081  0.009136   0.008901   0.992898
age                               -0.008706    0.991331  0.0093     -0.93615   0.349196
trt                    standard   0            1         0
trt                    test       0.294603     1.342593  0.20755    1.419433   0.155773
celltype               adeno      0            1         0
celltype               large      -0.794775    0.451683  0.302878   -2.624078  0.008688  **
celltype               smallcell  -0.334506    0.715692  0.275978   -1.212075  0.225483
celltype               squamous   -1.196066    0.302381  0.300917   -3.974739  7e-05     ***
prior                  no         0            1         0
prior                  yes        0.071594     1.074219  0.232305   0.308187   0.75794
Iteration #                       5
Convergence                       yes
Likelihood ratio test             62.1039                                      0 on 8 degree of freedom
Wald test                         62.3673                                      0 on 8 degree of freedom
Score test                        66.7375                                      0 on 8 degree of freedom

The coefficients are output in the table lungcancer_coef, which is later used for prediction. Because celltype, trt, and prior are categorical variables, one category of each is used as the reference for the other categories. The rows for the reference categories (trt = standard, celltype = adeno, and prior = no) therefore show default values (coefficient 0, exp_coef 1, std_error 0) instead of estimated values, and have no z_score or p_value.
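
In a Cox model, exp_coef for a non-reference category is conventionally read as a hazard ratio relative to the reference category, with the other predictors held fixed. For example, the exp_coef of about 0.30 for celltype = squamous suggests an estimated hazard roughly 0.30 times that of celltype = adeno. The following sketch lists the celltype rows in that order; it assumes only the lungcancer_coef columns shown in this example.

```sql
-- List celltype categories from lowest to highest estimated hazard ratio.
-- The reference category (adeno) appears with exp_coef = 1.
SELECT category,
       coefficient,
       exp_coef
FROM lungcancer_coef
WHERE predictor = 'celltype'
ORDER BY exp_coef;
```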

This query returns the following table:

SELECT * FROM lungcancer_coef ORDER BY 1;
lungcancer_coef
id  predictor category   coefficient           exp_coef           std_error            z_score              p_value               significance
1   karno                -0.0328153261941663   0.967717255116806  0.00550775688646227  -5.95802009250341    2.55312138097707e-09  ***
2   diagtime             8.13205087074416e-05  1.00008132381531   0.00913606224777197  0.00890104582280767  0.992898086742234
3   age                  -0.00870647494549903  0.991331316650765  0.00930029912031493  -0.936149991829963   0.349195966726043
4   trt       standard   0                     1                  0                    NaN                  NaN
5   trt       test       0.294602821498042     1.34259300369677   0.207549603603519    1.41943331320844     0.155772725980423
6   celltype  adeno      0                     1                  0                    NaN                  NaN
7   celltype  large      -0.794774719851903    0.451682978670067  0.30287771543449     -2.62407790124726    0.00868839104930008   **
8   celltype  smallcell  -0.334505911425932    0.71569161405743   0.275977786191144    -1.21207549362053    0.225483483597614
9   celltype  squamous   -1.19606637417932     0.302381330550752  0.300916994493076    -3.974738536101      7.04566159376308e-05  ***
10  prior     no         0                     1                  0                    NaN                  NaN
11  prior     yes        0.0715936019179389    1.07421869492581   0.232305384067305    0.308187441308705    0.757939708088376
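
The values in this table are consistent with a Wald-style calculation: for karno, coefficient / std_error is about -0.0328153 / 0.0055078, or -5.958, which matches z_score. The following sketch recomputes that ratio and adds an approximate 95% confidence interval for each hazard ratio, using the usual normal approximation exp(coefficient ± 1.96 × std_error). The aliases z_recomputed, hr_lower_95, and hr_upper_95 are illustrative, and the interval formula is an assumption about how to summarize the output, not something the function itself returns.

```sql
-- Recompute the z-score and an approximate 95% interval for each hazard ratio.
-- NULLIF avoids division by zero for reference categories (std_error = 0),
-- so z_recomputed is NULL for those rows and both interval bounds equal 1.
SELECT id,
       predictor,
       category,
       coefficient / NULLIF(std_error, 0)  AS z_recomputed,  -- illustrative alias
       EXP(coefficient - 1.96 * std_error) AS hr_lower_95,   -- illustrative alias
       EXP(coefficient + 1.96 * std_error) AS hr_upper_95    -- illustrative alias
FROM lungcancer_coef
ORDER BY id;
```

An interval that excludes 1 corresponds to a coefficient that is significant at roughly the 5% level, which is one way to cross-check the significance column.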

This query returns the following table:

SELECT * FROM lungcancer_lp ORDER BY 1;
lungcancer_lp
linear_predictor   event  time_internal
-4.41189466565077  1      467
-4.41097447125404  1      110
-4.39448171575977  1      389
-4.29871049135928  1      283
-4.28779288293039  0      182
-4.26989200998715  1      143
-4.25242310919077  1      999
-4.20170368038227  0      25
-4.10210453090365  0      100
-4.04859022189228  1      112
...                ...    ...
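
In a proportional hazards model, the hazard is the baseline hazard multiplied by the exponential of the linear predictor, so exponentiating linear_predictor gives each row's hazard relative to that baseline. The following sketch adds this value to the output; the alias relative_hazard is hypothetical, and the sketch assumes that linear_predictor holds the fitted sum of coefficient-weighted predictor values for each input row.

```sql
-- Add the hazard relative to baseline, exp(linear_predictor), to each row.
-- Rows with a lower linear_predictor have a lower estimated hazard.
SELECT lp.*,
       EXP(lp.linear_predictor) AS relative_hazard  -- hypothetical alias
FROM lungcancer_lp AS lp
ORDER BY lp.linear_predictor;
```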