All teradataml analytic functions accept Features as input, so you can retrieve Features from FeatureStore and pass them directly to these functions.
The following example predicts diabetes for a patient using the teradataml analytic function XGBoost.
- Load data into Vantage.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data('dataframe', 'medical_readings')
>>> df = DataFrame('medical_readings')
>>> df
                              record_timestamp  glucose  blood_pressure  insulin  diabetes_pedigree_function  outcome
patient_id
17                  2024-04-10 11:10:59.000000      107              74        0                       0.254        1
34                  2024-04-10 11:10:59.000000      122              78        0                       0.512        0
13                  2024-04-10 11:10:59.000000      189              60      846                       0.398        1
53                  2024-04-10 11:10:59.000000      176              90      300                       0.467        1
11                  2024-04-10 11:10:59.000000      168              74        0                       0.537        1
51                  2024-04-10 11:10:59.000000      101              50       36                       0.526        0
32                  2024-04-10 11:10:59.000000       88              58       54                       0.267        0
15                  2024-04-10 11:10:59.000000      100               0        0                       0.484        1
99                  2024-04-10 11:10:59.000000      122              90      220                       0.325        1
0                   2024-04-10 11:10:59.000000      148              72        0                       0.627        1
- Group the Features, Entity, and Data Source.
>>> from teradataml import FeatureGroup
>>> fg = FeatureGroup.from_DataFrame(
...          name='MedicalReadings',
...          df=df,
...          entity_columns='patient_id',
...          timestamp_col_name='record_timestamp')
>>>
- Store the components in FeatureStore.
>>> from teradataml import FeatureStore
>>> fs = FeatureStore('vfs_v1')
>>> fs.apply(fg)
True
>>>
- Retrieve those components from FeatureStore.
>>> fg = fs.get_feature_group('MedicalReadings')
- Retrieve the dataset from FeatureStore.
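The code for this step is a minimal sketch; it assumes your teradataml version exposes FeatureStore.get_dataset(), which builds a DataFrame from the features in a feature group. Verify the method name in your release.
>>> # Assumed API: get_dataset() returns a DataFrame built from the
>>> # 'MedicalReadings' feature group.
>>> df = fs.get_dataset('MedicalReadings')
>>>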
- Use only the readings taken during week 15 for building an ML model.
>>> df = df[df.record_timestamp.week() == 15]
- Prepare test and train data.
>>> sampled_df = df.sample(frac=[0.7, 0.3])
>>> # sampleid 1 identifies the 70% sample, sampleid 2 the 30% sample.
>>> train_df = sampled_df[sampled_df.sampleid == 1]
>>> test_df = sampled_df[sampled_df.sampleid == 2]
- Generate a model using Features in FeatureStore.
>>> fg.set_labels('outcome')
True
>>> from teradataml import XGBoost
>>> model = XGBoost(data=train_df,
...                 input_columns=fg.features,
...                 response_column=fg.labels,
...                 max_depth=3,
...                 lambda1=1000.0,
...                 model_type='Classification',
...                 seed=-1,
...                 shrinkage_factor=0.1,
...                 iter_num=2)
>>>
- Predict the outcome using test data.
>>> model_out = model.predict(newdata=test_df,
...                           id_column='patient_id',
...                           model_type='Classification')
>>> model_out.result
   patient_id Prediction  Confidence_Lower  Confidence_upper
0          17          0               0.5               0.5
1          13          0               0.5               0.5
2          32          0               0.5               0.5
3          40          1               1.0               1.0
4          80          0               1.0               1.0
5          59          0               1.0               1.0
6          38          0               0.5               0.5
7          76          0               1.0               1.0
8          19          0               0.5               0.5
9          34          0               0.5               0.5
>>>
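To check how the model performs, you can compare the predictions with the actual outcome values held back in test_df. The following is a minimal sketch, not part of the example above; it pulls both results to the client with to_pandas() and assumes the column names shown in the output.
>>> # Assumed step: compare predictions with the actual 'outcome' column.
>>> pred_pdf = model_out.result.to_pandas().reset_index()
>>> actual_pdf = test_df.select(['patient_id', 'outcome']).to_pandas().reset_index()
>>> merged = pred_pdf.merge(actual_pdf, on='patient_id')
>>> accuracy = (merged['Prediction'].astype(int) == merged['outcome']).mean()
>>>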