Using FeatureStore in teradataml Analytic Functions | Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

All teradataml analytic functions accept Features as input, so you can retrieve Features from FeatureStore and pass them directly to an analytic function.

The following example predicts diabetes for a patient using the teradataml analytic function XGBoost.

    Preprocessing: Store the Features and Data Source in FeatureStore

  1. Load data into Vantage.
    >>> from teradataml import DataFrame, load_example_data
    >>> load_example_data('dataframe', 'medical_readings')
    >>> df = DataFrame('medical_readings')
    >>> df
                          record_timestamp  glucose  blood_pressure  insulin  diabetes_pedigree_function  outcome
    patient_id                                                                                                   
    17          2024-04-10 11:10:59.000000      107              74        0                       0.254        1
    34          2024-04-10 11:10:59.000000      122              78        0                       0.512        0
    13          2024-04-10 11:10:59.000000      189              60      846                       0.398        1
    53          2024-04-10 11:10:59.000000      176              90      300                       0.467        1
    11          2024-04-10 11:10:59.000000      168              74        0                       0.537        1
    51          2024-04-10 11:10:59.000000      101              50       36                       0.526        0
    32          2024-04-10 11:10:59.000000       88              58       54                       0.267        0
    15          2024-04-10 11:10:59.000000      100               0        0                       0.484        1
    99          2024-04-10 11:10:59.000000      122              90      220                       0.325        1
    0           2024-04-10 11:10:59.000000      148              72        0                       0.627        1
  2. Group the Features, Entity, and Data Source.
    >>> from teradataml import FeatureGroup
    >>> fg = FeatureGroup.from_DataFrame(
    ...    name='MedicalReadings',
    ...    df=df,
    ...    entity_columns='patient_id',
    ...    timestamp_col_name='record_timestamp'
    ... )
    >>> 
  3. Store the components in FeatureStore.
    >>> from teradataml import FeatureStore
    >>> fs = FeatureStore('vfs_v1')
    >>> fs.apply(fg)
    True
    >>>
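
To make the grouping concrete, the following plain-Python sketch models what a feature group bundles together: a name, the entity column, the timestamp column, and the remaining columns as features. It is not teradataml code and says nothing about how the library is implemented. The class FeatureGroupSketch and its methods are hypothetical, it works on plain column names rather than Feature objects, and the rule that every column other than the entity and timestamp columns becomes a feature is an assumption made for illustration. It also shows the effect of marking a column as a label, which the modeling step in this example does with outcome.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureGroupSketch:
    """Hypothetical stand-in for a feature group; not the teradataml class."""
    name: str
    entity_columns: List[str]
    timestamp_col_name: str
    features: List[str]
    labels: List[str] = field(default_factory=list)

    @classmethod
    def from_columns(cls, name, columns, entity_columns, timestamp_col_name):
        # Assumption: every column that is neither an entity column nor the
        # timestamp column is treated as a feature.
        if isinstance(entity_columns, str):
            entities = [entity_columns]
        else:
            entities = list(entity_columns)
        reserved = set(entities) | {timestamp_col_name}
        features = [c for c in columns if c not in reserved]
        return cls(name, entities, timestamp_col_name, features)

    def set_labels(self, label):
        # Move a column from the features to the labels.
        self.features.remove(label)
        self.labels.append(label)
        return True


# Column names copied from the medical_readings listing above.
columns = ["patient_id", "record_timestamp", "glucose", "blood_pressure",
           "insulin", "diabetes_pedigree_function", "outcome"]

sketch = FeatureGroupSketch.from_columns(
    name="MedicalReadings",
    columns=columns,
    entity_columns="patient_id",
    timestamp_col_name="record_timestamp",
)
sketch.set_labels("outcome")

print(sketch.features)
print(sketch.labels)
```

After set_labels, the sketch keeps the four measurement columns as features and outcome as the only label, which mirrors how the example later passes fg.features and fg.labels to XGBoost.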

    Retrieve components and dataset

  1. Retrieve those components from FeatureStore.
    >>> fg = fs.get_feature_group('MedicalReadings')
  2. Retrieve the dataset from FeatureStore.

    Get historic data (ML models only)

  1. Use only the readings taken during week 15 for building an ML model.
    >>> df = df[df.record_timestamp.week() == 15]

    Test and train data

  1. Prepare test and train data.
    >>> sampled_df = df.sample(frac=[0.7, 0.3])
    >>> train_df = sampled_df[sampled_df.sampleid==2]
    >>> test_df = sampled_df[sampled_df.sampleid==1]
  2. Generate a model using Features in FeatureStore.
    >>> fg.set_labels('outcome')
    True
    >>> from teradataml import XGBoost
    >>> model = XGBoost(data=train_df,
    ...                 input_columns=fg.features,
    ...                 response_column=fg.labels,
    ...                 max_depth=3,
    ...                 lambda1=1000.0,
    ...                 model_type='Classification',
    ...                 seed=-1,
    ...                 shrinkage_factor=0.1,
    ...                 iter_num=2)
    >>>
  3. Predict the outcome using test data.
    >>> model_out = model.predict(newdata=test_df,
    ...                           id_column='patient_id',
    ...                           model_type='Classification'
    ...                           )
    >>> model_out.result
       patient_id  Prediction  Confidence_Lower  Confidence_upper
    0          17           0               0.5               0.5
    1          13           0               0.5               0.5
    2          32           0               0.5               0.5
    3          40           1               1.0               1.0
    4          80           0               1.0               1.0
    5          59           0               1.0               1.0
    6          38           0               0.5               0.5
    7          76           0               1.0               1.0
    8          19           0               0.5               0.5
    9          34           0               0.5               0.5
    >>>
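
As a rough way to read the prediction output, the following plain-Python sketch compares predictions with known outcomes. It uses only the four patients (17, 13, 32, 34) that appear both in the prediction output and in the medical_readings listing shown earlier, so the resulting figure is illustrative and says nothing about the quality of the model as a whole. The accuracy helper is a hypothetical function written for this sketch, not part of teradataml.

```python
# Predictions copied from the model output above (patient_id -> Prediction).
predictions = {17: 0, 13: 0, 32: 0, 34: 0}

# Actual outcomes for the same patients, copied from the medical_readings listing.
actual = {17: 1, 13: 1, 32: 0, 34: 0}


def accuracy(predicted, expected):
    """Return the fraction of shared patient ids whose prediction matches."""
    shared = predicted.keys() & expected.keys()
    if not shared:
        raise ValueError("no patient ids in common")
    correct = sum(predicted[pid] == expected[pid] for pid in shared)
    return correct / len(shared)


score = accuracy(predictions, actual)
print(f"accuracy on {len(predictions)} patients: {score:.2f}")
```

With these four rows, two predictions match the recorded outcome, so the sketch reports an accuracy of 0.50.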