Get dataset from FeatureStore - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

get_dataset

Use get_dataset() to get a teradataml DataFrame for a Feature Group, identified by the group name.

Feed historic data to your ML models

You can feed only a portion of a dataset to an ML model by using teradataml filter options. get_dataset() returns a teradataml DataFrame built from the Data Source of the Feature Group. Once you have the teradataml DataFrame, use the filter options provided by teradataml to select specific records, then feed the filtered DataFrame to your ML model.
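The filtering step described above can be sketched as follows. This is a minimal illustration, not the FeatureStore API itself: a pandas DataFrame stands in for the result of get_dataset() so that the code runs without a Vantage connection, the rows are copied from the Example 1 output, and select_training_rows is a hypothetical helper name. teradataml DataFrames accept a similar boolean-expression filter of the form df[df.column >= value], so the helper is expected to carry over, but that is an assumption not verified here.

```python
import pandas as pd

# Stand-in for the DataFrame returned by fs.get_dataset(...).
# pandas is used only so the sketch runs without a Vantage connection;
# the rows are copied from the Example 1 output.
df = pd.DataFrame(
    {
        "patient_id": [19, 59, 38, 78, 36],
        "bmi": [34.6, 41.5, 38.2, 43.2, 33.2],
        "age": [32, 22, 27, 26, 35],
    }
)


def select_training_rows(dataset, min_age):
    """Keep only rows whose age is at least min_age (hypothetical helper)."""
    return dataset[dataset.age >= min_age]


# Only the filtered subset would be passed on to the ML model.
filtered = select_training_rows(df, min_age=30)
print(sorted(filtered["patient_id"].tolist()))
```

Keeping the filter in a small function makes it easy to reuse the same selection for training and for later scoring runs.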

Example 1: Get dataset for 'PatientProfile' group

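>>> from teradataml import FeatureStore, FeatureGroup
>>> # patient_profile_df is an existing teradataml DataFrame.
>>> fs = FeatureStore("vfs_v1")
>>> patient_profile_fg = FeatureGroup.from_DataFrame(
...    name='PatientProfile',
...    df=patient_profile_df,
...    entity_columns='patient_id',
...    timestamp_col_name='record_timestamp'
... )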
>>> fs.apply(patient_profile_fg)
True
>>> df = fs.get_dataset(patient_profile_fg.name)
>>> df
                      record_timestamp   bmi  age  skin_thickness  pregnancies
patient_id                                                                    
19          2024-04-10 11:10:59.000000  34.6   32            30.0            1
59          2024-04-10 11:10:59.000000  41.5   22            41.0            0
38          2024-04-10 11:10:59.000000  38.2   27            42.0            2
78          2024-04-10 11:10:59.000000  43.2   26             0.0            0
36          2024-04-10 11:10:59.000000  33.2   35             0.0           11
97          2024-04-10 11:10:59.000000  20.4   22            18.0            1
57          2024-04-10 11:10:59.000000  46.8   31            60.0            0
80          2024-04-10 11:10:59.000000  22.4   22            13.0            3
40          2024-04-10 11:10:59.000000  34.0   26            25.0            3
61          2024-04-10 11:10:59.000000  32.9   39             0.0            8

Example 2: Combine two Feature Groups and get the dataset for the combined Feature Group

>>> patient_profile_fg = FeatureGroup.from_DataFrame(
...    name='PatientProfile', 
...    df=patient_profile_df, 
...    entity_columns='patient_id', 
...    timestamp_col_name='record_timestamp'
... )
>>> fs.apply(patient_profile_fg)
True
>>> medical_readings_fg = FeatureGroup.from_DataFrame(
...    name='MedicalReadings', 
...    df=medical_readings_df, 
...    entity_columns='patient_id', 
...    timestamp_col_name='record_timestamp'
... )
>>> combined_fg = patient_profile_fg + medical_readings_fg
>>> fs.apply(combined_fg)
True
>>> fs.get_dataset(combined_fg.name)
   patient_id            record_timestamp  outcome  age   bmi  skin_thickness  diabetes_pedigree_function  blood_pressure  insulin  glucose  pregnancies
0          17  2024-04-10 11:10:59.000000        1   31  29.6             0.0                       0.254              74        0      107            7
1          34  2024-04-10 11:10:59.000000        0   45  27.6            31.0                       0.512              78        0      122           10
2          13  2024-04-10 11:10:59.000000        1   59  30.1            23.0                       0.398              60      846      189            1
3          61  2024-04-10 11:10:59.000000        1   39  32.9             0.0                       0.270              72        0      133            8
4          19  2024-04-10 11:10:59.000000        1   32  34.6            30.0                       0.529              70       96      115            1
5          80  2024-04-10 11:10:59.000000        0   22  22.4            13.0                       0.140              44        0      113            3
6          59  2024-04-10 11:10:59.000000        0   22  41.5            41.0                       0.173              64      142      105            0
7          38  2024-04-10 11:10:59.000000        1   27  38.2            42.0                       0.503              68        0       90            2
8          40  2024-04-10 11:10:59.000000        0   26  34.0            25.0                       0.271              64       70      180            3
9          15  2024-04-10 11:10:59.000000        1   32  30.0             0.0                       0.484               0        0      100            7