get_data() | FeatureStore Get Method | Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2025-12-05
Product Category: Teradata Vantage

Use the get_data() method to return a teradataml DataFrame that contains entities and feature values. The method generates the dataset from one of the following:
  • process_id
  • entity and features
  • dataset_name

Optional Parameters

process_id
One of process_id, entity together with features, or dataset_name must be provided.

Specifies the process id of an existing feature process.

entity
Specifies the entity to be included in the dataset, either as the entity name or as an Entity object.
features
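Specifies the names of the features and the corresponding feature versions to be included in the dataset.

Pass a dictionary in which each key is the name of a feature and each value is the version of that feature. In the examples on this page, the version is the process id of the feature process that produced the feature.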

Refer to FeatureCatalog.list_feature_versions() to get the list of features and their versions.

dataset_name
Specifies the dataset name.
as_of
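Specifies the point in time for which to retrieve the feature values, instead of retrieving the latest values. The examples on this page pass either a datetime.datetime object or a string in the form 'YYYY-MM-DD HH:MM:SS'.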
  • Applicable only when process_id is passed to the function.
  • Ignored when dataset_name is passed.
include_historic_records
Specifies whether to include historic data in the dataset.

If "as_of" is specified, then the "include_historic_records" argument is ignored.

Default value: False.
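
The rules above about which arguments are required and which are ignored can be summarized in a short sketch. The helper summarize_get_data_args is hypothetical and is not part of teradataml; it only restates the documented rules in plain Python. It also assumes that entity and features must always be passed together, which the text implies but does not state outright.

```python
def summarize_get_data_args(process_id=None, entity=None, features=None,
                            dataset_name=None, as_of=None,
                            include_historic_records=False):
    """Hypothetical helper (not a teradataml API): report how the
    documented argument rules apply to one combination of arguments."""
    # Assumption: entity and features form one source and go together.
    if (entity is None) != (features is None):
        raise ValueError("entity and features must be passed together")

    sources = []
    if process_id is not None:
        sources.append("process_id")
    if entity is not None:
        sources.append("entity and features")
    if dataset_name is not None:
        sources.append("dataset_name")

    # At least one source is mandatory.
    if not sources:
        raise ValueError(
            "pass process_id, entity and features, or dataset_name")

    # as_of is documented as ignored when dataset_name is passed.
    as_of_used = as_of is not None and dataset_name is None
    # include_historic_records is documented as ignored when as_of is given.
    historic_used = bool(include_historic_records) and as_of is None

    return {"sources": sources,
            "as_of_used": as_of_used,
            "include_historic_records_used": historic_used}
```

For example, passing a process id together with both as_of and include_historic_records=True reports that only as_of takes effect.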

Example setup

>>> from teradataml import (DataFrame, DatasetCatalog, Entity, Feature,
...                         FeatureProcess, FeatureStore, FeatureType,
...                         execute_sql, load_example_data)

Create DataFrame on sales data.

>>> load_example_data("dataframe", "sales")
>>> df = DataFrame("sales")
>>> df
              Feb    Jan    Mar    Apr    datetime
accounts
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017

Create FeatureStore 'vfs_v1' or use existing one.

>>> repo = 'vfs_v1'
>>> data_domain = 'sales'
>>> fs = FeatureStore(repo=repo, data_domain=data_domain)
FeatureStore is ready to use.

Example 1: Get the data from process_id

Create a feature process.

>>> fp = FeatureProcess(repo=repo,
...                     data_domain=data_domain,
...                     object=df,
...                     entity='accounts',
...                     features=['Jan', 'Feb'])
>>> fp.run()
Process '1e9e8d64-6851-11f0-99c5-a30631e77953' started.
Process '1e9e8d64-6851-11f0-99c5-a30631e77953' completed.
True

Get data from FeatureStore.

>>> fs.get_data(process_id=fp.process_id)
     accounts    Feb    Jan
0    Alpha Co  210.0  200.0
1    Blue Inc   90.0   50.0
2   Jones LLC  200.0  150.0
3  Orange Inc  210.0    NaN
4  Yellow Inc   90.0    NaN
5     Red Inc  200.0  150.0

Example 2: Get the data from entity and features

>>> fs.get_data(entity='accounts', features={'Jan': fp.process_id})
     accounts    Jan
0    Alpha Co  200.0
1    Blue Inc   50.0
2   Jones LLC  150.0
3  Orange Inc    NaN
4  Yellow Inc    NaN
5     Red Inc  150.0
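
When several features come from the same feature process, the features dictionary can be built from a list of names rather than typed out by hand. The following is a minimal sketch: features_for_process is a hypothetical helper, not a teradataml function, and the process id is the one printed in Example 1.

```python
def features_for_process(feature_names, process_id):
    """Hypothetical helper: map every feature name to the same version,
    here the id of the feature process that produced the features."""
    return {name: process_id for name in feature_names}


# Process id copied from the output of fp.run() in Example 1.
process_id = "1e9e8d64-6851-11f0-99c5-a30631e77953"
features = features_for_process(["Jan", "Feb"], process_id)
```

The resulting dictionary has the same shape as the one passed to the features argument above, so it could be used as fs.get_data(entity='accounts', features=features).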

Example 3: Get the data from dataset name

Build the dataset.

>>> dc = DatasetCatalog(repo=repo, data_domain=data_domain)
>>> dc.build_dataset(entity='accounts',
...                  selected_features={'Jan': fp.process_id,
...                                     'Feb': fp.process_id},
...                  view_name='test_get_data',
...                  description='Dataset with Jan and Feb')

Get data from the dataset.

>>> fs.get_data(dataset_name='test_get_data')
     accounts    Feb    Jan
0    Alpha Co  210.0  200.0
1    Blue Inc   90.0   50.0
2   Jones LLC  200.0  150.0
3  Orange Inc  210.0    NaN
4  Yellow Inc   90.0    NaN
5     Red Inc  200.0  150.0

Example 4: Get the data from entity and features, where an Entity object and Feature objects supply the entity and features arguments

Create features.

>>> feature1 = Feature('sales:Mar',
...                    df.Mar,
...                    feature_type=FeatureType.CATEGORICAL)
 
>>> feature2 = Feature('sales:Apr',
...                    df.Apr,
...                    feature_type=FeatureType.CONTINUOUS)

Create an entity.

>>> entity = Entity(name='accounts_entity', columns=['accounts'])

Create a feature process.

>>> fp1 = FeatureProcess(repo=repo,
...                      data_domain=data_domain,
...                      object=df,
...                      entity=entity,
...                      features=[feature1, feature2])
>>> fp1.run()
Process '5522c034-684d-11f0-99c5-a30631e77953' started.
Process '5522c034-684d-11f0-99c5-a30631e77953' completed.
True

Get data from the entity and features.

>>> fs.get_data(entity=entity, features={feature1.name: fp1.process_id,
...                                      feature2.name: fp1.process_id})
     accounts  sales:Mar  sales:Apr
0    Alpha Co      215.0      250.0
1    Blue Inc       95.0      101.0
2   Jones LLC      140.0      180.0
3  Orange Inc        NaN      250.0
4  Yellow Inc        NaN        NaN
5     Red Inc      140.0        NaN

Example 5: Get the data for the time passed by the user via the as_of argument

Import required packages.

>>> import time
>>> from datetime import datetime as dt

Retrieve the record where accounts == 'Blue Inc'.

>>> df_test = df[df['accounts'] == 'Blue Inc']
>>> df_test
              Feb    Jan    Mar    Apr    datetime
accounts
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
>>> df_test.to_sql('sales_test', if_exists='replace')
>>> test_df = DataFrame('sales_test')
>>> test_df
   accounts   Feb  Jan  Mar  Apr  datetime
0  Blue Inc  90.0   50   95  101  17/01/04

Create a feature process.

>>> fp = FeatureProcess(repo=repo,
...                     data_domain=data_domain,
...                     object=test_df,
...                     entity='accounts',
...                     features=['Jan', 'Feb'])

Run the feature process.

>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True

This example runs the same process more than once to demonstrate how to retrieve a specific version of the features with the as_of argument.

Wait for 20 seconds, update the data, then run again.

>>> time.sleep(20)
>>> execute_sql("update sales_test set Jan = Jan * 10, Feb = Feb * 10")
TeradataCursor uRowsHandle=269 bClosed=False

Run the feature process again.

>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True

Wait again for 20 seconds, update the data, then run again.

>>> time.sleep(20)
>>> execute_sql("update sales_test set Jan = Jan * 10, Feb = Feb * 10")
TeradataCursor uRowsHandle=397 bClosed=False

Run the feature process again.

>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True

Retrieve the version of the features as of '2025-08-15 12:37:23'. The first call passes the time to as_of as a datetime.datetime object, the second as a string; both return the same data.

>>> as_of_time = dt(2025, 8, 15, 12, 37, 23)
>>> fs.get_data(process_id=fp.process_id,
...             as_of=as_of_time)
   accounts    Feb  Jan
0  Blue Inc  900.0  500
>>> fs.get_data(process_id=fp.process_id,
...             as_of=as_of_time.strftime('%Y-%m-%d %H:%M:%S'))
   accounts    Feb  Jan
0  Blue Inc  900.0  500
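
The two calls above pass the same instant in two forms. The sketch below shows, in plain Python, why they are equivalent. The helper normalize_as_of is hypothetical and not part of teradataml, and the format string is the one used in the strftime call above; whether get_data() accepts other string formats is not stated here.

```python
from datetime import datetime

# Format used by the strftime call in this example.
AS_OF_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_as_of(as_of):
    """Hypothetical helper: return a datetime for either accepted form."""
    if isinstance(as_of, datetime):
        return as_of
    # Raises ValueError if the string does not match AS_OF_FORMAT.
    return datetime.strptime(as_of, AS_OF_FORMAT)


as_of_time = datetime(2025, 8, 15, 12, 37, 23)
as_of_text = as_of_time.strftime(AS_OF_FORMAT)
same_instant = normalize_as_of(as_of_time) == normalize_as_of(as_of_text)
```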

Example 6: Get the data as of a given time by passing entity and features

Time is passed to the as_of argument in datetime.datetime format.

>>> fs.get_data(entity='accounts',
...             features={'Feb': fp.process_id,
...                       'Jan': fp.process_id},
...             as_of=as_of_time)
   accounts    Feb  Jan
0  Blue Inc  900.0  500

Time is passed to the as_of argument in string format.

>>> fs.get_data(entity='accounts',
...             features={'Feb': fp.process_id,
...                       'Jan': fp.process_id},
...             as_of=as_of_time.strftime('%Y-%m-%d %H:%M:%S'))
   accounts    Feb  Jan
0  Blue Inc  900.0  500
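
Conceptually, an as_of lookup returns the most recent value recorded at or before the requested time. The sketch below illustrates that idea only; it is not how teradataml implements it. The function value_as_of and the run timestamps are hypothetical, chosen so that the three runs are 20 seconds apart as in Example 5.

```python
from datetime import datetime, timedelta


def value_as_of(history, as_of):
    """Return the value of the latest (recorded_at, value) pair whose
    timestamp is at or before as_of, or None if there is none."""
    candidates = [(ts, value) for ts, value in history if ts <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# Hypothetical timestamps for the three runs of the feature process.
first_run = datetime(2025, 8, 15, 12, 37, 0)
jan_history = [
    (first_run, 50),                           # initial value
    (first_run + timedelta(seconds=20), 500),  # after the first update
    (first_run + timedelta(seconds=40), 5000), # after the second update
]

jan_at_as_of = value_as_of(jan_history, datetime(2025, 8, 15, 12, 37, 23))
```

With these assumed timestamps, the requested time falls between the second and third runs, so the lookup yields the second value, matching the Jan value of 500 shown in the output above.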

Example 7: Get the latest data for the given process_id

>>> fs.get_data(process_id=fp.process_id, include_historic_records=False)
   accounts     Feb   Jan
0  Blue Inc  9000.0  5000

Example 8: Get the historic data for the given process_id

>>> fs.get_data(process_id=fp.process_id, include_historic_records=True)
   accounts     Feb   Jan
0  Blue Inc  9000.0  5000
1  Blue Inc    90.0    50
2  Blue Inc    90.0  5000
3  Blue Inc   900.0   500
4  Blue Inc   900.0  5000
5  Blue Inc   900.0    50
6  Blue Inc    90.0   500
7  Blue Inc  9000.0    50
8  Blue Inc  9000.0   500
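
The nine rows above are consistent with every recorded Feb value being paired with every recorded Jan value (3 x 3 combinations). The sketch below reproduces only that shape in plain Python. It is an observation about this output, not a description of how teradataml builds the result, and combine_histories is a hypothetical helper.

```python
from itertools import product

# Values recorded by the three runs in Example 5.
feb_history = [90.0, 900.0, 9000.0]
jan_history = [50, 500, 5000]


def combine_histories(entity, feb_values, jan_values):
    """Pair every Feb value with every Jan value for one entity."""
    return [(entity, feb, jan) for feb, jan in product(feb_values, jan_values)]


rows = combine_histories("Blue Inc", feb_history, jan_history)
```

If only matching versions are needed, retrieving one feature at a time (as in Example 10) or using as_of avoids the cross-paired rows.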

Example 9: Get the latest data for the given feature

>>> fs.get_data(entity='accounts', features={'Feb': fp.process_id}, include_historic_records=False)
   accounts     Feb
0  Blue Inc  9000.0

Example 10: Get the historic data for the given feature

>>> fs.get_data(entity='accounts', features={'Feb': fp.process_id}, include_historic_records=True)
   accounts     Feb
0  Blue Inc   900.0
1  Blue Inc    90.0
2  Blue Inc  9000.0