Use the build_time_series() method to build a dataset with the start time and end time for feature values available in the feature catalog. Once the dataset is created, you can create a teradataml DataFrame on it.
Required Parameters
- entity
- Specifies the name of the entity, or an Entity object, to be included in the dataset.
- selected_features
- Specifies the names of the features, and the corresponding feature versions, to be included in the dataset.
The key is the name of the feature and the value is its version. Refer to FeatureCatalog.list_feature_versions() to get the list of features and their versions.
- view_name
- Specifies the name of the view to be created for the dataset.
Optional Parameters
- description
- Specifies the description for the dataset.
- include_historic_records
- Specifies whether to include historic data in the dataset.
Default value: False
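The selected_features argument is a plain Python dict mapping each feature name to the version that should back it in the view. A minimal sketch of assembling that mapping (the version ID below is a placeholder; in practice take it from FeatureCatalog.list_feature_versions() or from FeatureProcess.process_id):

```python
# Build the selected_features mapping: feature name -> feature version.
# 'version_id' is a hypothetical process ID used only for illustration.
version_id = "a9f29a4e-3f75-11f0-b43b-f020ff57c62c"

# All selected features here come from the same ingestion run, so they
# share one version ID; mixing versions per feature is equally valid.
selected_features = {name: version_id for name in ["Jan", "Feb"]}

print(selected_features)
# {'Jan': 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c', 'Feb': 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c'}
```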
Example setup
Ingest sales data to the feature catalog configured for repo 'vfs_v1'.
>>> from teradataml import load_example_data, DataFrame, FeatureProcess
>>> load_example_data('dataframe', 'sales')
>>> df = DataFrame("sales")
>>> df
              Feb    Jan    Mar    Apr    datetime
accounts
Red Inc     200.0  150.0  140.0    NaN  04/01/2017
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017
Create a feature store.
>>> from teradataml import FeatureStore
>>> fs = FeatureStore(repo='vfs_v1', data_domain='sales')
Repo vfs_v1 does not exist. Run FeatureStore.setup() to create the repo and setup FeatureStore.
Set up the feature store for this repository.
>>> fs.setup()
True
Initiate FeatureProcess to ingest features.
>>> fp = FeatureProcess(repo='vfs_v1', data_domain='sales', object=df, entity='accounts', features=['Jan', 'Feb', 'Mar', 'Apr'])
Run the feature process.
>>> fp.run()
Process 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c' started.
Process 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c' completed.
Example 1: Build dataset with features 'Jan', 'Feb' from repo 'vfs_v1' and sales data domain
Name the dataset as 'ds_jan_feb'.
>>> from teradataml import DatasetCatalog
>>> dc = DatasetCatalog(repo='vfs_v1', data_domain='sales')
>>> dataset = dc.build_time_series(entity='accounts',
... selected_features = {
... 'Jan': 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c',
... 'Feb': 'a9f29a4e-3f75-11f0-b43b-f020ff57c62c'},
... view_name='ds_jan_feb',
... description='Dataset with Jan and Feb features')
>>> dataset
     accounts    Jan                  Jan_start_time                    Jan_end_time    Feb                  Feb_start_time                    Feb_end_time
0    Blue Inc   50.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:   90.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
1     Red Inc  150.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:  200.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
2  Yellow Inc    NaN  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:   90.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
3    Alpha Co  200.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:  210.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
4   Jones LLC  150.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:  200.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
5  Orange Inc    NaN  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:  210.0  2025-06-20 12:17:14.040000+00:  9999-12-31 23:59:59.999999+00:
Example 2: Build dataset with features 'Jan', 'Feb' from repo 'vfs_v1' and 'sales' data domain, with and without historic records
Build a time series dataset by ingesting the same features multiple times with updated values, to show how feature values change over time.
>>> import time
>>> from datetime import datetime as dt, date as d
>>> from teradataml import execute_sql
Retrieve the record where accounts == 'Blue Inc'.
>>> df_test = df[df['accounts'] == 'Blue Inc']
>>> df_test
           Feb   Jan   Mar    Apr    datetime
accounts
Blue Inc  90.0  50.0  95.0  101.0  04/01/2017
Write the record stored in the teradataml DataFrame to the database.
>>> df_test.to_sql('sales_test', if_exists='replace')
>>> test_df = DataFrame('sales_test')
>>> test_df
   accounts   Feb  Jan  Mar  Apr  datetime
0  Blue Inc  90.0   50   95  101  17/01/04
Create a feature process.
>>> fp = FeatureProcess(repo='vfs_v1',
...                     data_domain='sales',
...                     object=test_df,
...                     entity='accounts',
...                     features=['Jan', 'Feb'])
Run the feature process.
>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True
- Wait 20 seconds.
- Update the data.
- Run the feature process.
>>> time.sleep(20)
>>> execute_sql("update sales_test set Jan = Jan * 10, Feb = Feb * 10")
TeradataCursor uRowsHandle=269 bClosed=False
>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True
>>> time.sleep(20)
>>> execute_sql("update sales_test set Jan = Jan * 10, Feb = Feb * 10")
TeradataCursor uRowsHandle=397 bClosed=False
>>> fp.run()
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' started.
Process '6cb49b4b-79d4-11f0-8c5e-b0dcef8381ea' completed.
True
Build the time series dataset with features 'Feb' and 'Jan' from repo 'vfs_v1' and 'sales' data domain, excluding historic records.
>>> dc = DatasetCatalog(repo='vfs_v1', data_domain='sales')
>>> exclude_history = dc.build_time_series(entity='accounts',
... selected_features={'Feb': fp.process_id,
... 'Jan': fp.process_id},
... view_name='exclude_history',
... include_historic_records=False)
>>> exclude_history
   accounts     Feb                  Feb_start_time                    Feb_end_time   Jan                  Jan_start_time                    Jan_end_time
0  Blue Inc  9000.0  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:  5000  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:
Build the same time series dataset, this time including historic records.
>>> dc = DatasetCatalog(repo='vfs_v1', data_domain='sales')
>>> include_history = dc.build_time_series(entity='accounts',
... selected_features={'Feb': fp.process_id,
... 'Jan': fp.process_id},
... view_name='include_history',
... include_historic_records=True)
>>> include_history
   accounts     Feb                  Feb_start_time                    Feb_end_time   Jan                  Jan_start_time                    Jan_end_time
0  Blue Inc  9000.0  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:  5000  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:
1  Blue Inc    90.0  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:    50  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:
2  Blue Inc    90.0  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:  5000  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:
3  Blue Inc   900.0  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:   500  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:
4  Blue Inc   900.0  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:  5000  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:
5  Blue Inc   900.0  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:    50  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:
6  Blue Inc    90.0  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:   500  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:
7  Blue Inc  9000.0  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:    50  2025-08-15 13:23:41.780000+00:  2025-08-15 13:24:31.320000+00:
8  Blue Inc  9000.0  2025-08-15 13:24:58.140000+00:  9999-12-31 23:59:59.999999+00:   500  2025-08-15 13:24:31.320000+00:  2025-08-15 13:24:58.140000+00:
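The output above can be read as validity intervals: each ingest closes the previous interval for a feature value and opens a new one whose end time is the open-ended 9999-12-31 sentinel, so the sentinel marks the current value. A small standalone sketch (plain Python, independent of teradataml; the interval values are taken from the 'Jan' history above) of how include_historic_records=False reduces the history to current rows:

```python
from datetime import datetime, timezone

# Open-ended sentinel marking the current (latest) value of a feature.
OPEN_END = datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)

def t(hh, mm, ss):
    """Shorthand for a UTC timestamp on 2025-08-15 (illustration only)."""
    return datetime(2025, 8, 15, hh, mm, ss, tzinfo=timezone.utc)

# Validity intervals (value, start_time, end_time) for feature 'Jan' of
# entity 'Blue Inc', one tuple per ingestion run in the example above.
jan_history = [
    (50,   t(13, 23, 41), t(13, 24, 31)),  # first ingest, later superseded
    (500,  t(13, 24, 31), t(13, 24, 58)),  # second ingest (values * 10)
    (5000, t(13, 24, 58), OPEN_END),       # third ingest, still current
]

def current_only(rows):
    """Mimic include_historic_records=False: keep open-ended rows only."""
    return [r for r in rows if r[2] == OPEN_END]

print(current_only(jan_history))  # only the (5000, ...) row survives
```

With include_historic_records=True, by contrast, every interval is kept, and combining the full histories of two features yields one output row per pair of overlapping-or-adjacent intervals, which is why the example above returns nine rows for a single entity.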