Feature Store Concepts - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Locale: en-US
Last Edition: 2025-12-05
Product Category: Teradata Vantage

Feature

Definition A measurable property or characteristic of the input dataset that serves as an input to machine learning functions.
Teradata Component teradataml.Feature, a class that represents a Feature.
Examples
  • total_purchase_amount in a shopping cart
  • heart_rate for a patient
  • credit_score of a customer

Feature Value

Definition The actual value of a feature for a specific instance or data point.
Teradata Context Feature Values are the actual data stored in feature columns within temporal tables.
Examples
  • For feature "house_price": feature value might be 450000
  • For feature "customer_age": feature value might be 35
  • For feature "transaction_type": feature value might be "credit"
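The relationship between a feature, its values, and the temporal storage mentioned above can be illustrated with a minimal plain-Python sketch. It does not use the teradataml classes: FeatureDef, FeatureValueRow, and value_as_of are hypothetical names, and the validity-period layout is an assumption about how a temporal table might be modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional

# Hypothetical stand-ins for illustration only; NOT the teradataml API.
@dataclass(frozen=True)
class FeatureDef:
    name: str      # e.g. "customer_age"
    dtype: type    # expected Python type of each value

@dataclass(frozen=True)
class FeatureValueRow:
    entity_key: str                   # instance the value belongs to
    feature: FeatureDef
    value: Any
    valid_from: date                  # start of the validity period (inclusive)
    valid_to: Optional[date] = None   # end of the period (exclusive); None = current

def value_as_of(rows, entity_key, feature_name, as_of):
    """Return the value valid for the entity on the given date, or None."""
    for row in rows:
        if (row.entity_key == entity_key
                and row.feature.name == feature_name
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.value
    return None

customer_age = FeatureDef("customer_age", int)
history = [
    FeatureValueRow("C001", customer_age, 34, date(2024, 1, 1), date(2025, 1, 1)),
    FeatureValueRow("C001", customer_age, 35, date(2025, 1, 1)),
]
```

Here the feature is the single definition customer_age, while each row in history is one feature value with the period during which it was valid.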

Entity

Definition A collection of semantically related features that represents a business object.
Teradata Component teradataml.Entity, a class that represents Entity.
Examples
  • customer_id of a retail store
  • area_code and state_code for postal code

Data Source

Definition The source of data that feeds into the ML model and feature store.
Teradata Context Data Source can be a table, view, or any SQL query that provides data for feature extraction.
Teradata Component teradataml.DataSource, a class that represents DataSource.
Types of Data Sources
  • Tables: Direct table references
  • SQL queries: Custom analytical queries
  • DataFrames: teradataml DataFrame objects

Feature Group

Definition A logical collection of features that share the same entity and are stored in the same source, enabling organized feature management and reusability.
Teradata Context Feature Group is a logical grouping of related features that simplifies management and ensures point-in-time correctness.
Teradata Component teradataml.FeatureGroup, a class that represents FeatureGroup.
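The two conditions in the definition (same entity, same source) can be sketched as a simple membership check. This is a plain-Python illustration, not the teradataml.FeatureGroup class; FeatureSpec, FeatureGroupSketch, and the sample names such as sales_view are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only; NOT the teradataml API.
@dataclass(frozen=True)
class FeatureSpec:
    name: str
    entity: str   # name of the entity the feature describes
    source: str   # name of the table or view the feature is read from

@dataclass
class FeatureGroupSketch:
    name: str
    entity: str
    source: str
    features: dict = field(default_factory=dict)

    def add(self, feature: FeatureSpec) -> None:
        # Enforce the definition: every member shares the group's entity and source.
        if feature.entity != self.entity or feature.source != self.source:
            raise ValueError(f"{feature.name} does not match group {self.name}")
        self.features[feature.name] = feature

group = FeatureGroupSketch("customer_spend", entity="customer", source="sales_view")
group.add(FeatureSpec("total_purchase_amount", "customer", "sales_view"))
```

Adding a feature that belongs to a different entity or source, for example heart_rate for a patient, raises ValueError in this sketch.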

Feature Catalog

Definition A centralized registry that organizes and tracks all feature values available for ML workflows.
Teradata Component teradataml.FeatureCatalog, a class that represents a Feature Catalog.
Catalog Capabilities
  • Historical tracking: Maintains feature value history
  • Version management: Tracks feature evolution over time
  • Metadata storage: Stores feature descriptions and lineage

Feature Process

Definition A workflow process, identified by a unique process ID, that computes feature values from data sources and ingests them into the feature catalog.
Teradata Context Feature Process takes the feature values produced by an ETL pipeline and stores them in the feature catalog.

The ETL pipeline is built from teradataml DataFrame operations and is persisted by creating a view on the final DataFrame.

Teradata Component teradataml.FeatureProcess, a class that represents a Feature Process.
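The compute-then-ingest flow described above can be sketched in plain Python. This is only a conceptual illustration and does not call teradataml.FeatureProcess: run_feature_process, the dictionary used as a catalog, and the use of a UUID as the process ID are all assumptions made for the example.

```python
import uuid
from collections import defaultdict

def run_feature_process(source_rows, catalog, process_id=None):
    """Compute total_purchase_amount per customer and ingest it into `catalog`.

    Hypothetical helper for illustration only; NOT the teradataml API.
    """
    # Assumption: a random UUID stands in for the unique process ID.
    process_id = process_id or str(uuid.uuid4())

    # Transformation step: aggregate raw rows into one value per customer.
    totals = defaultdict(float)
    for row in source_rows:
        totals[row["customer_id"]] += row["amount"]

    # Ingestion step: store each value with the ID of the process that produced it.
    for customer_id, total in totals.items():
        catalog[(customer_id, "total_purchase_amount")] = {
            "value": total,
            "process_id": process_id,
        }
    return process_id

catalog = {}
orders = [
    {"customer_id": "C001", "amount": 20.0},
    {"customer_id": "C001", "amount": 5.5},
    {"customer_id": "C002", "amount": 12.0},
]
pid = run_feature_process(orders, catalog)
```

Tagging every ingested value with the process ID is what lets a value be traced back to the run that produced it.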

Process Catalog

Definition A centralized registry that stores details of feature processes executed over time to provide audit trail and process management capabilities.
Teradata Context Process Catalog is a table that maintains the history of all feature processing workflows.

Dataset

Definition A structured collection of data created by selecting and organizing specific feature values related to an entity, serving as a snapshot for analysis or model training.
Teradata Context Dataset is created by joining different feature values with corresponding entities, providing a consolidated view for ML workflows.
Teradata Component teradataml.Dataset, a class that represents a Dataset.
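The join described above can be sketched with an in-memory SQLite database from the Python standard library. This is not how teradataml builds a Dataset; the table names, the long (one row per feature value) layout, and the sample data are assumptions chosen to show the idea of joining feature values to entities.

```python
import sqlite3

# Illustration only: SQLite stands in for the database, with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY);
    CREATE TABLE feature_values (
        customer_id TEXT, feature_name TEXT, feature_value REAL
    );
    INSERT INTO customer VALUES ('C001'), ('C002');
    INSERT INTO feature_values VALUES
        ('C001', 'credit_score', 710),
        ('C001', 'total_purchase_amount', 25.5),
        ('C002', 'credit_score', 655);
""")

# Join the entity to its feature values and pivot the selected
# features into one row per entity.
dataset = conn.execute("""
    SELECT c.customer_id,
           MAX(CASE WHEN f.feature_name = 'credit_score'
                    THEN f.feature_value END) AS credit_score,
           MAX(CASE WHEN f.feature_name = 'total_purchase_amount'
                    THEN f.feature_value END) AS total_purchase_amount
    FROM customer AS c
    LEFT JOIN feature_values AS f ON f.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
```

The LEFT JOIN keeps every entity in the result, so a customer with no value for a selected feature appears with a missing value rather than being dropped.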

Dataset Catalog

Definition A centralized repository that provides metadata and details about datasets created within the feature store, enabling dataset discovery and management.
Teradata Component teradataml.DatasetCatalog, a class that represents a Dataset Catalog.

Repository (Repo)

Definition A physical namespace that isolates and organizes feature definitions, entities, feature groups, and catalogs, allowing multiple teams to work independently.
Teradata Context Repository is implemented as a database or schema that contains all feature store objects.

Data Domain

Definition A logical grouping inside a repository that organizes data, entities, and features within a specific business context, preventing naming conflicts and ensuring clarity.
Teradata Context Data Domain acts as a namespace within a repository to organize features by business area or use case.
Examples (Banking System)
  • customer_Domain: Stores information related to customer personal and demographic data
  • transaction_Domain: Stores information about financial transactions conducted through customer accounts
  • risk_Domain: Stores information related to credit score history and loan payments for risk assessment
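How a data domain prevents naming conflicts inside a repository can be sketched by keying features on the pair (data domain, feature name). RepoSketch and the feature name score are hypothetical and exist only for this illustration; this is not the teradataml implementation.

```python
class RepoSketch:
    """Hypothetical repository keyed by (data domain, feature name).

    Illustration only; NOT the teradataml API.
    """

    def __init__(self, name):
        self.name = name
        self._features = {}

    def register(self, domain, feature_name, description):
        key = (domain, feature_name)
        # A name only has to be unique within its own data domain.
        if key in self._features:
            raise KeyError(f"{feature_name} already exists in {domain}")
        self._features[key] = description

    def describe(self, domain, feature_name):
        return self._features[(domain, feature_name)]

repo = RepoSketch("banking_repo")
# The same feature name can be used in two domains without a conflict.
repo.register("customer_Domain", "score", "customer engagement score")
repo.register("risk_Domain", "score", "credit risk score")
```

Registering score a second time in the same domain raises KeyError in this sketch, while the two domains above keep their own independent definitions.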