Combining Feature Groups into a Single FeatureGroup - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

You can combine multiple FeatureGroup objects using the + operator. The operation returns a new FeatureGroup object.

Prerequisites for combining multiple Feature Groups:
  • Entities must be identical.
  • If the data sources include timestamp column details, both data sources must use the same timestamp column name.
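
The two prerequisites can be pictured as a pre-flight check. The following is a minimal sketch in plain Python, not teradataml code: GroupSpec and combine_problems are hypothetical names introduced only for illustration, and the sketch assumes that when either side defines a timestamp column, the names on both sides must match.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GroupSpec:
    """Hypothetical stand-in holding only the fields the prerequisites refer to."""
    name: str
    entity_columns: Tuple[str, ...]
    timestamp_col_name: Optional[str] = None


def combine_problems(left: GroupSpec, right: GroupSpec) -> List[str]:
    """Return the reasons the two specs fail the prerequisites (empty if none)."""
    problems = []
    # Prerequisite 1: entities must be identical.
    if left.entity_columns != right.entity_columns:
        problems.append(
            f"entity columns differ: {left.entity_columns} vs {right.entity_columns}"
        )
    # Prerequisite 2 (assumed reading): if either side has a timestamp column,
    # both sides must name it the same.
    has_timestamp = left.timestamp_col_name or right.timestamp_col_name
    if has_timestamp and left.timestamp_col_name != right.timestamp_col_name:
        problems.append(
            f"timestamp columns differ: {left.timestamp_col_name!r} "
            f"vs {right.timestamp_col_name!r}"
        )
    return problems


profile = GroupSpec("PatientProfile", ("patient_id",), "record_timestamp")
readings = GroupSpec("MedicalReadings", ("patient_id",), "record_timestamp")
visits = GroupSpec("Visits", ("visit_id",), "event_time")  # hypothetical mismatch

print(combine_problems(profile, readings))  # no problems expected
print(len(combine_problems(profile, visits)))  # both prerequisites fail
```

The function returns a list of problems rather than raising on the first one, so a caller can report every mismatch at once. How teradataml itself reports a failed prerequisite is not covered here.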

Let’s look at an example to see how this works.

Create two different FeatureGroups

Before creating two FeatureGroups, let’s look at two different data sets: patient_profile and medical_readings.

Patient Profile

>>> from teradataml import DataFrame, FeatureGroup, load_example_data
>>> load_example_data('dataframe', 'patient_profile')
>>> patient_profile_df = DataFrame('patient_profile')
>>> patient_profile_df
                      record_timestamp  pregnancies  age   bmi  skin_thickness
patient_id                                                                    
17          2024-04-10 11:10:59.000000            7   31  29.6             0.0
34          2024-04-10 11:10:59.000000           10   45  27.6            31.0
13          2024-04-10 11:10:59.000000            1   59  30.1            23.0
53          2024-04-10 11:10:59.000000            8   58  33.7            34.0
11          2024-04-10 11:10:59.000000           10   34  38.0             0.0
51          2024-04-10 11:10:59.000000            1   26  24.2            15.0
32          2024-04-10 11:10:59.000000            3   22  24.8            11.0
15          2024-04-10 11:10:59.000000            7   32  30.0             0.0
99          2024-04-10 11:10:59.000000            1   31  49.7            51.0
0           2024-04-10 11:10:59.000000            6   50  33.6            35.0

Medical Readings

>>> load_example_data('dataframe', 'medical_readings')
>>> medical_readings_df = DataFrame('medical_readings')
>>> medical_readings_df
                      record_timestamp  glucose  blood_pressure  insulin  diabetes_pedigree_function  outcome
patient_id                                                                                                   
17          2024-04-10 11:10:59.000000      107              74        0                       0.254        1
34          2024-04-10 11:10:59.000000      122              78        0                       0.512        0
13          2024-04-10 11:10:59.000000      189              60      846                       0.398        1
53          2024-04-10 11:10:59.000000      176              90      300                       0.467        1
11          2024-04-10 11:10:59.000000      168              74        0                       0.537        1
51          2024-04-10 11:10:59.000000      101              50       36                       0.526        0
32          2024-04-10 11:10:59.000000       88              58       54                       0.267        0
15          2024-04-10 11:10:59.000000      100               0        0                       0.484        1
99          2024-04-10 11:10:59.000000      122              90      220                       0.325        1
0           2024-04-10 11:10:59.000000      148              72        0                       0.627        1

Create two FeatureGroups for the two datasets

Let's first create individual FeatureGroups.

>>> patient_profile_fg = FeatureGroup.from_DataFrame(
...     name='PatientProfile',
...     df=patient_profile_df,
...     entity_columns='patient_id',
...     timestamp_col_name='record_timestamp'
... )
>>> medical_readings_fg = FeatureGroup.from_DataFrame(
...     name='MedicalReadings',
...     df=medical_readings_df,
...     entity_columns='patient_id',
...     timestamp_col_name='record_timestamp'
... )
>>> print(patient_profile_fg.features)
[Feature(name=pregnancies), Feature(name=age), Feature(name=bmi), Feature(name=skin_thickness)]
>>> print(medical_readings_fg.features)
[Feature(name=glucose), Feature(name=blood_pressure), Feature(name=insulin), Feature(name=diabetes_pedigree_function), Feature(name=outcome)] 

Combine the two FeatureGroups

>>> new_fg = patient_profile_fg + medical_readings_fg 

Examine the combined FeatureGroup properties

>>> print(new_fg.name)
PatientProfile_MedicalReadings
>>> print(new_fg.features)
[Feature(name=pregnancies), Feature(name=age), Feature(name=bmi), Feature(name=skin_thickness), Feature(name=glucose), Feature(name=blood_pressure), Feature(name=insulin), Feature(name=diabetes_pedigree_function), Feature(name=outcome)]
>>> print(new_fg.entity)
Entity(name=PatientProfile_MedicalReadings)
>>> print(new_fg.data_source)
DataSource(name=PatientProfile_MedicalReadings)
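
The output above suggests two observable effects of the + operator: the names are joined with an underscore, and the feature lists are concatenated in operand order. The sketch below models just those two effects in plain Python so they can be tried without a Vantage connection. ToyFeatureGroup is a hypothetical class invented for illustration, not the teradataml implementation, and it omits prerequisite validation, entities, and data sources.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyFeatureGroup:
    """Hypothetical model of a feature group: a name plus ordered feature names."""
    name: str
    features: Tuple[str, ...]

    def __add__(self, other):
        # Defer to Python's normal operator handling for unsupported operands.
        if not isinstance(other, ToyFeatureGroup):
            return NotImplemented
        # Build a new object: joined name, left features followed by right features.
        return ToyFeatureGroup(
            name=f"{self.name}_{other.name}",
            features=self.features + other.features,
        )


profile = ToyFeatureGroup(
    "PatientProfile", ("pregnancies", "age", "bmi", "skin_thickness")
)
readings = ToyFeatureGroup(
    "MedicalReadings",
    ("glucose", "blood_pressure", "insulin", "diabetes_pedigree_function", "outcome"),
)

combined = profile + readings
print(combined.name)           # PatientProfile_MedicalReadings
print(len(combined.features))  # 9
```

Because the toy class is frozen and __add__ builds a fresh instance, the two operands in this sketch are left unchanged after the combination.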