TD_ColumnTransformer Example

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

TD_ColumnTransformer Input Table: titanic_train

PassengerID  Pclass  Name                     Gender  Age   SibSp  Parch  Fare   Cabin  Embarked
-----------  ------  -----------------------  ------  ----  -----  -----  -----  -----  --------
149          2       Navratil, Michael        M       36    0      2      26.0   B21    S
152          1       Pearson, Mrs. Thomas     F       Null  1      0      66.6   C2     S
581          2       Christian, Miss Juliana  F       25    1      1      30.0   Null   S
663          1       Collier, Dr. Edwin       M       47    0      0      25.70  A23    S
704          3       Gavin, Mr. Herbert       M       25    0      0      7.74   Null   Q

Create the getCabin Input Table (via the intermediate getSubtitles table)

DROP TABLE getSubtitles;

-- Extract the title (the letters just before a period, e.g. Mrs or Dr) from Name into a new NTitle column.
CREATE MULTISET TABLE getSubtitles AS (
  SELECT * FROM Unpack (
    ON titanic_train
    USING
    TargetColumn ('Name')
    OutputColumns ('NTitle')
    OutputDataTypes ('Varchar')
    Delimiter ('$')
    Regex ('([A-Za-z]+)\.')
  ) AS dt
) WITH DATA;
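
The Regex argument captures the run of letters immediately before a period, which for these names is the honorific. The following Python sketch applies the same pattern locally to the sample names to show what would land in NTitle. It is an illustration only, not Teradata's implementation: it assumes Unpack keeps the first match, as re.search does, and it does not model how Unpack treats a name with no match. Note that "Christian, Miss Juliana" has no period after "Miss", so the pattern finds no title there.

```python
import re
from typing import Optional

# Same pattern as the Regex('([A-Za-z]+)\.') argument in the Unpack call.
TITLE_PATTERN = re.compile(r"([A-Za-z]+)\.")


def extract_title(name: str) -> Optional[str]:
    """Return the letters before the first period, or None if nothing matches."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


# Names from the sample titanic_train rows.
names = [
    "Navratil, Michael",
    "Pearson, Mrs. Thomas",
    "Christian, Miss Juliana",
    "Collier, Dr. Edwin",
    "Gavin, Mr. Herbert",
]
for name in names:
    print(name, "->", extract_title(name))
```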

DROP TABLE getCabin;

-- Replace each cabin value in place with its first character (the deck letter).
CREATE MULTISET TABLE getCabin AS (
  SELECT * FROM TD_StrApply (
    ON getSubtitles AS InputTable
    USING
    TargetColumns ('cabin')
    StringOperation ('getNchars')
    StringLength (1)
    Accumulate ('[:]', '-cabin')
    InPlace ('True')
  ) AS dt
) WITH DATA;
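
StringOperation('getNchars') with StringLength(1) keeps only the first character of each cabin value, so "B21" becomes "B". A minimal Python equivalent, assuming that null cabins pass through unchanged (consistent with the ? entries in the cabin column of the output):

```python
from typing import Optional


def get_n_chars(value: Optional[str], length: int = 1) -> Optional[str]:
    """Keep the first `length` characters; leave missing values (None) as None."""
    if value is None:
        return None
    return value[:length]


# Cabin values from the sample titanic_train rows; None stands for Null.
cabins = ["B21", "C2", None, "A23", None]
print([get_n_chars(c) for c in cabins])  # ['B', 'C', None, 'A', None]
```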

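TD_ColumnTransformer SQL Call

The call assumes that the fit tables imputeFit, NonLinearCombineFit, ordinalFit, onehotfittable, and ScaleFit already exist. Each is the output of the corresponding fit function (TD_SimpleImputeFit, TD_NonLinearCombineFit, TD_OrdinalEncodingFit, TD_OneHotEncodingFit, and TD_ScaleFit).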

SELECT * FROM TD_ColumnTransformer (
  ON getCabin AS InputTable
  ON imputeFit AS SimpleImputeFitTable DIMENSION
  ON NonLinearCombineFit AS NonLinearCombineFitTable DIMENSION
  ON ordinalFit AS OrdinalEncodingFitTable DIMENSION
  ON onehotfittable AS OneHotEncodingFitTable DIMENSION
  ON ScaleFit AS ScaleFitTable DIMENSION
) AS dt ORDER BY 1, 2, 3, 4, 5, 6, 7;

TD_ColumnTransformer Output

     NTitle    passenger     survived       pclass          gender       age        sibsp        parch  ticket                                  fare     embarked  cabin              FamilySize      cabin_A      cabin_B      cabin_C  cabin_other
-----------  -----------  -----------  -----------  -----------  -----------  -----------  -----------  --------------------  ----------------------  -----------  -----  ----------------------  -----------  -----------  -----------  -----------
         -1            8            0            3            1            2            3            1  349909                 4.11356604308324E-002            2  ?       5.00000000000000E 000            0            0            0            1
         -1           17            0            3            1            2            4            1  382652                 5.68482139999047E-002            1  ?       6.00000000000000E 000            0            0            0            1
        ...
          5          888            1            1            2           19            0            0  112053                 5.85561002574126E-002            2  B       1.00000000000000E 000            0            1            0            0
          5          889            0            3            2           28            1            2  W./C. 6607             4.57713517012109E-002            2  ?       4.00000000000000E 000            0            0            0            1
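
Two of the output columns can be checked by hand against the rows shown. FamilySize equals sibsp + parch + 1 in every displayed row (for example, 3 + 1 + 1 = 5 for passenger 8). The cabin deck is one-hot encoded into cabin_A, cabin_B, and cabin_C, and a null cabin (shown as ?) falls into cabin_other. The Python sketch below reproduces those two columns from the displayed values. The formula and the category list are inferred from this output, not read from the NonLinearCombineFit or onehotfittable contents, which the example does not show. Routing every non-listed value to cabin_other is also an assumption.

```python
from typing import Dict, Optional

# Categories inferred from the output column names (cabin_A, cabin_B, cabin_C).
CABIN_CATEGORIES = ["A", "B", "C"]


def family_size(sibsp: int, parch: int) -> float:
    # Inferred from the displayed rows: siblings/spouses + parents/children + the passenger.
    return float(sibsp + parch + 1)


def one_hot_cabin(cabin: Optional[str]) -> Dict[str, int]:
    # One indicator per known category; any other value (here, including None) goes to cabin_other.
    row = {f"cabin_{c}": int(cabin == c) for c in CABIN_CATEGORIES}
    row["cabin_other"] = int(cabin not in CABIN_CATEGORIES)
    return row


# (passenger, sibsp, parch, cabin) taken from the displayed output rows; None stands for '?'.
rows = [(8, 3, 1, None), (17, 4, 1, None), (888, 0, 0, "B"), (889, 1, 2, None)]
for passenger, sibsp, parch, cabin in rows:
    print(passenger, family_size(sibsp, parch), one_hot_cabin(cabin))
```
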

Comparison of processing time when the individual transformation functions are run serially versus in a single TD_ColumnTransformer call, by data set size:
Data Set Size  Serial Processing (s)  TD_ColumnTransformer Processing (s)
-------------  ---------------------  -----------------------------------
10M            89                     29
20M            167                    49
30M            332                    98
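
The timings work out to a speedup of roughly 3x to 3.4x for TD_ColumnTransformer across the three data set sizes. A short calculation from the figures in the table:

```python
# Timings in seconds (serial, TD_ColumnTransformer), copied from the comparison table.
timings = {"10M": (89, 29), "20M": (167, 49), "30M": (332, 98)}

# Speedup = serial time / single-call time, rounded to two decimals.
speedups = {size: round(serial / combined, 2) for size, (serial, combined) in timings.items()}
print(speedups)  # {'10M': 3.07, '20M': 3.41, '30M': 3.39}
```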