Example: Work with StandardScaler Function | teradatamlspk - Teradata Package for Python

Teradata® pyspark2teradataml User Guide

Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024

The PySpark StandardScaler function and the teradatamlspk StandardScaler function work differently.

Assume the DataFrame contains the following data.
>>> df.show()
+--------+--------+--------+
|feature1|feature2|feature3|
+--------+--------+--------+
|     1.0|     0.1|    -1.0|
|     2.0|     1.1|     1.0|
|     3.0|    10.1|     3.0|
+--------+--------+--------+
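
For reference, a DataFrame with this data can be created as follows. This is a minimal sketch and assumes an active session named spark.
>>> # Build the sample DataFrame shown above.
>>> df = spark.createDataFrame(
...     [(1.0, 0.1, -1.0), (2.0, 1.1, 1.0), (3.0, 10.1, 3.0)],
...     ["feature1", "feature2", "feature3"])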

The following examples show their different usage.

The differences include:

  • The DataFrame accepted by teradatamlspk StandardScaler requires an ‘id’ column, which can be created with the monotonically_increasing_id function (see the sketch after this list).
  • PySpark returns a vector, but teradatamlspk does not.
  • Column names for the PySpark StandardScaler transform method follow the outputCol argument.

    The teradatamlspk StandardScaler transform method returns all the columns of the input DataFrame: the columns named in the inputCol argument are scaled, while the values of the other columns remain the same. Both behaviors are illustrated in the sketches below.
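
The following is a minimal PySpark sketch of the flow described above. It assumes the sample DataFrame df and an active SparkSession; the column names "features" and "scaled" are illustrative.
>>> from pyspark.ml.feature import StandardScaler, VectorAssembler
>>> # PySpark StandardScaler operates on a single vector column,
>>> # so the feature columns must be assembled into one first.
>>> assembler = VectorAssembler(
...     inputCols=["feature1", "feature2", "feature3"], outputCol="features")
>>> assembled = assembler.transform(df)
>>> scaler = StandardScaler(inputCol="features", outputCol="scaled",
...                         withMean=True, withStd=True)
>>> model = scaler.fit(assembled)
>>> # The result carries a new vector column whose name follows outputCol.
>>> model.transform(assembled).select("scaled").show(truncate=False)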
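
A corresponding teradatamlspk sketch is below. It is hedged: the import paths are assumed to mirror PySpark's, and inputCol is assumed to accept a list of column names, as the differences above suggest.
>>> from teradatamlspk.sql.functions import monotonically_increasing_id
>>> from teradatamlspk.ml.feature import StandardScaler
>>> # The input DataFrame requires an 'id' column created with
>>> # monotonically_increasing_id.
>>> df = df.withColumn("id", monotonically_increasing_id())
>>> # No vector assembly step; inputCol names the columns to scale directly
>>> # (assumed to accept a list, per the differences above).
>>> scaler = StandardScaler(inputCol=["feature1", "feature2", "feature3"],
...                         withMean=True, withStd=True)
>>> model = scaler.fit(df)
>>> # transform() returns every column of the input DataFrame; only the
>>> # columns named in inputCol are scaled, other values stay unchanged.
>>> model.transform(df).show()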