Example: Work with StandardScaler Function

The PySpark StandardScaler function and the teradatamlspk StandardScaler function work differently.

Assume the DataFrame has the following data.
>>> df.show()
+--------+--------+--------+
|feature1|feature2|feature3|
+--------+--------+--------+
|     1.0|     0.1|    -1.0|
|     2.0|     1.1|     1.0|
|     3.0|    10.1|     3.0|
+--------+--------+--------+
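
For reference, the sample DataFrame can be created as follows. This is a minimal sketch that assumes an active SparkSession named spark; the data and column names are taken from the output above.

>>> df = spark.createDataFrame(
...     [(1.0, 0.1, -1.0), (2.0, 1.1, 1.0), (3.0, 10.1, 3.0)],
...     ["feature1", "feature2", "feature3"])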

The examples after the following list show the difference in usage.

The differences include:

  • The DataFrame accepted by teradatamlspk StandardScaler requires an 'id' column, which is created with the monotonically_increasing_id function.
  • PySpark StandardScaler returns a vector column; teradatamlspk does not.
  • Column names in the output of the PySpark StandardScaler transform method follow the outputCol argument.

    The teradatamlspk StandardScaler transform method returns all columns of the input DataFrame; the columns named in the inputCol argument are scaled, while the values of the other columns remain the same.
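
PySpark usage, as a minimal sketch: PySpark StandardScaler scales a single vector column, so the features are first assembled with VectorAssembler, and the scaled values appear in a new vector column named by the outputCol argument. The names features and scaled_features are illustrative choices, not required values.

>>> from pyspark.ml.feature import VectorAssembler, StandardScaler
>>> # Assemble the three feature columns into one vector column.
>>> assembler = VectorAssembler(inputCols=["feature1", "feature2", "feature3"], outputCol="features")
>>> assembled = assembler.transform(df)
>>> # The scaled values land in the new vector column named by outputCol.
>>> scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withMean=True, withStd=True)
>>> scaler.fit(assembled).transform(assembled).select("scaled_features").show(truncate=False)

teradatamlspk usage, also a sketch rather than verified output: the import paths, the list-valued inputCol, and the withMean and withStd parameters are assumptions based on teradatamlspk's PySpark-compatible design. The required 'id' column is added with monotonically_increasing_id, no vector assembly step is needed, and transform returns every column of the input DataFrame with the inputCol columns scaled in place.

>>> from teradatamlspk.sql.functions import monotonically_increasing_id
>>> from teradatamlspk.ml.feature import StandardScaler
>>> # teradatamlspk StandardScaler requires an 'id' column on the DataFrame.
>>> df = df.withColumn("id", monotonically_increasing_id())
>>> # Columns listed in inputCol are scaled; all other columns pass through unchanged.
>>> scaler = StandardScaler(inputCol=["feature1", "feature2", "feature3"], withMean=True, withStd=True)
>>> scaler.fit(df).transform(df).show()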