The PySpark StandardScaler function and the teradatamlspk StandardScaler function work differently.
Assume the DataFrame has the following data.
>>> df.show()
+--------+--------+--------+
|feature1|feature2|feature3|
+--------+--------+--------+
|     1.0|     0.1|    -1.0|
|     2.0|     1.1|     1.0|
|     3.0|    10.1|     3.0|
+--------+--------+--------+
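For reference, a minimal sketch of how such a DataFrame can be created; it assumes an active SparkSession available as spark.
>>> df = spark.createDataFrame(
...     [(1.0, 0.1, -1.0), (2.0, 1.1, 1.0), (3.0, 10.1, 3.0)],
...     ["feature1", "feature2", "feature3"])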
The following examples show their different usage.
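PySpark usage (a minimal sketch): PySpark StandardScaler operates on a single vector column, so the feature columns are first assembled into a vector with VectorAssembler. The column names "features" and "scaled_features" are illustrative choices for the assembler output and the outputCol argument, not required names.
>>> from pyspark.ml.feature import VectorAssembler, StandardScaler
>>> # Assemble the three feature columns into a single vector column.
>>> assembler = VectorAssembler(inputCols=["feature1", "feature2", "feature3"], outputCol="features")
>>> assembled_df = assembler.transform(df)
>>> # Scale the assembled vector column; the result is written to outputCol.
>>> scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withMean=True, withStd=True)
>>> scaler_model = scaler.fit(assembled_df)
>>> scaler_model.transform(assembled_df).select("scaled_features").show(truncate=False)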
The differences include:
- The DataFrame accepted by teradatamlspk StandardScaler requires an 'id' column, which is created with the function monotonically_increasing_id.
- PySpark returns a vector, whereas teradatamlspk does not return a vector.
- Column names in the output of the PySpark StandardScaler transform method follow the outputCol argument. The teradatamlspk StandardScaler transform method returns all columns of the input DataFrame; the columns mentioned in the inputCol argument are scaled, while the values of the other columns remain the same (see the teradatamlspk sketch after this list).
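A corresponding teradatamlspk sketch, under the assumptions stated in this section: the 'id' column is added with monotonically_increasing_id before scaling, and inputCol names the DataFrame columns directly (no vector assembly). The import paths assume teradatamlspk mirrors PySpark's module layout, and passing a list of column names to inputCol is also an assumption; consult the teradatamlspk documentation for the exact API.
>>> from teradatamlspk.sql.functions import monotonically_increasing_id
>>> from teradatamlspk.ml.feature import StandardScaler
>>> # Add the required 'id' column before scaling.
>>> df = df.withColumn("id", monotonically_increasing_id())
>>> # Scale the named columns in place; other columns are returned unchanged.
>>> scaler = StandardScaler(inputCol=["feature1", "feature2", "feature3"], withMean=True, withStd=True)
>>> scaler_model = scaler.fit(df)
>>> scaler_model.transform(df).show()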