pivot (teradatamlspk)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

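When using pivot, the output column names differ between PySpark and teradatamlspk. In the following example, PySpark names each pivoted column after the pivot value (dotNET, Java), while teradatamlspk combines the aggregate function, the aggregated column, and the lowercased pivot value (sum_earnings_dotnet, sum_earnings_java).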

PySpark

>>> df.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings").show()
+----+------+-----+
|year|dotNET| Java|
+----+------+-----+
|2012| 15000|20000|
|2013| 48000|30000|
+----+------+-----+

teradatamlspk

>>> df.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings").show()
+----+-------------------+-----------------+
|year|sum_earnings_dotnet|sum_earnings_java|
+----+-------------------+-----------------+
|2012|              15000|            20000|
|2013|              48000|            30000|
+----+-------------------+-----------------+
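
Code that relies on the PySpark column names may need a mapping between the two naming schemes. The following is a minimal sketch in plain Python. It assumes that the lowercase `<aggregate>_<column>_<value>` pattern seen in the example above holds in general; that pattern is inferred from this one example, not from a documented rule. The helper names `teradatamlspk_pivot_name` and `rename_map` are hypothetical.

```python
from typing import Dict, List


def teradatamlspk_pivot_name(agg: str, column: str, value: str) -> str:
    """Guess the teradatamlspk pivot column name, e.g. sum_earnings_dotnet.

    Assumption: the lowercase <agg>_<column>_<value> pattern is inferred
    from the single example above, not from a documented rule.
    """
    return f"{agg}_{column}_{value}".lower()


def rename_map(agg: str, column: str, values: List[str]) -> Dict[str, str]:
    """Map each guessed teradatamlspk name back to the PySpark-style name."""
    return {teradatamlspk_pivot_name(agg, column, v): v for v in values}


mapping = rename_map("sum", "earnings", ["dotNET", "Java"])
print(mapping)
# {'sum_earnings_dotnet': 'dotNET', 'sum_earnings_java': 'Java'}
```

If the DataFrame in use supports `withColumnRenamed` as PySpark does, the mapping could be applied one column at a time. Check the actual output column names first, since the pattern here is only a guess.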

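When using pivot in teradatamlspk, the grouping columns are not always returned. In the following example, the aggregation is applied to the grouping column itself (year): PySpark still returns year with one row of sums per year, while teradatamlspk omits year and returns a single row of sums over all rows.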

PySpark

>>> df1.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("year").show()
+----+------+----+
|year|dotNET|Java|
+----+------+----+
|2012|  4024|2012|
|2013|  2013|2013|
+----+------+----+

teradatamlspk

>>> df1.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("year").show()
+---------------+-------------+
|sum_year_dotnet|sum_year_java|
+---------------+-------------+
|           6037|         4025|
+---------------+-------------+
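
The two outputs above are consistent with each other: adding up the per-year values from the PySpark result gives the single row returned by teradatamlspk. The short check below only verifies that arithmetic, using the numbers printed above. It does not describe how teradatamlspk computes the result internally.

```python
# Rows copied from the PySpark output above: (year, dotNET, Java).
pyspark_rows = [
    (2012, 4024, 2012),
    (2013, 2013, 2013),
]

# Sum each pivoted column across all years, ignoring the grouping key.
total_dotnet = sum(row[1] for row in pyspark_rows)
total_java = sum(row[2] for row in pyspark_rows)

print(total_dotnet, total_java)  # 6037 4025
```

These totals match the sum_year_dotnet and sum_year_java values in the teradatamlspk output.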