The column names returned by the crosstab function differ when the column values contain special characters.
For example, for the following DataFrame:
>>> df.show()
+---+-------+---------+--------------------+
| id|int_col|float_col|             str_col|
+---+-------+---------+--------------------+
|  1|     21|     21.2|Braund, Mr. Owen ...|
|  2|     22|     22.6|Cumings, Mrs. Joh...|
|  3|     23|     23.5|Palsson, Master. ...|
+---+-------+---------+--------------------+
PySpark
>>> df.crosstab("id", "str_col").show()
+----------+-----------------------+---------------------------------------------------+------------------------------+
|id_str_col|Braund, Mr. Owen Harris|Cumings, Mrs. John Bradley (Florence Briggs Thayer)|Palsson, Master. Gosta Leonard|
+----------+-----------------------+---------------------------------------------------+------------------------------+
|         3|                      0|                                                  0|                             1|
|         1|                      1|                                                  0|                             0|
|         2|                      0|                                                  1|                             0|
+----------+-----------------------+---------------------------------------------------+------------------------------+
teradatamlspk
>>> df.crosstab("id", "str_col").show()
+----------+---------------------------+--------------------+---------------------------------------------+
|id_str_col|palsson_master_gostaleonard|braund_mr_owenharris|cumings_mrs_johnbradley_florencebriggsthayer_|
+----------+---------------------------+--------------------+---------------------------------------------+
|         2|                          0|                   0|                                            1|
|         1|                          0|                   1|                                            0|
|         3|                          1|                   0|                                            0|
+----------+---------------------------+--------------------+---------------------------------------------+
teradatamlspk converts every special character in a column name to _. In the example above, the resulting names are also lowercase and contain no spaces.
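To anticipate the column names before running crosstab, the renaming seen in the example can be approximated in plain Python. This is only a sketch inferred from the three names shown above, not the actual teradatamlspk implementation; the helper name approximate_crosstab_name is hypothetical, and other inputs may be handled differently by the library.

```python
import re


def approximate_crosstab_name(value: str) -> str:
    """Hypothetical helper: mimic the renaming observed in the example above.

    Assumed rules (inferred from the sample output only):
    1. whitespace is dropped,
    2. every remaining non-alphanumeric character becomes "_",
    3. the result is lowercased.
    """
    no_spaces = re.sub(r"\s+", "", value)
    return re.sub(r"[^0-9A-Za-z_]", "_", no_spaces).lower()


values = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Palsson, Master. Gosta Leonard",
]

for value in values:
    print(value, "->", approximate_crosstab_name(value))
```

For the three values above, the sketch yields the same names as the teradatamlspk output: braund_mr_owenharris, cumings_mrs_johnbradley_florencebriggsthayer_, and palsson_master_gostaleonard.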