Introduction to Teradata pyspark2teradataml

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

teradatamlspk is the Python package name of the Teradata product pyspark2teradataml. The package is built as an extension of teradataml, the Teradata Package for Python.

The syntax and usage of the teradatamlspk APIs are kept similar to those of the PySpark APIs. As a result, existing PySpark workloads that run on the Spark engine can run on Teradata Vantage, using ClearScape Analytics, with minimal changes.

teradatamlspk provides a function, pyspark2teradataml, that converts a PySpark script into a teradatamlspk Python script. The function also generates an HTML report of the conversion, which helps users understand the changes that were made and apply any manual changes the generated teradatamlspk script still needs before it can run on Vantage.
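To make the idea of "convert a script and report the changes" concrete, here is a toy sketch in plain Python. It is hypothetical and is not the pyspark2teradataml implementation. The function names `convert_script` and `render_report` and the single rewrite rule (replacing the top-level `pyspark` package name with `teradatamlspk` in import lines) are assumptions made for illustration only. The real tool applies its own, more extensive rules, and its converted output and report format may differ. Refer to the teradatamlspk documentation for the exact signature of pyspark2teradataml.

```python
"""Toy illustration only (hypothetical): rewrite PySpark import lines and
record every change in a small HTML report. This is NOT how the real
pyspark2teradataml function works internally."""
from __future__ import annotations

import html
import re

# Hypothetical rewrite rule: "from pyspark..." / "import pyspark..." at the
# start of a line becomes "from teradatamlspk..." / "import teradatamlspk...".
# The trailing \b keeps names such as "pysparkling" from matching.
IMPORT_PATTERN = re.compile(r"^(\s*)(from|import)\s+pyspark\b")


def convert_script(source: str) -> tuple[str, list[tuple[int, str, str]]]:
    """Return the converted source and a list of (line_no, old, new) changes."""
    converted_lines: list[str] = []
    changes: list[tuple[int, str, str]] = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        new_line = IMPORT_PATTERN.sub(r"\1\2 teradatamlspk", line)
        if new_line != line:
            changes.append((line_no, line, new_line))
        converted_lines.append(new_line)
    return "\n".join(converted_lines) + "\n", changes


def render_report(changes: list[tuple[int, str, str]]) -> str:
    """Build a minimal HTML table listing each changed line."""
    rows = "\n".join(
        f"<tr><td>{n}</td><td><code>{html.escape(old)}</code></td>"
        f"<td><code>{html.escape(new)}</code></td></tr>"
        for n, old, new in changes
    )
    return (
        "<html><body><h1>Conversion report</h1>\n"
        "<table><tr><th>Line</th><th>Original</th><th>Converted</th></tr>\n"
        f"{rows}\n</table></body></html>\n"
    )


if __name__ == "__main__":
    # Example PySpark-style input, held in a string so nothing is executed.
    pyspark_source = (
        "from pyspark.sql import SparkSession\n"
        "import pyspark.sql.functions as F\n"
        'df = spark.read.csv("data.csv")\n'
    )
    converted, changes = convert_script(pyspark_source)
    print(converted)
    print(render_report(changes))
```

Keeping the report as a separate step from the rewrite mirrors the workflow described above: the converted script is the output to run, while the report is what a user reads to decide which lines still need manual attention.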