Key Feature Additions and Changes | Teradata pyspark2teradataml Package

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11
The following table lists the key feature additions and changes in the Teradata product pyspark2teradataml.
Date Release Description
August 2024 20.00.00.01
  • teradatamlspk DataFrame
    • write() - Supports writing the DataFrame to the local file system, to Vantage, or to cloud storage.
    • writeTo() - Supports writing the DataFrame to a Vantage table.
    • rdd - Returns the same DataFrame.
  • teradatamlspk DataFrameColumn (ColumnExpression)
    • desc_nulls_first - Returns a sort expression based on the descending order of the given column name, with null values appearing before non-null values.
    • desc_nulls_last - Returns a sort expression based on the descending order of the given column name, with null values appearing after non-null values.
    • asc_nulls_first - Returns a sort expression based on the ascending order of the given column name, with null values appearing before non-null values.
    • asc_nulls_last - Returns a sort expression based on the ascending order of the given column name, with null values appearing after non-null values.
  • Updates
    • DataFrame.fillna() and DataFrame.na.fill() now support input arguments of the same data type or of compatible types.
    • DataFrame.agg() and GroupedData.agg() now support Column as input and '*' for 'count'.
    • DataFrameColumn.cast() and DataFrameColumn.alias() now accept string literals, which are case-insensitive.
    • Optimized performance for DataFrame.show()
    • Classification Summary, TrainingSummary object, and MulticlassClassificationEvaluator now support the weightedTruePositiveRate and weightedFalsePositiveRate metrics.
    • Arithmetic operations can be performed on window aggregates.
    • Added a new function, time_difference, which returns the difference between two timestamps in seconds.
  • Bug fixes:
    • DataFrame.head() returns a list when n is 1.
    • DataFrame.union() and DataFrame.unionAll() now perform a union of rows based on column position.
    • DataFrame.groupBy() and DataFrame.groupby() now also accept columns as positional arguments, for example df.groupBy("col1", "col2").
    • The MLlib function attributes numClasses and intercept now return a value.
    • An appropriate error is raised if an invalid file is passed to pyspark2teradataml.
    • The when function now accepts a Column, in addition to a literal, for the value argument.
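The union fix listed above matches columns by position rather than by name. The following is a minimal plain-Python sketch of that behavior, not the teradatamlspk API: the helper union_by_position and the sample data are hypothetical and exist only for illustration. It assumes, as in PySpark, that the result keeps the column names of the left-hand input.

```python
# Plain-Python illustration of a positional union.
# Assumption: union_by_position is a hypothetical helper, not a teradatamlspk function.

def union_by_position(left_columns, left_rows, right_columns, right_rows):
    """Append right_rows to left_rows, matching columns by position.

    The result keeps the left-hand column names; the right-hand
    column names are ignored entirely.
    """
    if len(left_columns) != len(right_columns):
        raise ValueError("both inputs must have the same number of columns")
    return list(left_columns), list(left_rows) + list(right_rows)


# Hypothetical data: the second input lists the same fields in a different order.
columns, rows = union_by_position(
    ["name", "city"], [("Ana", "Lima")],
    ["city", "name"], [("Oslo", "Bo")],
)
print(columns)
print(rows)
```

In this sketch the value "Oslo" ends up under the name column, because only the position of each column is considered. When two DataFrames hold the same columns in a different order, reordering one of them first (for example with select) avoids this kind of mix-up.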
March 2024 20.00.00.00 Initial release.
  • A pyspark2teradataml utility function that automatically converts PySpark scripts to teradataml format.
  • Supports the following:
    • 85 DataFrame APIs with syntax similar to the PySpark DataFrame APIs.
    • 22 DataFrameColumn APIs with syntax similar to the PySpark DataFrameColumn APIs.
    • 200 functions with syntax similar to the PySpark functions.
    • 69 machine learning functions with syntax similar to the PySpark machine learning functions.