PySpark API Supportability Matrix | Data Frame APIs | pyspark2teradataml - DataFrame APIs - Teradata Package for Python

Teradata® pyspark2teradataml User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage
PySpark API Name Supported Notes
alias  
count  
crossJoin If both DataFrames have columns with the same name, teradatamlspk prefixes those column names with “l” and “r”. The column order may also differ from PySpark.
join If both DataFrames have columns with the same name, teradatamlspk prefixes those column names with “l” and “r”. The column order may also differ from PySpark.
  • This does not apply to the "semi", "left_semi", "leftsemi", "anti", "leftanti", and "left_anti" join types. For these join types, the output matches PySpark even when both DataFrames share column names.
  • This does not apply when the on clause is a string or a list of strings. In that case, the output matches PySpark even when both DataFrames share column names.
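Before calling crossJoin or join, it can help to know which column names appear in both DataFrames, since those are the ones affected by the note above. The following is a minimal plain-Python sketch, not part of PySpark or teradatamlspk: `shared_columns` is a hypothetical helper, and the two lists stand in for the values each DataFrame's `columns` property would return.

```python
def shared_columns(left_cols, right_cols):
    """Return the column names present in both lists, in left-hand order."""
    right = set(right_cols)
    return [name for name in left_cols if name in right]


# Hypothetical column lists standing in for df1.columns and df2.columns.
left_cols = ["id", "name", "amount"]
right_cols = ["id", "amount", "region"]

overlap = shared_columns(left_cols, right_cols)
print(overlap)
```

An empty result suggests the join output column names should not need any adjustment; a non-empty result lists the names to check after the join.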
distinct  
columns  
dropDuplicates  
drop_duplicates  
dropna  
dtypes  
exceptAll  
intersect  
intersectAll  
limit  
subtract  
tail  
toPandas  
union  
unionAll  
toLocalIterator  
head  
filter  
where  
randomSplit The seed argument is accepted but ignored in teradatamlspk; it has no effect on processing.
sample The seed argument is accepted but ignored in teradatamlspk; it has no effect on processing.
withColumnRenamed  
withColumnsRenamed  
corr PySpark includes NULLs when calculating the correlation between two columns, whereas teradatamlspk excludes them.
cov PySpark includes NULLs when calculating the covariance between two columns, whereas teradatamlspk excludes them.
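To make the NULL handling for corr and cov concrete, the sketch below computes a Pearson correlation and a sample covariance over only the rows where both values are present. It is plain Python for illustration, not the teradatamlspk or Vantage implementation; the function names are hypothetical, and dropping a row when either value is missing is an assumption about what "does not consider NULLs" means.

```python
from math import sqrt


def _complete_pairs(xs, ys):
    """Keep only the rows where both values are present (not None)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def cov_ignoring_nulls(xs, ys):
    """Sample covariance over complete rows; None if fewer than two rows."""
    pairs = _complete_pairs(xs, ys)
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def corr_ignoring_nulls(xs, ys):
    """Pearson correlation over complete rows; None if it is undefined."""
    pairs = _complete_pairs(xs, ys)
    if len(pairs) < 2:
        return None
    px = [x for x, _ in pairs]
    py = [y for _, y in pairs]
    var_x = cov_ignoring_nulls(px, px)
    var_y = cov_ignoring_nulls(py, py)
    if var_x == 0 or var_y == 0:
        return None
    return cov_ignoring_nulls(px, py) / sqrt(var_x * var_y)


# The third row has a missing x value, so it is excluded from both results.
xs = [1.0, 2.0, None, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(corr_ignoring_nulls(xs, ys))
print(cov_ignoring_nulls(xs, ys))
```

When comparing results between the two libraries, dropping rows with NULLs from the PySpark input first is one way to put both calculations on the same rows.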
take  
select  
sort
  • The effect of this API is not propagated to APIs chained after it.
  • Column expressions are not supported. Only column names are supported.
orderBy
  • The effect of this API is not propagated to APIs chained after it.
  • Column expressions are not supported. Only column names are supported.
first  
unionByName  
cache teradatamlspk returns the same DataFrame.
checkpoint teradatamlspk returns the same DataFrame.
localCheckpoint teradatamlspk returns the same DataFrame.
persist The API returns the persisted DataFrame because, with Teradata Vantage, the data already resides in the database.
unpersist teradatamlspk returns the same DataFrame.
collect  
schema The nullable parameter in StructField always shows True.
toDF  
summary  
describe  
colRegex PySpark matches column names using Scala or Java regex syntax, whereas teradatamlspk matches them using Python regex syntax.
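Because colRegex patterns written for PySpark may use Java-only syntax, it can be useful to check whether a pattern is valid as a Python regex before porting it. The sketch below assumes that "Python regex" means the semantics of Python's standard `re` module; `compiles_as_python_regex` is a hypothetical helper and does not call teradatamlspk.

```python
import re


def compiles_as_python_regex(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \p{Alpha} is a POSIX character class accepted by java.util.regex,
# but Python's re module rejects the \p escape.
java_style = r"\p{Alpha}+\d*"
# An equivalent pattern for ASCII letters that Python accepts.
python_style = r"[A-Za-z]+\d*"

print(compiles_as_python_regex(java_style))
print(compiles_as_python_regex(python_style))
```

A pattern that compiles in both engines can still match differently, so comparing the selected columns on a small DataFrame remains worthwhile.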
isEmpty  
show  
unpivot Output DataFrame column names may vary when compared to PySpark DataFrame columns.
melt Output DataFrame column names may vary when compared to PySpark DataFrame columns.
createGlobalTempView  
createOrReplaceTempView  
createTempView
registerTempTable  
sortWithinPartitions Functionality is not applicable to Vantage, so teradatamlspk returns a DataFrame sorted by the specified columns.
hint Functionality is not applicable to Vantage, so teradatamlspk returns the same DataFrame.
coalesce Functionality is not applicable to Vantage, so teradatamlspk returns the same DataFrame.
repartition Functionality is not applicable to Vantage, so teradatamlspk returns the same DataFrame.
repartitionByRange Functionality is not applicable to Vantage, so teradatamlspk returns the same DataFrame.
sameSemantics Functionality is not applicable to Vantage, so teradatamlspk always returns False.
semanticHash Functionality is not applicable to Vantage, so teradatamlspk returns 0.
inputFiles Functionality is not applicable to Vantage, so teradatamlspk returns an empty list.
selectExpr Column names may vary when compared to PySpark.
drop  
isLocal Functionality is not applicable to Vantage, so teradatamlspk returns False.
isStreaming Functionality is not applicable to Vantage, so teradatamlspk returns False.
printSchema  
replace  
crosstab Column names and column order may vary depending on the data in the DataFrame.
foreach  
foreachPartition  
cube Output DataFrame column names may vary when compared to PySpark DataFrame columns.
rollup Output DataFrame column names may vary when compared to PySpark DataFrame columns.
fillna  
transform  
groupBy  
agg  
__getattr__  
__getitem__  
na  
sampleBy  
stat  
withColumn  
withColumns  
approxQuantile The relativeError argument is ignored and treated as zero, so exact quantiles are returned.
DataFrameNaFunctions.drop  
DataFrameNaFunctions.fill  
DataFrameNaFunctions.replace  
DataFrameStatFunctions.corr  
DataFrameStatFunctions.cov  
DataFrameStatFunctions.crosstab  
DataFrameStatFunctions.sampleBy  
DataFrameStatFunctions.approxQuantile The relativeError argument is ignored and treated as zero, so exact quantiles are returned.
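For approxQuantile, "exact quantiles" means the result is taken from the full sorted data rather than an approximation sketch. The plain-Python example below shows one way to compute an exact quantile. The nearest-rank rule and the skipping of None values are assumptions for illustration, since the rule teradatamlspk applies is not stated here, and `exact_quantile` is a hypothetical helper.

```python
import math


def exact_quantile(values, probability):
    """Exact quantile using the nearest-rank rule (assumed for illustration)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Missing values are skipped; this is an assumption, not documented behavior.
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    # Nearest rank: the smallest 1-based rank covering the requested fraction.
    rank = max(1, math.ceil(probability * len(data)))
    return data[rank - 1]


values = [5, 1, 4, 2, 3]
print([exact_quantile(values, p) for p in (0.0, 0.25, 0.5, 1.0)])
```

Because the result is always an element of the input, this kind of exact calculation can differ slightly from a PySpark run that uses a non-zero relativeError.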