If both DataFrames share column names, teradatamlspk shows those column names with the prefixes "l" and "r". The order of the columns may also differ from PySpark.
join
If both DataFrames share column names, teradatamlspk shows those column names with the prefixes "l" and "r". The order of the columns may also differ from PySpark.
This does not apply to "semi", "left_semi", "leftsemi", "anti", "leftanti", and "left_anti" joins. For these join types, the output matches PySpark even when both DataFrames share column names.
This also does not apply when the on clause is a string or a list of strings. In that case, the output matches PySpark even when both DataFrames share column names.
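The conditions in the join notes can be condensed into a small check. The sketch below is plain Python and does not call teradatamlspk or PySpark; the helper name `prefixing_applies` and its arguments are hypothetical, and it only encodes the three conditions stated in the notes (shared column names, join type, and the form of the on argument).

```python
# Join types that the notes list as exempt from the "l"/"r" prefixing.
EXEMPT_JOIN_TYPES = {"semi", "left_semi", "leftsemi", "anti", "leftanti", "left_anti"}


def prefixing_applies(left_columns, right_columns, how="inner", on=None):
    """Return True when, per the notes, shared columns get "l"/"r" prefixes.

    Hypothetical helper for illustration: it inspects plain lists of column
    names and never touches a real DataFrame.
    """
    shared = set(left_columns) & set(right_columns)
    if not shared:
        return False  # no name clash, so nothing needs a prefix
    if how.lower() in EXEMPT_JOIN_TYPES:
        return False  # semi and anti joins match the PySpark output
    on_is_names = isinstance(on, str) or (
        isinstance(on, (list, tuple))
        and len(on) > 0
        and all(isinstance(name, str) for name in on)
    )
    if on_is_names:
        return False  # string or list-of-strings `on` matches the PySpark output
    return True
```

A check like this can help decide where ported code should avoid hard-coded output column names.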
distinct
columns
dropDuplicates
drop_duplicates
dropna
dtypes
exceptAll
intersect
intersectAll
limit
subtract
tail
toPandas
union
unionAll
toLocalIterator
head
filter
where
randomSplit
The seed argument is ignored in teradatamlspk. You can specify it, but it is not used in processing.
sample
The seed argument is ignored in teradatamlspk. You can specify it, but it is not used in processing.
withColumnRenamed
withColumnsRenamed
corr
PySpark includes NULLs when calculating the correlation between two columns, whereas teradatamlspk excludes NULLs from the calculation.
cov
PySpark includes NULLs when calculating the covariance between two columns, whereas teradatamlspk excludes NULLs from the calculation.
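To make the NULL handling for corr and cov concrete, the sketch below computes a Pearson correlation and a covariance in plain Python after discarding every row in which either value is missing. It does not call teradatamlspk or PySpark. The function names are hypothetical, and using the sample (n - 1) form of the covariance is an assumption, since the notes do not say which form Vantage applies.

```python
import math


def drop_null_pairs(xs, ys):
    """Keep only the rows where both values are present (not None)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def sample_cov(xs, ys):
    """Sample covariance over the rows that survive NULL removal."""
    pairs = drop_null_pairs(xs, ys)
    n = len(pairs)
    if n < 2:
        return None  # not enough data to compute a covariance
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def pearson_corr(xs, ys):
    """Pearson correlation over the rows that survive NULL removal."""
    pairs = drop_null_pairs(xs, ys)
    if len(pairs) < 2:
        return None
    kept_x = [x for x, _ in pairs]
    kept_y = [y for _, y in pairs]
    var_x = sample_cov(kept_x, kept_x)
    var_y = sample_cov(kept_y, kept_y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return sample_cov(kept_x, kept_y) / math.sqrt(var_x * var_y)


# The third row has a missing value in `a`, so it is ignored entirely,
# including the outlier 100.0 in `b`.
a = [1.0, 2.0, None, 4.0]
b = [2.0, 4.0, 100.0, 8.0]
```

When results must agree across both libraries, dropping the NULL rows explicitly (for example with dropna) before calling corr or cov is one way to remove the ambiguity.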
take
select
sort
Changes made by this API are not propagated to subsequent API calls.
Column expressions are not supported. Only column names are supported.
orderBy
Changes made by this API are not propagated to subsequent API calls.
Column expressions are not supported. Only column names are supported.
first
unionByName
cache
teradatamlspk returns the same DataFrame.
checkpoint
teradatamlspk returns the same DataFrame.
localCheckpoint
teradatamlspk returns the same DataFrame.
persist
The API returns the same DataFrame, because with Teradata Vantage the data already resides in the database.
unpersist
teradatamlspk returns the same DataFrame.
collect
schema
The nullable parameter in StructField always shows True.
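Because nullable always shows True, schema comparisons between the two libraries are more reliable when they ignore that flag. The sketch below is plain Python: each field is a hypothetical (name, data_type, nullable) tuple standing in for a StructField, and the helper name and type strings are illustrative assumptions rather than actual teradatamlspk output.

```python
def schemas_match_ignoring_nullable(first_fields, second_fields):
    """Compare two schemas on column name and data type only.

    Each field is a (name, data_type, nullable) tuple standing in for a
    StructField; this layout is an assumption made for the sketch.
    """
    def strip_nullable(fields):
        return [(name, data_type) for name, data_type, _nullable in fields]

    return strip_nullable(first_fields) == strip_nullable(second_fields)


# Hypothetical schemas: identical except for the nullable flag on "id".
pyspark_schema = [("id", "bigint", False), ("name", "string", True)]
tdmlspk_schema = [("id", "bigint", True), ("name", "string", True)]
```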
toDF
summary
describe
colRegex
PySpark returns results based on Scala or Java regex, whereas teradatamlspk returns results based on Python regex.
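One practical consequence is that a pattern accepted by the Java regex engine may be rejected by Python. The sketch below uses only the standard `re` module; the helper `compiles_in_python` and the sample column names are hypothetical. The Java-style class `\p{Alpha}` is not recognized by Python's `re` module and raises `re.error`, whereas `[A-Za-z]` is accepted by both engines.

```python
import re


def compiles_in_python(pattern):
    """Return True if Python's `re` module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


java_only = r"\p{Alpha}+_id"   # valid in Java, rejected by Python's re
portable = r"[A-Za-z]+_id"     # accepted by both engines

columns = ["user_id", "order_id", "amount"]
# Assumption for this sketch: the whole column name must match the pattern.
selected = [name for name in columns if re.fullmatch(portable, name)]
```

Checking patterns this way before porting colRegex calls can surface constructs that need rewriting.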
isEmpty
show
unpivot
Output DataFrame column names may differ from the PySpark DataFrame column names.
melt
Output DataFrame column names may differ from the PySpark DataFrame column names.
createGlobalTempView
createOrReplaceTempView
createTempView
createOrReplaceTempView
registerTempTable
sortWithinPartitions
This functionality is not applicable to Vantage. Hence, teradatamlspk returns a DataFrame sorted on the specified columns.
hint
This functionality is not applicable to Vantage. Hence, teradatamlspk returns the same DataFrame.
coalesce
This functionality is not applicable to Vantage. Hence, teradatamlspk returns the same DataFrame.
repartition
This functionality is not applicable to Vantage. Hence, teradatamlspk returns the same DataFrame.
repartitionByRange
This functionality is not applicable to Vantage. Hence, teradatamlspk returns the same DataFrame.
sameSemantics
This functionality is not applicable to Vantage. Hence, teradatamlspk always returns False.
semanticHash
This functionality is not applicable to Vantage. Hence, teradatamlspk returns 0.
inputFiles
This functionality is not applicable to Vantage. Hence, teradatamlspk returns an empty list.
selectExpr
Column names may vary when compared to PySpark.
drop
isLocal
This functionality is not applicable to Vantage. Hence, teradatamlspk returns False.
isStreaming
This functionality is not applicable to Vantage. Hence, teradatamlspk returns False.
printSchema
replace
crosstab
Column names and order of columns may vary based on the data in DataFrame.
foreach
foreachPartition
cube
Output DataFrame column names may differ from the PySpark DataFrame column names.
rollup
Output DataFrame column names may differ from the PySpark DataFrame column names.
fillna
transform
groupBy
agg
__getattr__
__getitem__
na
sampleBy
stat
withColumn
withColumns
approxQuantile
The relativeError argument is ignored and treated as zero, so exact quantiles are returned.
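As an illustration of what an exact quantile means when relativeError is zero, the sketch below selects the value at rank ceil(p * N) from the sorted non-NULL data, in plain Python. The helper name `exact_quantiles` is hypothetical, and the rank rule is an assumption chosen for this sketch: the notes do not state how teradatamlspk computes the value internally, so results for probabilities that fall between two data points may differ.

```python
import math


def exact_quantiles(values, probabilities):
    """Return one exact quantile per probability, ignoring None values.

    Assumption for this sketch: the quantile at probability p is the element
    whose 1-based rank in the sorted data is ceil(p * N), with a minimum of 1.
    """
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return []
    n = len(ordered)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in the range [0, 1]")
        rank = max(1, math.ceil(p * n))  # 1-based rank of the chosen element
        result.append(ordered[rank - 1])
    return result


data = [5.0, 1.0, 4.0, 2.0, 3.0]
min_median_max = exact_quantiles(data, [0.0, 0.5, 1.0])
```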
DataFrameNaFunctions.drop
DataFrameNaFunctions.fill
DataFrameNaFunctions.replace
DataFrameStatFunctions.corr
DataFrameStatFunctions.cov
DataFrameStatFunctions.crosstab
DataFrameStatFunctions.sampleBy
DataFrameStatFunctions.approxQuantile
The relativeError argument is ignored and treated as zero, so exact quantiles are returned.