join
If both DataFrames share column names, teradatamlspk prefixes the output column names with "l" and "r". The column order may also differ from PySpark.
This does not apply to the "semi", "left_semi", "leftsemi", "anti", "leftanti", and "left_anti" join types, so for these the output matches PySpark even when both DataFrames share the same column names.
It also does not apply when the on clause is a string or a list of strings; in that case the output matches PySpark even when both DataFrames share the same column names.
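A minimal sketch of the difference, assuming spark is an already-connected session (the data and variable names are illustrative):

```python
# Assumes `spark` is an existing session (TeradataSession or SparkSession).
left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
right = spark.createDataFrame([(1, "x"), (3, "y")], ["id", "val"])

# `on` is a Column expression and both sides share "id" and "val", so
# teradatamlspk disambiguates the output names with "l"/"r" prefixes,
# and the column order can differ from PySpark.
joined = left.join(right, on=left.id == right.id, how="inner")

# `on` as a string (or list of strings): the output matches PySpark.
joined_by_name = left.join(right, on="id", how="inner")
print(joined.columns, joined_by_name.columns)
```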
distinct
columns
distinct
dropDuplicates
drop_duplicates
dropna
dtypes
Output shows Teradata types, not PySpark types.
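For example (illustrative output; the exact Teradata type names depend on the type mapping):

```python
# Assumes `spark` is an existing session.
df = spark.createDataFrame([(1, "a")], ["id", "txt"])

# PySpark:       [('id', 'bigint'), ('txt', 'string')]
# teradatamlspk: Teradata types instead, e.g. BIGINT / VARCHAR.
print(df.dtypes)
```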
exceptAll
intersect
intersectAll
limit
subtract
tail
toPandas
union
unionAll
toLocalIterator
head
filter
where
randomSplit
The seed argument is ignored in teradatamlspk. You can specify it, but it is not used in processing.
sample
The seed argument is ignored in teradatamlspk. You can specify it, but it is not used in processing.
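A short sketch covering both randomSplit and sample, assuming df is an existing DataFrame:

```python
# Assumes `df` is an existing DataFrame.
# `seed` is accepted for API compatibility but ignored by teradatamlspk,
# so repeated runs are not guaranteed to produce identical splits/samples.
train, test = df.randomSplit([0.8, 0.2], seed=42)
sampled = df.sample(fraction=0.1, seed=42)
```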
withColumnRenamed
withColumnsRenamed
corr
PySpark considers NULLs when calculating the correlation between two columns, whereas teradatamlspk excludes NULLs from the calculation.
cov
PySpark considers NULLs when calculating the covariance between two columns, whereas teradatamlspk excludes NULLs from the calculation.
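A small illustration, assuming spark is an existing session; on data containing NULLs the two libraries can return different values:

```python
# Assumes `spark` is an existing session.
df = spark.createDataFrame(
    [(1.0, 2.0), (2.0, 4.0), (None, 6.0)], ["x", "y"]
)

# Per the notes above, PySpark includes the NULL row in the calculation,
# while teradatamlspk skips it, so these results can differ.
print(df.corr("x", "y"))
print(df.cov("x", "y"))
```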
take
select
sort
Changes made by this API are not propagated to subsequent APIs.
Column expressions are not supported; only column names are supported.
orderBy
Changes made by this API are not propagated to subsequent APIs.
Column expressions are not supported; only column names are supported.
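For example, assuming df is an existing DataFrame with columns "a" and "b" (the ascending keyword follows the PySpark signature):

```python
# Assumes `df` is an existing DataFrame with columns "a" and "b".
ok = df.sort("a", "b")                    # plain column names: supported
ok2 = df.orderBy("a", ascending=False)    # keyword form with a plain name

# Column expressions such as df.a.desc() or col("a") + 1 are NOT supported
# in teradatamlspk; rewrite them in terms of column names before porting.
```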
first
unionByName
cache
teradatamlspk returns the same DataFrame.
checkpoint
teradatamlspk returns the same DataFrame.
localCheckpoint
teradatamlspk returns the same DataFrame.
persist
teradatamlspk returns the same DataFrame.
unpersist
teradatamlspk returns the same DataFrame.
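Since these calls are effectively no-ops, chains like the following are safe but have no storage effect (a sketch, assuming df is an existing DataFrame):

```python
# Assumes `df` is an existing DataFrame. In teradatamlspk each call simply
# returns the same DataFrame; Vantage manages storage itself.
df = df.cache()
df = df.checkpoint()
df = df.localCheckpoint()
df = df.persist()
df = df.unpersist()
```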
collect
schema
The nullable parameter in StructField always shows True.
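For example:

```python
# Assumes `df` is an existing DataFrame.
for field in df.schema.fields:
    # Under teradatamlspk, field.nullable is True for every column,
    # even if the underlying Vantage column is declared NOT NULL.
    print(field.name, field.dataType, field.nullable)
```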
toDF
summary
describe
colRegex
PySpark matches the pattern using Scala/Java regex, whereas teradatamlspk uses Python regex.
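When porting, prefer regex syntax that is valid in both dialects. A sketch, assuming df has columns "colA", "colB", and "id":

```python
# Assumes `df` has columns "colA", "colB", "id".
# PySpark evaluates the pattern as a Java/Scala regex; teradatamlspk uses
# Python's `re`. The simple pattern below behaves the same in both.
subset = df.select(df.colRegex("`col.*`"))
print(subset.columns)  # ['colA', 'colB']
```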
isEmpty
show
unpivot
Output DataFrame column names may differ from the PySpark DataFrame's columns.
melt
Output DataFrame column names may differ from the PySpark DataFrame's columns.
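For example (illustrative column names; verify the result with .columns):

```python
# Assumes `df` has columns "id", "q1", "q2".
long_df = df.unpivot(
    ids=["id"],
    values=["q1", "q2"],
    variableColumnName="quarter",
    valueColumnName="amount",
)
# teradatamlspk may not name the output columns exactly "quarter"/"amount";
# inspect long_df.columns before relying on the names.
print(long_df.columns)
```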
createGlobalTempView
createOrReplaceTempView
createTempView
registerTempTable
Vantage has no concept of a temporary view. teradatamlspk creates a regular view instead; drop the view at the end of the session.
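Because the view is a regular one, remember to drop it. A sketch, assuming spark is an existing session exposing sql() as in PySpark, with an illustrative view name:

```python
# Assumes `spark` is an existing session and `df` an existing DataFrame.
df.createOrReplaceTempView("sales_v")      # creates a regular view in Vantage
result = spark.sql("SELECT * FROM sales_v")

# Not a temporary view: drop it explicitly at the end of the session.
spark.sql("DROP VIEW sales_v")
```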
sortWithinPartitions
Functionality is not applicable for Vantage. Hence, teradatamlspk returns a DataFrame sorted by the specified columns.
hint
Functionality is not applicable for Vantage. Hence, teradatamlspk returns the same DataFrame.
coalesce
Functionality is not applicable for Vantage. Hence, teradatamlspk returns the same DataFrame.
repartition
Functionality is not applicable for Vantage. Hence, teradatamlspk returns the same DataFrame.
repartitionByRange
Functionality is not applicable for Vantage. Hence, teradatamlspk returns the same DataFrame.
sameSemantics
Functionality is not applicable for Vantage. Hence, teradatamlspk always returns False.
semanticHash
Functionality is not applicable for Vantage. Hence, teradatamlspk returns 0.
inputFiles
Functionality is not applicable for Vantage. Hence, teradatamlspk returns an empty list.
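A compact summary of these behaviors, assuming df and other are existing DataFrames with a column "a":

```python
# Assumes `df` and `other` are existing DataFrames with a column "a".
df.sortWithinPartitions("a")   # just a sorted DataFrame; no partitions in Vantage
df.hint("broadcast")           # same DataFrame
df.coalesce(1)                 # same DataFrame
df.repartition(8)              # same DataFrame
df.repartitionByRange(8, "a")  # same DataFrame
df.sameSemantics(other)        # always False
df.semanticHash()              # always 0
df.inputFiles()                # always []
```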
selectExpr
Column names may differ from PySpark's.
drop
isLocal
Functionality is not applicable for Vantage. Hence, teradatamlspk returns False.
isStreaming
Functionality is not applicable for Vantage. Hence, teradatamlspk returns False.
printSchema
replace
PySpark silently ignores a replacement when it is not possible, instead of raising an error. For example, if you replace values in a numeric column with a string, PySpark ignores the replacement, but teradatamlspk raises an error.
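A sketch of the difference, assuming df has a numeric column "amount" and a string column "label":

```python
# Assumes `df` has a numeric column "amount" and a string column "label".
ok = df.replace("a", "b", subset=["label"])   # valid in both libraries

# Replacing numeric values with a string: PySpark silently ignores it,
# but teradatamlspk raises an error, so guard or cast before porting.
# df.replace(10, "ten", subset=["amount"])    # error under teradatamlspk
```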
crosstab
Column names may vary based on the data in the DataFrame. The column order may also vary.
foreach
foreachPartition
cube
PySpark performs aggregation on the grouping columns as well, whereas teradatamlspk ignores aggregation of the grouping columns.
rollup
PySpark performs aggregation on the grouping columns as well, whereas teradatamlspk ignores aggregation of the grouping columns.
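For example, assuming df has columns "dept" and "salary" (the functions import follows the usual pyspark-to-teradatamlspk swap):

```python
from teradatamlspk.sql import functions as F  # drop-in for pyspark.sql.functions

# Assumes `df` has columns "dept" and "salary".
agg = df.cube("dept").agg(F.sum("salary"), F.count("dept"))
# PySpark also applies F.count to the grouping column "dept"; teradatamlspk
# ignores aggregates over grouping columns, so the outputs can differ.
```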
fillna
All input arguments must be of the same data type, or their types must be compatible. For example, if value is an integer and subset contains a string column, PySpark ignores the replacement, but teradatamlspk raises an error. Drop incompatible columns or cast them to compatible types first.
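For example, assuming df has an integer column "age" and a string column "name":

```python
# Assumes `df` has an integer column "age" and a string column "name".
ok = df.fillna(0, subset=["age"])     # int value, int column: compatible

# Int value against a string column in `subset`: PySpark ignores it,
# teradatamlspk raises an error. Drop or cast the column first:
ok2 = df.drop("name").fillna(0)
# or: df.withColumn("name", df.name.cast("integer")).fillna(0), if sensible
```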
transform
groupBy
agg
The functions count_distinct and countDistinct accept only one column as input.
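For example, assuming df has columns "dept" and "emp_id" (the import follows the usual pyspark-to-teradatamlspk swap):

```python
from teradatamlspk.sql.functions import count_distinct

# Supported: a single column.
per_dept = df.groupBy("dept").agg(count_distinct("emp_id"))

# Not supported: multiple columns in one call, e.g.
# count_distinct("emp_id", "role") -- use separate aggregations instead.
```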
__getattr__
__getitem__
na
sampleBy
stat
withColumn
withColumns
DataFrameNaFunctions.drop
DataFrameNaFunctions.fill
All input arguments must be of the same data type, or their types must be compatible. For example, if value is an integer and subset contains a string column, PySpark ignores the replacement, but teradatamlspk raises an error. Drop incompatible columns or cast them to compatible types first.