RegexTokenizer (PySpark API Supportability Matrix)

Teradata® pyspark2teradataml User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2024-12-18
Product Category: Teradata Vantage

Arguments

This function is implemented with the Analytics Database function NGramSplitter, which is called through the teradataml Analytics Database functions.

The transformed data returns the extracted tokens across multiple rows of the output columns.
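The output shape described above can be pictured with a small pure-Python sketch. It is not the NGramSplitter implementation and does not call pyspark2teradataml or teradataml. The helper `tokenize_to_rows`, the sample rows, and the column names `id`, `text`, and `token` are hypothetical, and the sketch assumes one token per output row.

```python
import re

def tokenize_to_rows(rows, input_col, output_col, pattern=r"[\s]+", to_lowercase=True):
    """Illustrative only: split each row's text on `pattern` and
    emit one output row per token (assumed output shape)."""
    out = []
    for row in rows:
        text = row[input_col].lower() if to_lowercase else row[input_col]
        for token in re.split(pattern, text):
            if token:  # skip empty strings from leading/trailing delimiters
                out.append({**row, output_col: token})
    return out

# Hypothetical sample data: two input rows.
rows = [
    {"id": 1, "text": "Hello Teradata Vantage"},
    {"id": 2, "text": "one  row per token"},
]

token_rows = tokenize_to_rows(rows, "text", "token")
for r in token_rows:
    print(r["id"], r["token"])
```

Two input rows become seven output rows here, whereas open-source PySpark would return two rows, each holding an array of tokens. Code that later expects an array column may need adjusting.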

PySpark Argument Name    Open Source Function Argument Name    Notes
gaps                     overlapping
inputCol                 text_column
minTokenLength           grams
outputCol                n_gram_column
pattern                  delimiter                             Default value: [\s]+
toLowercase              to_lower_case
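The argument mapping in the table above can be written as a plain lookup. This is a minimal sketch for reading the table, not the package's internal code. The helper `translate_args` is hypothetical, and passing values through unchanged is an assumption; the library may convert values as well as rename them.

```python
# PySpark argument name -> mapped argument name, copied from the table above.
ARGUMENT_MAP = {
    "gaps": "overlapping",
    "inputCol": "text_column",
    "minTokenLength": "grams",
    "outputCol": "n_gram_column",
    "pattern": "delimiter",
    "toLowercase": "to_lower_case",
}

# Default listed in the Notes column for the pattern/delimiter row.
DEFAULT_PATTERN = r"[\s]+"

def translate_args(**pyspark_args):
    """Hypothetical helper: rename PySpark-style keyword arguments
    to the mapped names. Values are passed through unchanged (assumption)."""
    unknown = set(pyspark_args) - set(ARGUMENT_MAP)
    if unknown:
        raise ValueError(f"Unmapped argument(s): {sorted(unknown)}")
    translated = {ARGUMENT_MAP[name]: value for name, value in pyspark_args.items()}
    translated.setdefault("delimiter", DEFAULT_PATTERN)
    return translated

mapped = translate_args(inputCol="review", outputCol="token", toLowercase=True)
print(mapped)
```

Note that `minTokenLength` maps to `grams`, so its value may not carry the same meaning as in open-source PySpark.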

Attributes/Methods

Attribute/Method Name    Supported    Notes
clear  
copy  
explainParam  
explainParams  
extractParamMap  
getGaps  
getInputCol  
getMinTokenLength  
getOrDefault  
getOutputCol  
getParam  
getPattern  
getToLowercase  
hasDefault  
hasParam  
isDefined  
isSet  
load  
read  
save  
set  
setGaps  
setInputCol  
setMinTokenLength  
setOutputCol  
setParams  
setPattern  
setToLowerCase  
transform  
write  
gaps  
inputCol  
minTokenLength  
outputCol  
params  
pattern  
toLowerCase