Teradata Package for Python Function Reference | 20.00 - ngram - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.ngram = ngram(expression, length, position=1)
- DESCRIPTION:
Returns the number of n-gram matches between the values in the column
and "expression". A high number of matching n-gram patterns implies
a high similarity between the two strings.
For positional n-gram matching, both the position and the pattern must match
when measuring similarity. The position value indicates how far apart, positionally,
a match may be in the two strings:
* If position is set to zero, the match must be at the same position
in the two strings.
* If position is set to a value of x, the match must be within x positions in
the two strings.
For example, if position = 2, the match must be within 2 positions in the
two strings.
The function returns zero in the following cases:
* If the length argument is greater than the length of either the column value
or expression.
* If the length argument is less than or equal to 0, or if either the column value or
expression is an empty string.
Patterns beyond the length of 255 are ignored.
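The matching rules above can be illustrated with a small pure-Python sketch. It is not the database implementation: the helper name ngram_matches is hypothetical, and it assumes exact (case-sensitive) comparison, that each n-gram of the second string can be matched at most once, and that a None input yields None. The 255-character limit is not modelled.

```python
def ngram_matches(left, right, length, position=1):
    """Illustrative count of positional n-gram matches between two strings.

    Assumption: an n-gram starting at offset i in `left` matches an identical
    n-gram starting at offset j in `right` when abs(i - j) <= position.
    """
    if left is None or right is None:
        return None  # assumed to mirror "null in, null out"
    # Zero cases listed in the description.
    if length <= 0 or not left or not right:
        return 0
    if length > len(left) or length > len(right):
        return 0

    right_grams = [right[j:j + length] for j in range(len(right) - length + 1)]
    used = [False] * len(right_grams)  # assumption: each n-gram is matched once
    matches = 0
    for i in range(len(left) - length + 1):
        gram = left[i:i + length]
        first = max(0, i - position)
        last = min(len(right_grams) - 1, i + position)
        for j in range(first, last + 1):
            if not used[j] and right_grams[j] == gram:
                used[j] = True
                matches += 1
                break
    return matches


print(ngram_matches("Beginner", "Beginner", 3))        # 6
print(ngram_matches("Novice", "Beginner", 3))          # 0
print(ngram_matches("abcx", "xabc", 3, position=0))    # 0
print(ngram_matches("abcx", "xabc", 3, position=1))    # 1
```

Under these assumptions, "Beginner" compared with itself yields 6 because the string contains six 3-grams (8 - 3 + 1), each matched at its own position. The last two calls show the effect of position: the shared pattern "abc" sits one offset apart, so it counts only when position is at least 1.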
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal.
If the argument is null, then the result is null.
Format of a ColumnExpression of a string column: '<dataframe>.<dataframe_column>'.
Supported column types: CHAR, VARCHAR, or CLOB
Types: ColumnExpression, str
length:
Required Argument.
Specifies an integer value n in n-gram, which is the comparison length.
Types: int
position:
Optional Argument.
Specifies an integer value for positional n-gram matching: a matching
n-gram must occur within this many positions in the two strings.
Default Value: 1
Types: int
RAISES:
TypeError, ValueError, TeradataMlException
RETURNS:
DataFrameColumn
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
# Create a DataFrame on 'admissions_train' table.
>>> df = DataFrame("admissions_train").iloc[:4]
>>> print(df)
   masters   gpa     stats programming  admitted
id
3       no  3.70    Novice    Beginner         1
4      yes  3.50  Beginner      Novice         1
2      yes  3.76  Beginner    Beginner         0
1      yes  3.95  Beginner    Beginner         0
# Example 1: Compute the n-gram match count between the strings in the "stats" and
# "programming" columns and pass it as input to DataFrame.assign().
>>> res = df.assign(col = df.stats.ngram(df.programming, 3))
>>> print(res)
   masters   gpa     stats programming  admitted  col
id
3       no  3.70    Novice    Beginner         1    0
4      yes  3.50  Beginner      Novice         1    0
2      yes  3.76  Beginner    Beginner         0    6
1      yes  3.95  Beginner    Beginner         0    6
# Example 2: Execute the ngram() function on the "stats" column and keep only the
# rows whose computed value is equal to 6.
>>> print(df[df.stats.ngram(df.programming, 3) == 6])
   masters   gpa     stats programming  admitted
id
2      yes  3.76  Beginner    Beginner         0
1      yes  3.95  Beginner    Beginner         0
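The examples above pass a ColumnExpression and rely on the default position. Since "expression" also accepts a string literal and "position" is optional, a variation might look like the following sketch. The helper name add_literal_ngram and its default of position=0 are illustrative choices, not part of teradataml; it assumes "df" is a DataFrame with a string column "stats", such as the one created above. No output is shown because it depends on the data in the connected system.

```python
def add_literal_ngram(df, literal, length, position=0):
    """Add a "col" column holding the n-gram match count of "stats" vs. a literal.

    Hypothetical helper: `df` is assumed to be a teradataml DataFrame with a
    string column named "stats". position=0 (this helper's own default) requires
    matching n-grams to sit at the same position in both strings.
    """
    # ngram(expression, length, position=1) is the signature documented above;
    # here a str literal is passed as "expression" instead of a column.
    return df.assign(col=df.stats.ngram(literal, length, position=position))
```

With a live connection, this could be called as res = add_literal_ngram(df, "Beginner", 3) followed by print(res).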