Teradata® Package for Python Function Reference - 20.00
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.edit_distance = edit_distance(expression, ci=1, cd=1, cs=1, ct=1)
DESCRIPTION:
Returns the minimum number of edit operations (insertions, deletions,
substitutions, and transpositions) required to transform each string value in the
column into "expression" (the value passed as the argument). When costs other than
the defaults are specified, each operation counts with its cost.
Edit distance measures the similarity between two strings. A low number of deletions,
insertions, substitutions, or transpositions implies high similarity. The operations
follow the Damerau-Levenshtein distance algorithm, modified to support costed
operations.
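The costed distance described above can be sketched locally in plain Python. This is a minimal illustration, not the database's implementation: it assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein, and the function name weighted_edit_distance is hypothetical. The argument names ci, cd, cs, and ct mirror the cost parameters of edit_distance().

```python
def weighted_edit_distance(source, target, ci=1, cd=1, cs=1, ct=1):
    """Cheapest cost to turn `source` into `target` (illustrative sketch).

    Assumption: restricted Damerau-Levenshtein (optimal string alignment),
    where a transposition swaps two adjacent characters exactly once.
    """
    m, n = len(source), len(target)
    # dist[i][j] = cheapest cost to turn source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * cd          # delete every character of source[:i]
    for j in range(1, n + 1):
        dist[0][j] = j * ci          # insert every character of target[:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            best = min(
                dist[i - 1][j] + cd,                       # delete
                dist[i][j - 1] + ci,                       # insert
                dist[i - 1][j - 1] + (0 if same else cs),  # match or substitute
            )
            # Adjacent transposition: "ab" in source lines up with "ba" in target.
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                best = min(best, dist[i - 2][j - 2] + ct)
            dist[i][j] = best
    return dist[m][n]


print(weighted_edit_distance("Novice", "Beginner"))              # 6
print(weighted_edit_distance("Novice", "Beginner", 2, 1, 1, 2))  # 8
```

Under these assumptions the sketch gives 6 for 'Novice' and 'Beginner' at default costs and 8 with ci=2, cd=1, cs=1, ct=2, which agrees with the values shown for id 3 in the EXAMPLES section.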
ALTERNATE_NAME:
levenshtein
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal.
Format of a ColumnExpression of a string column: '<dataframe>.<dataframe_column>'.
Supported column types are: CHARACTER, VARCHAR, or CLOB.
Types: ColumnExpression, str
ci:
Optional Argument.
Specifies the relative cost of an insert operation.
The value specified must be a non-negative integer.
Default Value: 1
Types: int
cd:
Optional Argument.
Specifies the relative cost of a delete operation.
The value specified must be a non-negative integer.
Default Value: 1
Types: int
cs:
Optional Argument.
Specifies the relative cost of a substitute operation.
The value specified must be a non-negative integer.
Default Value: 1
Types: int
ct:
Optional Argument.
Specifies the relative cost of a transpose operation.
The value specified must be a non-negative integer.
Default Value: 1
Types: int
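The four cost arguments share one constraint: each must be a non-negative integer. The check below is a hypothetical pre-flight helper (check_cost is not part of teradataml) that a caller could run before building the expression. Which exception the library itself raises for each kind of bad input is an assumption here; the sketch simply uses TypeError for a wrong type and ValueError for a negative value.

```python
def check_cost(name, value):
    """Validate one cost argument (hypothetical helper, not a teradataml API)."""
    # bool is a subclass of int in Python; rejecting it is a choice of this sketch.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value


# Validate a full set of costs before passing them on.
costs = {"ci": 2, "cd": 1, "cs": 1, "ct": 2}
checked = {name: check_cost(name, value) for name, value in costs.items()}
print(checked)
```

A cost of 0 passes the check, since the constraint is non-negative rather than strictly positive.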
RAISES:
TypeError, ValueError, TeradataMlException
RETURNS:
DataFrameColumn
EXAMPLES:
# Load the data to execute the example.
>>> load_example_data("dataframe", "admissions_train")
# Create a DataFrame on 'admissions_train' table.
>>> df = DataFrame("admissions_train").iloc[:4]
>>> print(df)
masters gpa stats programming admitted
id
3 no 3.70 Novice Beginner 1
4 yes 3.50 Beginner Novice 1
2 yes 3.76 Beginner Beginner 0
1 yes 3.95 Beginner Beginner 0
# Example 1: Calculate the edit distance between values in "stats" and "programming"
# columns and pass it as input to DataFrame.assign().
>>> res = df.assign(col = df.stats.edit_distance(df.programming))
>>> print(res)
masters gpa stats programming admitted col
id
3 no 3.70 Novice Beginner 1 6
4 yes 3.50 Beginner Novice 1 6
2 yes 3.76 Beginner Beginner 0 0
1 yes 3.95 Beginner Beginner 0 0
# Example 2: Calculate the edit distance between values in "stats" and "programming"
# columns using custom costs for the edit operations (ci=2, cd=1, cs=1, ct=2)
# and pass it as input to DataFrame.assign().
>>> res = df.assign(col = df.stats.edit_distance(df.programming, 2, 1, 1, 2))
>>> print(res)
masters gpa stats programming admitted editdistance_func col
id
3 no 3.70 Novice Beginner 1 8 8
4 yes 3.50 Beginner Novice 1 6 6
2 yes 3.76 Beginner Beginner 0 0 0
1 yes 3.95 Beginner Beginner 0 0 0
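The values 8 and 6 in Example 2 can be reproduced by hand from the costs. The breakdown below is worked out manually rather than taken from Vantage output: it assumes one cheapest edit script for 'Novice' to 'Beginner' is four substitutions plus two insertions, and for 'Beginner' to 'Novice' four substitutions plus two deletions. The helper script_cost is hypothetical.

```python
def script_cost(inserts, deletes, substitutions, transpositions,
                ci=1, cd=1, cs=1, ct=1):
    """Total cost of an edit script given its operation counts (illustrative)."""
    return inserts * ci + deletes * cd + substitutions * cs + transpositions * ct


# id 3: 'Novice' -> 'Beginner', 4 substitutions and 2 insertions.
row3 = script_cost(inserts=2, deletes=0, substitutions=4, transpositions=0,
                   ci=2, cd=1, cs=1, ct=2)

# id 4: 'Beginner' -> 'Novice', 4 substitutions and 2 deletions.
row4 = script_cost(inserts=0, deletes=2, substitutions=4, transpositions=0,
                   ci=2, cd=1, cs=1, ct=2)

print(row3, row4)  # 8 6
```

Because insertions cost twice as much as deletions in this example, the distance is no longer symmetric: growing the shorter string is more expensive than shrinking the longer one.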
# Example 3: Execute edit_distance() on the "stats" column and keep only the rows
# whose computed value equals 8.
>>> print(df[df.stats.edit_distance(df.programming, 2, 1, 1, 2) == 8])
masters gpa stats programming admitted
id
3 no 3.7 Novice Beginner 1