Teradata Package for Python Function Reference | 17.10 - ngram - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python

Release Number: 17.10

Published: April 2022

Language: English (United States)

Last Update: 2022-08-19

Product Category: Teradata Vantage

ngram

Functions
ngram(string_column_expression1, string_column_expression2, length, position)

DESCRIPTION:
    Function returns the number of n-gram matches between
    string_column_expression1 and string_column_expression2.
    A high number of matching n-gram patterns implies a high similarity
    between the two strings.

    For positional n-gram matching, the position as well as the pattern
    must match when measuring similarity. The position value indicates how
    far apart the matching patterns may be in the two strings:
        * If position is set to zero, the match must be at the same
          position in the two strings.
        * If position is set to a value of x, the match must be within
          x positions in the two strings. For example, if position = 2,
          then the match must be within 2 positions in the two strings.

    The function returns zero in the following cases:
        * If the length argument is greater than the length of either
          string_column_expression1 or string_column_expression2.
        * If the length argument is less than or equal to 0, or if either
          string_column_expression1 or string_column_expression2 is an
          empty string.

    Patterns beyond the length of 255 are ignored.

PARAMETERS:
    string_or_column_expression1:
        Required Argument.
        Specifies a ColumnExpression of a string column or a string literal.
        If the argument is null, then the result is null.
        Format of a ColumnExpression of a string column:
            '<dataframe>.<dataframe_column>.expression'
        Supported column types: CHAR, VARCHAR, or CLOB

    string_or_column_expression2:
        Required Argument.
        Specifies a ColumnExpression of a string column or a string literal.
        If the argument is null, then the result is null.
        Format of a ColumnExpression of a string column:
            '<dataframe>.<dataframe_column>.expression'
        Supported column types: CHAR, VARCHAR, or CLOB

    length:
        Required Argument.
        Specifies an integer value n in n-gram, which is the comparison
        length.

    position:
        Optional Argument.
        Specifies an integer value indicating that the n-gram match is
        positional, and how many positions apart matching patterns may be
        in the two strings.
NOTE:
    Function accepts positional arguments only.

EXAMPLES:
    # Load the data to run the example.
    >>> load_example_data("dataframe", "admissions_train")
    >>>

    # Create a DataFrame on 'admissions_train' table.
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id
    22     yes  3.46    Novice    Beginner         0
    36      no  3.00  Advanced      Novice         0
    15     yes  4.00  Advanced    Advanced         1
    38     yes  2.65  Advanced    Beginner         1
    5       no  3.44    Novice      Novice         0
    17      no  3.83  Advanced    Advanced         1
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    26     yes  3.57  Advanced    Advanced         1
    19     yes  1.98  Advanced    Advanced         0
    >>>

    # Example returns n-gram value for character string in "stats" and
    # "programming" column.
    # Import func from sqlalchemy to execute ngram function.
    >>> from sqlalchemy import func

    # Create a sqlalchemy Function object.
    >>> ngram_func_ = func.ngram(admissions_train.stats.expression,
    ...                          admissions_train.programming.expression, 3)
    >>>

    # Pass the Function object as input to DataFrame.assign().
    >>> df = admissions_train.assign(ngram_col_=ngram_func_)
    >>> print(df)
       masters   gpa     stats programming  admitted  ngram_col_
    id
    13      no  4.00  Advanced      Novice         1           0
    26     yes  3.57  Advanced    Advanced         1           6
    5       no  3.44    Novice      Novice         0           4
    19     yes  1.98  Advanced    Advanced         0           6
    15     yes  4.00  Advanced    Advanced         1           6
    40     yes  3.95    Novice    Beginner         0           0
    7      yes  2.33    Novice      Novice         1           4
    22     yes  3.46    Novice    Beginner         0           0
    36      no  3.00  Advanced      Novice         0           0
    38     yes  2.65  Advanced    Beginner         1           0
    >>>
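
The matching rules in the DESCRIPTION can be illustrated locally, without a Vantage connection, using a small pure-Python sketch. This is not the database implementation: the helper name ngram_matches is hypothetical, the way repeated patterns are counted is an assumption, and the 255-character pattern limit is not modeled. It only shows one plausible reading of the rules, which happens to reproduce the ngram_col_ values in the example output (6, 4 and 0).

```python
def ngram_matches(s1, s2, length, position=None):
    """Hypothetical helper: count n-grams of s1 that also occur in s2.

    Assumption: each n-gram of s1 is counted at most once. If position
    is given, a match counts only when the two start offsets differ by
    at most `position`.
    """
    # A null argument yields a null result.
    if s1 is None or s2 is None:
        return None
    # Zero for a non-positive length or an empty string.
    if length <= 0 or not s1 or not s2:
        return 0
    # Zero when length exceeds the length of either string.
    if length > len(s1) or length > len(s2):
        return 0

    grams1 = [s1[i:i + length] for i in range(len(s1) - length + 1)]
    grams2 = [s2[j:j + length] for j in range(len(s2) - length + 1)]

    count = 0
    for i, gram in enumerate(grams1):
        for j, other in enumerate(grams2):
            close_enough = position is None or abs(i - j) <= position
            if gram == other and close_enough:
                count += 1
                break  # count each n-gram of s1 once
    return count


# Same inputs as rows of the example output above.
print(ngram_matches("Advanced", "Advanced", 3))  # 6
print(ngram_matches("Novice", "Novice", 3))      # 4
print(ngram_matches("Advanced", "Novice", 3))    # 0

# Positional matching: "abc" starts at offset 0 in one string, 1 in the other.
print(ngram_matches("abcd", "xabc", 3, 0))       # 0
print(ngram_matches("abcd", "xabc", 3, 1))       # 1
```

With length 3, "Advanced" has six 3-character patterns and "Novice" has four, which is why identical strings score 6 and 4 respectively, while strings that share no 3-character pattern score 0.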