| |
- ngram(string_column_expression1, string_column_expression2, length, position)
- DESCRIPTION:
Function returns the number of n-gram matches between string_column_expression1
and string_column_expression2. A high number of matching n-gram patterns implies
a high similarity between the two strings.
For positional n-gram matching, the position as well as the pattern must match
when measuring similarity. The position value indicates how far away positionally
the match may be between the 2 strings as follows:
* If position is set to a value of zero, the match must be at the same position
in the 2 strings.
* If position is set to a value of x, the match must be within x positions in
the 2 strings.
For example, if position = 2, then the match must be within 2 positions in the
2 strings.
The function returns zero in the following cases:
* If the length argument is greater than the length of either string_column_expression1
or string_column_expression2.
* If the length argument is less than or equal to 0 or if either string_column_expression1 or
string_column_expression2 is an empty string.
Patterns beyond the length of 255 are ignored.
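The counting rules above can be illustrated with a small pure-Python sketch. It is not the database implementation: the helper name ngram_matches is hypothetical, position=None stands in for omitting the optional position argument, the way repeated n-grams are counted is an assumption, and the 255-length limit is not modeled.

```python
def ngram_matches(s1, s2, length, position=None):
    """Illustrative sketch only: count n-grams of s1 that also occur in s2."""
    # A null input gives a null result.
    if s1 is None or s2 is None:
        return None
    # Zero cases: non-positive length or an empty string.
    if length <= 0 or not s1 or not s2:
        return 0
    # Zero case: length greater than the length of either string.
    if length > len(s1) or length > len(s2):
        return 0

    # All n-grams of s2, indexed by their starting position.
    grams2 = [s2[j:j + length] for j in range(len(s2) - length + 1)]

    count = 0
    for i in range(len(s1) - length + 1):
        gram = s1[i:i + length]
        if position is None:
            # Non-positional: the pattern may occur anywhere in s2.
            matched = gram in grams2
        else:
            # Positional: the pattern must start within `position`
            # positions of i (position=0 means the same position).
            lo = max(0, i - position)
            hi = min(len(grams2), i + position + 1)
            matched = gram in grams2[lo:hi]
        # ASSUMPTION: each n-gram position in s1 counts at most once.
        if matched:
            count += 1
    return count


print(ngram_matches("Advanced", "Advanced", 3))   # 6
print(ngram_matches("Novice", "Novice", 3))       # 4
print(ngram_matches("abcdef", "xabcde", 3, 0))    # 0
print(ngram_matches("abcdef", "xabcde", 3, 1))    # 3
```

Under these assumptions the sketch gives 6 for "Advanced" against "Advanced" and 4 for "Novice" against "Novice" with length 3, and it shows how a position of 0 rejects patterns that are shifted by one character while a position of 1 accepts them.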
PARAMETERS:
string_or_column_expression1:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal.
If the argument is null, then the result is null.
Format of a ColumnExpression of a string column: '<dataframe>.<dataframe_column>.expression'.
Supported column types: CHAR, VARCHAR, or CLOB
string_or_column_expression2:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal.
If the argument is null, then the result is null.
Format of a ColumnExpression of a string column: '<dataframe>.<dataframe_column>.expression'.
Supported column types: CHAR, VARCHAR, or CLOB
length:
Required Argument.
Specifies an integer value n in n-gram, which is the comparison length.
position:
Optional Argument.
Specifies an integer value that makes the n-gram match positional. The value is
the number of positions within which a match may occur in the 2 strings.
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example returns the number of n-gram matches between the strings in the "stats" and "programming" columns.
# Import func from sqlalchemy to execute ngram function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object.
>>> ngram_func_ = func.ngram(admissions_train.stats.expression, admissions_train.programming.expression, 3)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(ngram_col_=ngram_func_)
>>> print(df)
masters gpa stats programming admitted ngram_col_
id
13 no 4.00 Advanced Novice 1 0
26 yes 3.57 Advanced Advanced 1 6
5 no 3.44 Novice Novice 0 4
19 yes 1.98 Advanced Advanced 0 6
15 yes 4.00 Advanced Advanced 1 6
40 yes 3.95 Novice Beginner 0 0
7 yes 2.33 Novice Novice 1 4
22 yes 3.46 Novice Beginner 0 0
36 no 3.00 Advanced Novice 0 0
38 yes 2.65 Advanced Beginner 1 0
>>>
|