| |
- regexp_similar(source_string, regexp_string, match_arg)
- DESCRIPTION:
Function compares source_string to regexp_string and returns integer value.
Function returns following integer value:
* 1 (true) if the entire source_string matches regexp_string.
* 0 (false) if the entire source_string does not match regexp_string.
PARAMETERS:
source_string:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal
from which substring is to be extracted.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
If source_string is NULL, NULL is returned.
regexp_string:
Required Argument.
Specifies a ColumnExpression of a string column or a string literal
which is to be used as regex.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
If regexp_string is NULL, NULL is returned.
match_arg:
Optional Argument.
Specifies a character which decides the handling of regex matching.
Valid values are:
* 'i' = case-insensitive matching.
* 'c' = case sensitive matching.
* 'n' = the period character (match any character) can match the newline character.
* 'm' = source_string is treated as multiple lines instead of as a single line.
With this option, the '^' and '$' characters apply to each line in source_string
instead of the entire source_string.
* 'l' = if source_string exceeds the current maximum allowed source_string size
(currently 16 MB), a NULL is returned instead of an error. This is useful for
long-running queries where you do not want long strings causing an error that
would make the query fail.
* 'x' = ignore whitespace.
The argument can contain more than one character.
Notes:
1. If a character in the argument is not valid, then that character is ignored.
2. If match_arg is not specified, is NULL, or is empty:
a. The match is case-sensitive.
b. A period does not match the newline character.
c. source_string is treated as a single line.
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example compares strings in "stats" column and "programming" column.
# Import func from sqlalchemy to execute regexp_similar() function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object.
# Note: Function name is case-insensitive.
>>> regexp_similar_ = func.REGEXP_SIMILAR(admissions_train.stats.expression,
... admissions_train.programming.expression, 'c')
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(regexp_similar_col=regexp_similar_)
>>> print(df)
masters gpa stats programming admitted regexp_similar_col
id
5 no 3.44 Novice Novice 0 1
34 yes 3.85 Advanced Beginner 0 0
13 no 4.00 Advanced Novice 1 0
40 yes 3.95 Novice Beginner 0 0
22 yes 3.46 Novice Beginner 0 0
19 yes 1.98 Advanced Advanced 0 1
36 no 3.00 Advanced Novice 0 0
15 yes 4.00 Advanced Advanced 1 1
7 yes 2.33 Novice Novice 1 1
17 no 3.83 Advanced Advanced 1 1
>>>
|