Teradata® Package for Python Function Reference - 20.00 - regexp_similar
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
teradataml.dataframe.sql.DataFrameColumn.regexp_similar = regexp_similar(regexp_string, match)
DESCRIPTION:
Compares the string value in the column to "regexp_string" and returns an integer value.
PARAMETERS:
regexp_string:
Required Argument.
Specifies a ColumnExpression of a string column, or a string literal,
to be used as the regular expression.
Note:
1. If regexp_string is NULL, NULL is returned.
Format for the argument: '<dataframe>.<dataframe_column>'.
Supported column types: CHAR, VARCHAR
Types: ColumnExpression, str
match:
Optional Argument.
Specifies one or more characters that control how the regex match is performed.
Notes:
1. If a character in the argument is not valid, that character is ignored.
2. If "match" is not specified, is NULL, or is empty:
a. The match is case-sensitive.
b. A period does not match the newline character.
c. The string value in the column is treated as a single line.
3. The argument can contain more than one character.
Permitted values:
* 'i' - case-insensitive matching.
* 'c' - case-sensitive matching.
* 'n' - the period character (match any character) can match the newline character.
* 'm' - the string value in the column is treated as multiple lines instead of as a single line.
With this option, the '^' and '$' characters apply to each line of the string value
instead of to the entire string value.
* 'l' - if the string value in the column exceeds the current maximum allowed size
(currently 16 MB), NULL is returned instead of an error. This is useful for
long-running queries where you do not want long strings causing an error that
would make the query fail.
* 'x' - ignore whitespace.
Types: str
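The effect of the "match" characters can be illustrated locally with Python's standard `re` module. This is only a rough sketch under stated assumptions: the helper names `to_re_flags` and `emulate_regexp_similar` are hypothetical, the mapping from match characters to `re` flags is an approximation rather than Vantage's regex engine, and letting the later of 'i' and 'c' win is an assumption that the text above does not state. The 'l' option has no counterpart in `re` and is skipped here.

```python
import re

# Hypothetical mapping from the documented match characters to Python `re` flags.
# This approximates the documented behaviour; it is not the Vantage regex engine.
_FLAG_MAP = {
    "n": re.DOTALL,      # period can match the newline character
    "m": re.MULTILINE,   # '^' and '$' apply to each line
    "x": re.VERBOSE,     # whitespace in the pattern is ignored
}

def to_re_flags(match=None):
    """Translate a match string such as 'in' into `re` flags."""
    flags = 0
    ignore_case = False  # default: case-sensitive matching
    for ch in match or "":
        if ch == "i":
            ignore_case = True
        elif ch == "c":
            ignore_case = False  # assumption: the later of 'i'/'c' wins
        elif ch in _FLAG_MAP:
            flags |= _FLAG_MAP[ch]
        # Any other character (including 'l') is ignored in this sketch.
    if ignore_case:
        flags |= re.IGNORECASE
    return flags

def emulate_regexp_similar(value, regexp_string, match=None):
    """Return 1 if the entire value matches the pattern, else 0."""
    return 1 if re.fullmatch(regexp_string, value, to_re_flags(match)) else 0

print(emulate_regexp_similar("ADVANCED", "advanced"))       # 0
print(emulate_regexp_similar("ADVANCED", "advanced", "i"))  # 1
print(emulate_regexp_similar("a\nb", "a.b"))                # 0
print(emulate_regexp_similar("a\nb", "a.b", "n"))           # 1
print(emulate_regexp_similar("Novice", "Nov ice", "x"))     # 1
```

Because invalid characters are skipped, a match string such as 'zi' behaves like 'i' in this sketch, which mirrors Note 1.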
RAISES:
TypeError, ValueError, TeradataMlException
RETURNS:
Returns one of the following integer values in a DataFrameColumn:
* 1 (true) if the entire string value in the column matches regexp_string.
* 0 (false) if the entire string value in the column does not match regexp_string.
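The phrase "entire string value" matters: a pattern that matches only part of the value yields 0. The following minimal sketch mimics that behaviour with Python's `re.fullmatch`. The helper `regexp_similar_local` is hypothetical and runs locally, not in Vantage; returning None for a missing column value is an assumption, since the text above only states the NULL rule for regexp_string.

```python
import re

def regexp_similar_local(value, regexp_string):
    """Hypothetical local stand-in for the documented return values."""
    if regexp_string is None:
        return None  # documented: NULL regexp_string gives NULL
    if value is None:
        return None  # assumption: a NULL column value also gives NULL
    # fullmatch succeeds only when the pattern covers the whole string.
    return 1 if re.fullmatch(regexp_string, value) else 0

print(regexp_similar_local("Advanced", "Adv"))    # 0, only a prefix matches
print(regexp_similar_local("Advanced", "Adv.*"))  # 1, pattern covers the whole value
print(regexp_similar_local("Advanced", None))     # None
```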
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
# Create a DataFrame on 'admissions_train' table.
>>> df = DataFrame("admissions_train")
>>> print(df)
   masters   gpa     stats programming  admitted
id
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0
# Example 1: Compare the strings in the "stats" column with the "programming" column,
#            using case-sensitive matching, and pass the result to DataFrame.assign().
>>> res_df = df.assign(col=df.stats.regexp_similar(df.programming, 'c'))
>>> print(res_df)
   masters   gpa     stats programming  admitted  col
id
5       no  3.44    Novice      Novice         0    1
34     yes  3.85  Advanced    Beginner         0    0
13      no  4.00  Advanced      Novice         1    0
40     yes  3.95    Novice    Beginner         0    0
22     yes  3.46    Novice    Beginner         0    0
19     yes  1.98  Advanced    Advanced         0    1
36      no  3.00  Advanced      Novice         0    0
15     yes  4.00  Advanced    Advanced         1    1
7      yes  2.33    Novice      Novice         1    1
17      no  3.83  Advanced    Advanced         1    1
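As a local cross-check of Example 1, the "col" values can be recomputed from the printed rows without a Vantage connection. This is a sketch under assumptions: the helper `local_col` is hypothetical, and Python's `re.fullmatch` stands in for the database's regex engine, which agrees here only because the "programming" values contain no regex metacharacters.

```python
import re

# (id, stats, programming, col) copied from the Example 1 output above.
rows = [
    (5, "Novice", "Novice", 1),
    (34, "Advanced", "Beginner", 0),
    (13, "Advanced", "Novice", 0),
    (40, "Novice", "Beginner", 0),
    (22, "Novice", "Beginner", 0),
    (19, "Advanced", "Advanced", 1),
    (36, "Advanced", "Novice", 0),
    (15, "Advanced", "Advanced", 1),
    (7, "Novice", "Novice", 1),
    (17, "Advanced", "Advanced", 1),
]

def local_col(stats, programming):
    # Case-sensitive whole-string match, mirroring the 'c' option in Example 1.
    return 1 if re.fullmatch(programming, stats) else 0

computed = [local_col(stats, programming) for _, stats, programming, _ in rows]
expected = [col for *_, col in rows]
print(computed == expected)  # True
```

Each row yields 1 exactly when "stats" and "programming" hold the same string, because the "programming" value is used as the pattern and must cover the whole "stats" value.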