Teradata Package for Python Function Reference on VantageCloud Lake - regexp_instr - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment
- VantageCloud
- Edition
- Lake
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:lastEdition
- 2024-12-19
- Product Category
- Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.regexp_instr = regexp_instr(regexp_string, position, occurence, return_opt, match)
- DESCRIPTION:
Searches the string value in the column for a match to "regexp_string" and returns the position of that match.
PARAMETERS:
regexp_string:
Required Argument.
Specifies a ColumnExpression of a string column, or a string literal, to use
as the regular expression.
Note:
1. If regexp_string is NULL, NULL is returned.
Format for the argument: '<dataframe>.<dataframe_column>'.
Supported column types: CHAR, VARCHAR
Types: ColumnExpression, str
position:
Optional Argument.
Specifies the position in the column's string value from which to start searching.
Notes:
1. If the value is greater than the input string length, 0 is returned.
2. If the value is NULL, the value NULL is returned.
Types: ColumnExpression, int
Default Value: 1
occurence:
Optional Argument.
Specifies the number of the occurrence to be returned.
Notes:
1. If the value is greater than the number of matches found, 0 is returned.
2. If the value is NULL, a NULL result is returned.
Types: ColumnExpression, int
Default Value: 1
return_opt:
Optional Argument.
Specifies a numeric value, 0 or 1, that determines which position of the match is returned.
Note:
1. If the value is NULL, NULL is returned.
Permitted Values:
* 0 = function returns the beginning position of the match (default).
* 1 = function returns the end position (character following the occurrence) of the match.
Types: ColumnExpression, int
Default Value: 0
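The interaction of "position", "occurence" and "return_opt" can be illustrated with a small pure-Python sketch. This is only an approximation of the rules stated above, not the library's implementation: the helper name regexp_instr_sketch is hypothetical, Python's re syntax is assumed to be close enough to the database regex dialect for simple patterns, and None stands in for NULL.

```python
import re

def regexp_instr_sketch(source, pattern, position=1, occurrence=1, return_opt=0):
    """Hypothetical helper approximating the documented rules (1-based positions)."""
    # Any NULL (None) argument yields NULL, as the notes above describe.
    if None in (source, pattern, position, occurrence, return_opt):
        return None
    # A start position beyond the end of the string yields 0.
    if position > len(source):
        return 0
    # Collect matches, scanning from the 1-based start position.
    matches = list(re.compile(pattern).finditer(source, position - 1))
    # Asking for an occurrence that does not exist yields 0.
    if occurrence > len(matches):
        return 0
    m = matches[occurrence - 1]
    # return_opt 0: 1-based start of the match; 1: 1-based position just after it.
    return m.start() + 1 if return_opt == 0 else m.end() + 1

print(regexp_instr_sketch("Advanced", "d"))                # 2
print(regexp_instr_sketch("Advanced", "d", return_opt=1))  # 3
print(regexp_instr_sketch("Advanced", "d", occurrence=2))  # 8
print(regexp_instr_sketch("Novice", "d"))                  # 0
```

The sketch assumes "position" is at least 1; behavior for other values is not described in this reference.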
match:
Optional Argument.
Specifies one or more characters that control how regex matching is handled.
Notes:
1. If a character in the argument is not valid, an error is reported.
2. If match is not specified, is NULL, or is empty:
a. The match is case-sensitive.
b. A period does not match the newline character.
c. The string value in the column is treated as a single line.
3. The argument can contain more than one character.
Permitted Values:
* 'i' - case-insensitive matching.
* 'c' - case sensitive matching.
* 'n' - the period character (match any character) can match the newline character.
* 'm' - the string value in the column is treated as multiple lines instead of as a single line.
With this option, the '^' and '$' characters apply to each line in the string value
instead of to the entire string value.
* 'l' - if the string value in the column exceeds the current maximum allowed size
(currently 16 MB), a NULL is returned instead of an error. This is useful for
long-running queries where you do not want long strings causing an error that
would make the query fail.
* 'x' - ignore whitespace.
Types: str
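For readers more familiar with Python's re module, the permitted "match" characters have rough analogues among the re flags. The mapping below is an assumption made for illustration only: the helper to_re_flags is hypothetical, 'l' has no re equivalent and is skipped, and letting the last of 'i' and 'c' win when both are given is a choice made here, not a behavior this reference states.

```python
import re

# Assumed correspondence between match characters and Python re flags.
_FLAG_MAP = {
    "n": re.DOTALL,     # period may match the newline character
    "m": re.MULTILINE,  # '^' and '$' apply to each line
    "x": re.VERBOSE,    # ignore whitespace in the pattern
}

def to_re_flags(match=None):
    """Hypothetical helper: translate a match string into re flags."""
    flags = 0
    ignore_case = False  # default is case-sensitive matching
    for ch in match or "":
        if ch == "i":
            ignore_case = True
        elif ch == "c":
            ignore_case = False
        elif ch == "l":
            continue  # size-limit handling has no counterpart in re
        elif ch in _FLAG_MAP:
            flags |= _FLAG_MAP[ch]
        else:
            # Mirrors note 1: an invalid character is reported as an error.
            raise ValueError(f"invalid match character: {ch!r}")
    if ignore_case:
        flags |= re.IGNORECASE
    return flags

# Case-insensitive search succeeds where the default (case-sensitive) one fails.
print(re.search("ADV", "Advanced", to_re_flags("i")) is not None)  # True
print(re.search("ADV", "Advanced", to_re_flags()) is not None)     # False
```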
RAISES:
TypeError, ValueError, TeradataMlException
RETURNS:
DataFrameColumn
EXAMPLES:
# Load the data to run the example.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")
# Create a DataFrame on 'admissions_train' table.
>>> df = DataFrame("admissions_train")
>>> print(df)
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
# Example 1: Search for the substring "d" in the "stats" column, use the integer
# value in the "admitted" column as "return_opt" (0 = start of the match,
# 1 = position after the match), and pass the result to DataFrame.assign().
>>> res_df = df.assign(col = df.stats.regexp_instr("d", 1, 1, df.admitted, 'c'))
>>> print(res_df)
    masters   gpa     stats  programming  admitted  col
id
13       no  4.00  Advanced       Novice         1    3
26      yes  3.57  Advanced     Advanced         1    3
5        no  3.44    Novice       Novice         0    0
19      yes  1.98  Advanced     Advanced         0    2
15      yes  4.00  Advanced     Advanced         1    3
40      yes  3.95    Novice     Beginner         0    0
7       yes  2.33    Novice       Novice         1    0
22      yes  3.46    Novice     Beginner         0    0
36       no  3.00  Advanced       Novice         0    2
38      yes  2.65  Advanced     Beginner         1    3
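The "col" values in the output above can be cross-checked locally without a database connection. The snippet below is a plain-Python sanity check, not teradataml code: the function expected_col is a hypothetical name, and the (stats, admitted) pairs are copied from the table above.

```python
import re

# Distinct (stats, admitted) combinations that appear in the output table.
rows = [("Advanced", 1), ("Advanced", 0), ("Novice", 0), ("Novice", 1)]

def expected_col(stats, admitted):
    """Hypothetical check: position of the first "d", selected by admitted."""
    m = re.search("d", stats)
    if m is None:
        return 0  # no match found
    # admitted == 0 -> 1-based start of the match; 1 -> position after the match
    return m.start() + 1 if admitted == 0 else m.end() + 1

for stats, admitted in rows:
    print(stats, admitted, expected_col(stats, admitted))
```

Rows with "Advanced" yield 2 or 3 depending on "admitted", and rows with "Novice" yield 0 because the string contains no "d", which agrees with the table.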