Teradata Package for Python Function Reference on VantageCloud Lake - filter - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment
- VantageCloud
- Edition
- Lake
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Lake_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.filter = filter(self, items=None, like=None, regex=None, axis=1, **kw)
- DESCRIPTION:
Filter rows or columns of dataframe according to labels in the specified index.
The filter is applied to the columns of the index when axis is set to 'rows'.
Must use one of the parameters 'items', 'like', and 'regex' only.
PARAMETERS:
axis:
Optional Argument.
Specifies the axis to filter on.
1 denotes column axis (default). Alternatively, 'columns' can be specified.
0 denotes row axis. Alternatively, 'rows' can be specified.
Default Values: 1
Permitted Values: 0, 1, 'rows', 'columns'
Types: int OR str
items:
Optional Argument.
List of values that the info axis should be restricted to.
When axis is 1, items are a list of column names.
When axis is 0, items are a list of literal values.
Types: list of Strings (str) or literals
like:
Optional Argument.
Specifies the substring pattern.
When axis is 1, substring pattern for matching column names.
When axis is 0, substring pattern for checking index values
with REGEXP_SUBSTR.
Types: str
regex:
Optional Argument.
Specifies a regular expression pattern.
When axis is 1, regex pattern for re.search(regex, column_name).
When axis is 0, regex pattern for checking index values
with REGEXP_SUBSTR.
Types: str
**kw: optional keyword arguments
varchar_size:
Specifies an integer to specify the size of varchar-casted index.
Used when axis=0 or axis='rows' and index must be char-like in
"like" and "regex" filtering.
Default Value: configure.default_varchar_size
Types: int
match_arg: string
Specifies argument to pass if axis is 0/'rows' and regex is used.
Valid values for match_arg are:
- 'i': case-insensitive matching.
- 'c': case sensitive matching.
- 'n': the period character (match any character) can match
the newline character.
- 'm': index value is treated as multiple lines instead of as a single
line. With this option, the '^' and '$' characters apply to each
line in source_string instead of the entire index value.
- 'l': if index value exceeds the current maximum allowed size
(currently 16 MB), a NULL is returned instead of an error.
This is useful for long-running queries where you do not want
long strings causing an error that would make the query fail.
- 'x': ignore whitespace.
The 'match_arg' argument may contain more than one character.
If a character in 'match_arg' is not valid, then that character is ignored.
See Teradata® Database SQL Functions, Operators, Expressions, and Predicates,
for more information on specifying arguments for REGEXP_SUBSTR.
NOTES:
- Using 'regex' or 'like' with axis equal to 0 will attempt to cast the
values in the index to a VARCHAR.
Note that conversion between BYTE data and other types is not supported.
Also, LOBs are not allowed to be compared.
- When using 'like' or 'regex', datatypes are casted into VARCHAR.
This may alter the format of the value in the column(s)
and thus whether there is a match or not. The size of the VARCHAR may also
play a role since the casted value is truncated if the size is not big enough.
See varchar_size under **kw: optional keyword arguments.
RETURNS:
teradataml DataFrame
RAISES:
ValueError if more than one parameter: 'items', 'like', or 'regex' is used.
TeradataMlException if invalid argument values are given.
EXAMPLES:
>>> load_example_data("dataframe","admissions_train")
>>> df = DataFrame('admissions_train')
>>> df
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
37 no 3.52 Novice Novice 1
35 no 3.68 Novice Beginner 1
12 no 3.65 Novice Novice 1
4 yes 3.50 Beginner Novice 1
38 yes 2.65 Advanced Beginner 1
27 yes 3.96 Advanced Advanced 0
39 yes 3.75 Advanced Beginner 0
7 yes 2.33 Novice Novice 1
40 yes 3.95 Novice Beginner 0
>>>
>>> # retrieve columns masters, gpa, and stats in df
... df.filter(items = ['masters', 'gpa', 'stats'])
masters gpa stats
0 yes 4.00 Advanced
1 yes 3.45 Advanced
2 yes 3.50 Advanced
3 yes 4.00 Novice
4 yes 3.59 Advanced
5 no 3.87 Novice
6 yes 3.50 Beginner
7 yes 3.79 Advanced
8 no 3.00 Advanced
9 yes 1.98 Advanced
>>>
>>> # retrieve rows where index matches ‘2’, ‘4’
... df.filter(items = ['2', '4'], axis = 0)
masters gpa stats programming admitted
id
2 yes 3.76 Beginner Beginner 0
4 yes 3.50 Beginner Novice 1
>>>
>>> df = DataFrame('admissions_train', index_label="programming")
>>> df
id masters gpa stats admitted
programming
Beginner 22 yes 3.46 Novice 0
Novice 37 no 3.52 Novice 1
Beginner 35 no 3.68 Novice 1
Novice 12 no 3.65 Novice 1
Novice 4 yes 3.50 Beginner 1
Beginner 38 yes 2.65 Advanced 1
Advanced 27 yes 3.96 Advanced 0
Beginner 39 yes 3.75 Advanced 0
Novice 7 yes 2.33 Novice 1
Beginner 40 yes 3.95 Novice 0
>>>
>>> # retrieve columns with a matching substring
... df.filter(like = 'masters')
masters
0 yes
1 yes
2 yes
3 yes
4 yes
5 no
6 yes
7 yes
8 no
9 yes
>>>
>>> # retrieve rows where index values have ‘vice’ as a subtring
... df.filter(like = 'vice', axis = 'rows')
id masters gpa stats admitted
programming
Novice 12 no 3.65 Novice 1
Novice 5 no 3.44 Novice 0
Novice 24 no 1.87 Advanced 1
Novice 36 no 3.00 Advanced 0
Novice 23 yes 3.59 Advanced 1
Novice 13 no 4.00 Advanced 1
Novice 33 no 3.55 Novice 1
Novice 30 yes 3.79 Advanced 0
Novice 4 yes 3.50 Beginner 1
Novice 37 no 3.52 Novice 1
>>>
>>> # give a regular expression to match column names
... df.filter(regex = '^a.+')
admitted
0 0
1 1
2 1
3 1
4 1
5 1
6 0
7 0
8 1
9 0
>>>
>>> # give a regular expression to match values in index
... df.filter(regex = '^B.+', axis = 0)
id masters gpa stats admitted
programming
Beginner 39 yes 3.75 Advanced 0
Beginner 38 yes 2.65 Advanced 1
Beginner 3 no 3.70 Novice 1
Beginner 31 yes 3.50 Advanced 1
Beginner 21 no 3.87 Novice 1
Beginner 34 yes 3.85 Advanced 0
Beginner 32 yes 3.46 Advanced 0
Beginner 29 yes 4.00 Novice 0
Beginner 35 no 3.68 Novice 1
Beginner 22 yes 3.46 Novice 0
>>>
>>> # case-insensitive, ignore white space when matching index values
... df.filter(regex = '^A.+', axis = 0, match_args = 'ix')
id masters gpa stats admitted
programming
Advanced 20 yes 3.90 Advanced 1
Advanced 8 no 3.60 Beginner 1
Advanced 25 no 3.96 Advanced 1
Advanced 19 yes 1.98 Advanced 0
Advanced 14 yes 3.45 Advanced 0
Advanced 6 yes 3.50 Beginner 1
Advanced 17 no 3.83 Advanced 1
Advanced 11 no 3.13 Advanced 1
Advanced 15 yes 4.00 Advanced 1
Advanced 18 yes 3.81 Advanced 1
>>>
>>> # case-insensitive/ ignore white space/ match up to 32 characters
... df.filter(regex = '^A.+', axis = 0, match_args = 'ix', varchar_size = 32)
id masters gpa stats admitted
programming
Advanced 20 yes 3.90 Advanced 1
Advanced 8 no 3.60 Beginner 1
Advanced 25 no 3.96 Advanced 1
Advanced 19 yes 1.98 Advanced 0
Advanced 14 yes 3.45 Advanced 0
Advanced 6 yes 3.50 Beginner 1
Advanced 17 no 3.83 Advanced 1
Advanced 11 no 3.13 Advanced 1
Advanced 15 yes 4.00 Advanced 1
Advanced 18 yes 3.81 Advanced 1
>>>