Use the index operator ([ ]) to filter rows based on logical expressions given by Python literals and DataFrame columns.
Examples Prerequisite
>>> df = DataFrame('iris_test') >>> df sepal_length sepal_width petal_length petal_width species id 120 6.0 2.2 5.0 1.5 3 30 4.7 3.2 1.6 0.2 1 70 5.6 2.5 3.9 1.1 2 5 5.0 3.6 1.4 0.2 1 60 5.2 2.7 3.9 1.4 2 10 4.9 3.1 1.5 0.1 1 65 5.6 2.9 3.6 1.3 2 20 5.1 3.8 1.5 0.3 1 15 5.8 4.0 1.2 0.2 1 80 5.7 2.6 3.5 1.0 2
>>> s_len = df.sepal_length >>> p_len = df.petal_length
Example: Inequality operations
The operators supported are <, <=, >, >=.
>>> df[s_len > p_len] sepal_length sepal_width petal_length petal_width species id 120 6.0 2.2 5.0 1.5 3 30 4.7 3.2 1.6 0.2 1 70 5.6 2.5 3.9 1.1 2 5 5.0 3.6 1.4 0.2 1 60 5.2 2.7 3.9 1.4 2 10 4.9 3.1 1.5 0.1 1 65 5.6 2.9 3.6 1.3 2 20 5.1 3.8 1.5 0.3 1 15 5.8 4.0 1.2 0.2 1 80 5.7 2.6 3.5 1.0 2
>>> df[s_len < p_len + 1] sepal_length sepal_width petal_length petal_width species id 105 6.5 3.0 5.8 2.2 3 85 5.4 3.0 4.5 1.5 2 115 5.8 2.8 5.1 2.4 3 150 5.9 3.0 5.1 1.8 3 135 6.1 2.6 5.6 1.4 3
Example: Equality operations
The operators supported are == and !=.
>>> df[s_len == p_len + 1] sepal_length sepal_width petal_length petal_width species id 125 6.7 3.3 5.7 2.1 3 145 6.7 3.3 5.7 2.5 3 120 6.0 2.2 5.0 1.5 3
>>> df[s_len != p_len + 2] sepal_length sepal_width petal_length petal_width species id 120 6.0 2.2 5.0 1.5 3 20 5.1 3.8 1.5 0.3 1 60 5.2 2.7 3.9 1.4 2 10 4.9 3.1 1.5 0.1 1 30 4.7 3.2 1.6 0.2 1 70 5.6 2.5 3.9 1.1 2 15 5.8 4.0 1.2 0.2 1 5 5.0 3.6 1.4 0.2 1 80 5.7 2.6 3.5 1.0 2 40 5.1 3.4 1.5 0.2 1
Example: Boolean operations
The operators supported are &, |, ~.
For example, if condition used is as length == 4.0 and width == 3.5, the first condition in previous case (length == 4.0), in 'and' condition is completely ignored and data satisfying only second condition (width == 3.5) will be returned.
For example, if condition used is as length == 4.0 or width == 3.5, the second condition in previous case (width == 3.5), in 'or' condition is completely ignored and data satisfying only first condition (length == 4.0) will be returned.
>>> df[~(s_len != p_len + 1)] sepal_length sepal_width petal_length petal_width species id 125 6.7 3.3 5.7 2.1 3 145 6.7 3.3 5.7 2.5 3 120 6.0 2.2 5.0 1.5 3
>>> df[(s_len > 5.5) & (p_len < 5)] sepal_length sepal_width petal_length petal_width species id 95 5.6 2.7 4.2 1.3 2 70 5.6 2.5 3.9 1.1 2 100 5.7 2.8 4.1 1.3 2 75 6.4 2.9 4.3 1.3 2 65 5.6 2.9 3.6 1.3 2 15 5.8 4.0 1.2 0.2 1 55 6.5 2.8 4.6 1.5 2 80 5.7 2.6 3.5 1.0 2
>>> df[(s_len >= 6.3) | (p_len == 4.2)] sepal_length sepal_width petal_length petal_width species id 110 7.2 3.6 6.1 2.5 3 75 6.4 2.9 4.3 1.3 2 130 7.2 3.0 5.8 1.6 3 145 6.7 3.3 5.7 2.5 3 140 6.9 3.1 5.4 2.1 3 125 6.7 3.3 5.7 2.1 3 105 6.5 3.0 5.8 2.2 3 95 5.6 2.7 4.2 1.3 2 55 6.5 2.8 4.6 1.5 2
It is recommended to surround expressions in parentheses in order to avoid ambiguity in operator precedence, similar to pandas.
df[2 >= s_len | 4 <= p_len] --------------------------------------------------------------------------- ArgumentError Traceback (most recent call last) ... ArgumentError: SQL expression object or string expected, got object of type <class 'int'> instead ... TeradataMlException: [Teradata][teradataml](TDML_2021) Unable to retrieve information for the DataFrame.
Supported Types and Operators
Python int, float, Decimal, str, and None literals can be used in filtering expressions.
All arithmetic expressions except floor division (//) and power (**) can be used.
All logical operators are supported.