Teradata® Package for Python Function Reference
- Product: Teradata Package for Python
- Release Number: 17.10
- Published: April 2022
- Language: English (United States)
- Last Update: 2022-08-19
- Lifecycle: previous
- Product Category: Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.percentile = percentile(percentile, interpolation='LINEAR')
DESCRIPTION:
Returns the value at the desired percentile for each group.
The result is determined by the desired index (di) in the ordered list of values
of the group, with positions counted from 0:
di = (number of values in group - 1) * percentile
When di is a whole number, the value at position di is returned.
Otherwise di lies between two data points, i and j, where i<j, and the result
is interpolated according to the value specified in the interpolation argument.
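The linear rule can be sketched in plain Python. This is an illustrative sketch, not teradataml code: it assumes the percentile is passed as a fraction between 0 and 1 (as the percentile argument is), so the desired index is (n - 1) * percentile with positions counted from 0. The function name percentile_cont is hypothetical and only borrows the SQL function name that show_query() prints in the examples.

```python
import math


def percentile_cont(values, p):
    """Linear-interpolation percentile; p is a fraction between 0 and 1."""
    data = sorted(values)
    if not data:
        raise ValueError("no values to aggregate")
    di = (len(data) - 1) * p      # desired index, counted from 0
    lo = math.floor(di)           # position of data point i
    hi = math.ceil(di)            # position of data point j (same as lo if di is whole)
    frac = di - lo                # fractional part of di, i.e. di MOD 1
    return data[lo] + (data[hi] - data[lo]) * frac


# di = (5 - 1) * 0.25 = 1.0 is whole, so the value at position 1 is returned.
print(percentile_cont([10, 20, 30, 40, 50], 0.25))  # 20.0
# di = (4 - 1) * 0.25 = 0.75 lies between positions 0 and 1: 10 + (20 - 10) * 0.75.
print(percentile_cont([10, 20, 30, 40], 0.25))      # 17.5
```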
Notes:
1. This function is valid only on columns with numeric types.
2. Null values are excluded from the computation.
3. This function works only with DataFrame.groupby().
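The notes on grouping and nulls can be illustrated with a small local sketch that computes a linear percentile per group after discarding None values. The helper grouped_percentile and the sample rows are hypothetical. The sketch also assumes that a group whose values are all null yields None, mirroring ordinary SQL aggregate behavior; this page does not state that case.

```python
import math
from collections import defaultdict


def grouped_percentile(rows, group_col, value_col, p):
    """Hypothetical helper: linear percentile of value_col within each group_col group."""
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[row[group_col]]   # create the group even if all its values are null
        value = row[value_col]
        if value is not None:             # nulls are left out of the computation
            bucket.append(value)

    result = {}
    for key, values in groups.items():
        if not values:
            result[key] = None            # assumption: an all-null group yields None
            continue
        data = sorted(values)
        di = (len(data) - 1) * p
        lo, hi = math.floor(di), math.ceil(di)
        result[key] = data[lo] + (data[hi] - data[lo]) * (di - lo)
    return result


rows = [  # hypothetical sample rows
    {"group": "a", "value": 1},
    {"group": "a", "value": None},
    {"group": "a", "value": 3},
    {"group": "b", "value": 10},
    {"group": "b", "value": 20},
    {"group": "b", "value": 30},
    {"group": "b", "value": 40},
    {"group": "c", "value": None},
]
print(grouped_percentile(rows, "group", "value", 0.5))
# {'a': 2.0, 'b': 25.0, 'c': None}
```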
PARAMETERS:
percentile:
Required Argument.
Specifies the desired percentile, expressed as a fraction; for example, 0.25
for the 25th percentile. It must be between 0 and 1, inclusive.
Types: int or float
interpolation:
Optional Argument.
Specifies the interpolation method used to compute the result when the desired
result lies between two data points, i and j, where i<j.
Permitted Values: "LINEAR", None
* LINEAR: Linear interpolation.
The result value is computed using the following equation:
result = i + (j - i) * (di MOD 1)
where di MOD 1 is the fractional part of di.
Specify by passing the string "LINEAR" to this parameter.
* None: No interpolation.
The result is one of the actual values in the group (the generated SQL
uses percentile_disc instead of percentile_cont).
Specify by passing None to this parameter.
Default Value: "LINEAR"
Types: str, NoneType
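The difference between the two interpolation settings shows up on a small list. When interpolation=None, the generated query uses percentile_disc, so the sketch below models that case with the standard SQL PERCENTILE_DISC rule (return the first value whose cumulative distribution reaches the percentile). Treating Teradata's behavior as identical to that rule, including tie handling, is an assumption, and both function names are hypothetical.

```python
import math
from fractions import Fraction


def percentile_cont(values, p):
    """Linear interpolation (interpolation="LINEAR")."""
    data = sorted(values)
    di = (len(data) - 1) * p
    lo, hi = math.floor(di), math.ceil(di)
    return data[lo] + (data[hi] - data[lo]) * (di - lo)


def percentile_disc(values, p):
    """No interpolation: first value whose cumulative distribution k/n reaches p."""
    data = sorted(values)
    # Fraction(str(p)) keeps the product p * n exact, avoiding float rounding
    k = math.ceil(Fraction(str(p)) * len(data))
    return data[max(k, 1) - 1]      # p = 0 falls back to the first value


values = [10, 20, 30, 40]
print(percentile_cont(values, 0.25))  # 17.5
print(percentile_disc(values, 0.25))  # 10
```

With four values, the 25th percentile falls between 10 and 20: linear interpolation returns 17.5, while the discrete variant returns 10, a value actually present in the data.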
RETURNS:
teradataml DataFrame.
RAISES:
TypeError - If a value of an incorrect type is passed to an argument.
ValueError - If an invalid value is passed to an argument.
TeradataMLException - TDMLDF_AGGREGATE_FAILED - If the percentile() operation fails to
compute the column-wise percentile values.
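The TypeError and ValueError cases can be pictured as simple argument checks. The helper check_percentile_args below is hypothetical and is not teradataml's implementation; it only mirrors the documented constraints (percentile is an int or float between 0 and 1, and interpolation is "LINEAR" or None).

```python
def check_percentile_args(percentile, interpolation="LINEAR"):
    """Hypothetical validator mirroring the documented TypeError/ValueError cases."""
    if not isinstance(percentile, (int, float)):
        raise TypeError("percentile must be an int or float")
    if not 0 <= percentile <= 1:
        raise ValueError("percentile must be between 0 and 1, inclusive")
    if interpolation is not None and not isinstance(interpolation, str):
        raise TypeError("interpolation must be a str or None")
    if interpolation is not None and interpolation != "LINEAR":
        raise ValueError('interpolation must be "LINEAR" or None')


for args in [(0.25,), (1.5,), ("0.25",), (0.25, "CUBIC"), (0.25, None)]:
    try:
        check_percentile_args(*args)
        print(args, "accepted")
    except (TypeError, ValueError) as exc:
        print(args, "->", type(exc).__name__, exc)
```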
EXAMPLES:
>>> # Load the example datasets.
... load_example_data("dataframe", "admissions_train")
>>>
# Example 1: Execute percentile() on a DataFrame created on a regular table.
# Calculate the 25th percentile value of all numeric columns for each "admitted"
# group, using all rows (duplicates included) and the default linear
# interpolation.
#
>>> # Create the required DataFrame.
... admissions_train = DataFrame("admissions_train")
>>> # Check DataFrame columns and let's peek at the data.
... admissions_train.columns
['id', 'masters', 'gpa', 'stats', 'programming', 'admitted']
>>> admissions_train.head()
   masters  gpa    stats programming admitted
id
3       no 3.70   Novice    Beginner        1
5       no 3.44   Novice      Novice        0
6      yes 3.50 Beginner    Advanced        1
7      yes 2.33   Novice      Novice        1
9       no 3.82 Advanced    Advanced        1
10      no 3.71 Advanced    Advanced        1
8       no 3.60 Beginner    Advanced        1
4      yes 3.50 Beginner      Novice        1
2      yes 3.76 Beginner    Beginner        0
1      yes 3.95 Beginner    Beginner        0
>>> df = admissions_train.groupby("admitted").percentile(0.25)
>>> df.show_query()
'select "admitted", percentile_cont(0.25) WITHIN GROUP (ORDER BY id) AS percentile_id, percentile_cont(0.25) WITHIN GROUP (ORDER BY gpa) AS percentile_gpa from "admissions_train" group by "admitted"'
>>> df
  admitted percentile_id percentile_gpa
0        0            15         3.4525
1        1            10         3.5050
>>>
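Interpolating at position (n - 1) * percentile, with percentile as a fraction, is the same rule Python's statistics.quantiles() applies with method="inclusive", which offers a quick local cross-check of the linear behavior. The sketch uses only the ten gpa values printed by head() above, so its numbers are not expected to match the group-wise results of Example 1; percentile_cont is a hypothetical helper.

```python
import math
import statistics


def percentile_cont(values, p):
    """Linear interpolation at position (n - 1) * p, counted from 0."""
    data = sorted(values)
    di = (len(data) - 1) * p
    lo, hi = math.floor(di), math.ceil(di)
    return data[lo] + (data[hi] - data[lo]) * (di - lo)


# The ten gpa values printed by admissions_train.head() above (not the full table).
gpas = [3.70, 3.44, 3.50, 2.33, 3.82, 3.71, 3.60, 3.50, 3.76, 3.95]

ours = [percentile_cont(gpas, p) for p in (0.25, 0.5, 0.75)]
quartiles = statistics.quantiles(gpas, n=4, method="inclusive")
print(ours)
print(quartiles)
print(all(math.isclose(a, b) for a, b in zip(ours, quartiles)))  # True
```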
#
# Example 2: Execute percentile() on a DataFrame created on a regular table.
# Calculate the 35th percentile value of all numeric columns for each "admitted"
# group, using all rows (duplicates included) and no interpolation
# (interpolation=None).
#
>>> # Create the required DataFrame.
... admissions_train = DataFrame("admissions_train")
>>> # Check DataFrame columns and let's peek at the data.
... admissions_train.columns
['id', 'masters', 'gpa', 'stats', 'programming', 'admitted']
>>> admissions_train.head()
   masters  gpa    stats programming admitted
id
3       no 3.70   Novice    Beginner        1
5       no 3.44   Novice      Novice        0
6      yes 3.50 Beginner    Advanced        1
7      yes 2.33   Novice      Novice        1
9       no 3.82 Advanced    Advanced        1
10      no 3.71 Advanced    Advanced        1
8       no 3.60 Beginner    Advanced        1
4      yes 3.50 Beginner      Novice        1
2      yes 3.76 Beginner    Beginner        0
1      yes 3.95 Beginner    Beginner        0
>>> df = admissions_train.groupby("admitted").percentile(0.35, interpolation=None)
>>> df.show_query()
'select "admitted", percentile_disc(0.35) WITHIN GROUP (ORDER BY id) AS percentile_id, percentile_disc(0.35) WITHIN GROUP (ORDER BY gpa) AS percentile_gpa from "admissions_train" group by "admitted"'
>>> df
  admitted percentile_id percentile_gpa
0        0            19           3.46
1        1            13           3.57
>>>