Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.percentile = percentile(self, percentile, distinct=False, interpolation='LINEAR', as_time_series_aggregate=False, **kwargs)
- DESCRIPTION:
Function to get the percentile values for a column.
PARAMETERS:
percentile:
Required Argument.
Specifies the desired percentile value to calculate.
It should be between 0 and 1, both inclusive.
Types: float
distinct:
Optional Argument.
Specifies whether duplicate values in the column are excluded from the
calculation. When set to True, only distinct values are considered.
Note: "distinct" is ignored when the percentile is calculated
as a regular aggregate, i.e., when "as_time_series_aggregate" is
set to False.
Default Values: False
Types: bool
interpolation:
Optional Argument.
Specifies how to interpolate the result value when the desired result
lies between two data points, i and j, where i < j. In this case,
the result is computed according to one of the permitted values below.
Permitted Values for time series aggregate:
* LINEAR: Linear interpolation.
The result value is computed using the following equation:
result = i + (j - i) * ((di/100) MOD 1)
Specify by passing "LINEAR" as string to this parameter.
* LOW: Low value interpolation.
The result value is equal to i.
Specify by passing "LOW" as string to this parameter.
* HIGH: High value interpolation.
The result value is equal to j.
Specify by passing "HIGH" as string to this parameter.
* NEAREST: Nearest value interpolation.
The result value is i if (di/100) MOD 1 <= 0.5; otherwise, it is j.
Specify by passing "NEAREST" as string to this parameter.
* MIDPOINT: Midpoint interpolation.
The result value is equal to (i+j)/2.
Specify by passing "MIDPOINT" as string to this parameter.
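The five time series interpolation rules above can be sketched in plain Python. This is an illustration only, not teradataml code: the helper name `interpolate` and its `frac` argument are hypothetical, and `frac` is assumed to stand for the `(di/100) MOD 1` term, that is, the fractional position of the desired result between i and j.

```python
def interpolate(i, j, frac, interpolation="LINEAR"):
    """Pick a result between neighbouring data points i and j (i < j).

    frac is assumed to be the fractional position of the desired result
    between i and j, i.e. the (di/100) MOD 1 term from the text.
    """
    if interpolation == "LINEAR":
        return i + (j - i) * frac
    if interpolation == "LOW":
        return i
    if interpolation == "HIGH":
        return j
    if interpolation == "NEAREST":
        # Ties (frac == 0.5) resolve to the lower point, as documented.
        return i if frac <= 0.5 else j
    if interpolation == "MIDPOINT":
        return (i + j) / 2
    raise ValueError(f"unsupported interpolation: {interpolation!r}")


# Compare all five modes for the same pair of neighbouring points.
for mode in ("LINEAR", "LOW", "HIGH", "NEAREST", "MIDPOINT"):
    print(mode, interpolate(43.0, 78.0, 0.25, mode))
```

With i = 43.0, j = 78.0 and a fractional position of 0.25, LINEAR gives 51.75, LOW and NEAREST give 43.0, HIGH gives 78.0, and MIDPOINT gives 60.5.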
Permitted Values for regular aggregate:
* LINEAR: Linear interpolation.
Percentile is calculated after doing linear interpolation.
* None:
Percentile is calculated with no interpolation.
Default Values: "LINEAR"
Types: str
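For the regular aggregate, the difference between "LINEAR" and None can be illustrated with a small pure-Python sketch. It assumes the two settings behave like the standard SQL PERCENTILE_CONT and PERCENTILE_DISC functions, which the result column names in the examples (percentile_cont_, percentile_disc_) suggest but this page does not state outright. The helper names are hypothetical and skipping None values is likewise an assumption.

```python
import math


def percentile_cont(values, p):
    """Interpolated percentile: linear blend of the two closest ranks."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    pos = p * (len(data) - 1)          # zero-based fractional row position
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def percentile_disc(values, p):
    """Non-interpolated percentile: always returns an actual data value.

    Picks the smallest value whose cumulative share of rows is >= p.
    """
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    rank = max(math.ceil(p * len(data)), 1)   # one-based rank
    return data[rank - 1]


sample = [5.0, 1.0, 4.0, 2.0, 3.0]
print(percentile_cont(sample, 0.35))   # falls between 2.0 and 3.0
print(percentile_disc(sample, 0.35))   # one of the original values
```

The interpolated variant can return a value that does not occur in the column, whereas the non-interpolated variant always returns one of the stored values.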
as_time_series_aggregate:
Optional Argument.
Specifies whether the percentile is calculated as a regular aggregate
or as a time series aggregate. When set to False, it is executed as a
regular aggregate; when set to True, it is executed as a time series
aggregate.
Default Values: False
Types: bool
kwargs:
Specifies optional keyword arguments.
RETURNS:
ColumnExpression
RAISES:
RuntimeError - If column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", ["admissions_train", "ocean_buoys"])
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Create a DataFrame on 'ocean_buoys' table.
>>> ocean_buoys = DataFrame("ocean_buoys")
>>> ocean_buoys
TD_TIMECODE salinity temperature
buoyid
1 2014-01-06 09:02:25.122200 55 78.0
44 2014-01-06 10:00:24.333300 55 43.0
44 2014-01-06 10:00:25.122200 55 43.0
2 2014-01-06 21:01:25.122200 55 80.0
2 2014-01-06 21:03:25.122200 55 82.0
0 2014-01-06 08:00:00.000000 55 10.0
0 2014-01-06 08:08:59.999999 55 NaN
0 2014-01-06 08:09:59.999999 55 99.0
2 2014-01-06 21:02:25.122200 55 81.0
44 2014-01-06 10:00:24.000000 55 43.0
>>>
# Example 1: Calculate the 25th percentile of temperature in ocean_buoys table,
# with LINEAR interpolation.
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="10m", value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.assign(True, temperature_percentile_=ocean_buoys_grpby1.temperature.percentile(0.25))
temperature_percentile_
0 43
>>>
# Example 2: Calculate the 35th percentile of gpa in admissions_train table,
# with LINEAR interpolation.
>>> admissions_train_grpby = admissions_train.groupby("admitted")
>>> admissions_train_grpby.assign(True, percentile_cont_=admissions_train_grpby.gpa.percentile(0.35))
admitted percentile_cont_
0 0 3.460
1 1 3.565
>>>
# Example 3: Calculate the 35th percentile of gpa in admissions_train table,
# with no interpolation.
>>> admissions_train_grpby = admissions_train.groupby("admitted")
>>> admissions_train_grpby.assign(True, percentile_disc_=admissions_train_grpby.gpa.percentile(0.35, interpolation=None))
admitted percentile_disc_
0 0 3.46
1 1 3.57
>>>