The aggregate function ts.percentile() returns the value which represents the desired percentile from each group.
The result value is determined by the desired index (di) in an ordered list of values. The following equation is for the di:
di = (number of values in group - 1) * percentile/100
- When di is a whole number, that value is the returned result.
- When di is a number between two data points, i and j, where i<j, the result is interpolated according to the value specified in the interpolation.type argument.
- This function is valid only on columns with numeric types.
- Nulls are not included in the result computation.
Arguments:
- value.expression: Specifies the column for which percentile value is to be computed.
- percentile: Specifies the float value that represents the desired percentile.
- interpolation.type: Specifies the type of interpolation to interpolate the result value when the desired result lies between two data points.
Permitted values are LINEAR, LOW, HIGH, NEAREST and MIDPOINT.
Example 1: Calculate the 25th percentile of the 'temperature' column of sequenced PTI table
In this example, the desired result value lies between two data points, linear interpolation is used as it is the default interpolation scheme.
- Calculate the percentile.
> df_seq_percentile <- df_seq_grp %>% summarise(percent_temp = ts.percentile(temperature, 25))
- Print the results.
> df_seq_percentile %>% arrange(TIMECODE_RANGE, buoyid, percent_temp) # Source: lazy query [?? x 4] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] # [TDAPUSER@<hostname>/TDAPUSERDB] # Ordered by: TIMECODE_RANGE, buoyid, percent_temp TIMECODE_RANGE `GROUP BY TIME(MINUTES~ buoyid percent_temp <chr> <int64> <int> <dbl> 1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08~ 35345 0 10 2 2014-01-06 09:00:00.000000+00:00,2014-01-06 09~ 35347 1 71.2 3 2014-01-06 10:00:00.000000+00:00,2014-01-06 10~ 35349 44 43 4 2014-01-06 10:30:00.000000+00:00,2014-01-06 11~ 35350 22 23 5 2014-01-06 10:30:00.000000+00:00,2014-01-06 11~ 35350 44 43 6 2014-01-06 21:00:00.000000+00:00,2014-01-06 21~ 35371 2 80.5
Example 2: Calculate the 75th percentile of the distinct values in the 'temperature' column of non-PTI table
In this example, the desired result value lies between two data points, low value interpolation is used.
- Calculate the percentile.
> df_nonpti_percentile <- df_nonpti_grp %>% summarise(percent_temp = ts.percentile(distinct(temperature), 75, "LOW"))
- Print the results.
> df_nonpti_percentile %>% arrange(TIMECODE_RANGE, percent_temp) # Source: lazy query [?? x 3] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] # [TDAPUSER@<hostname>/TDAPUSERDB] # Ordered by: TIMECODE_RANGE, percent_temp TIMECODE_RANGE `GROUP BY TIME(MINUTES(~ percent_temp <chr> <int64> <dbl> 1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08:01:0~ 23149921 10 2 2014-01-06 08:01:00.000000+00:00,2014-01-06 08:02:0~ 23149922 10 3 2014-01-06 08:02:00.000000+00:00,2014-01-06 08:03:0~ 23149923 10 4 2014-01-06 08:03:00.000000+00:00,2014-01-06 08:04:0~ 23149924 10 5 2014-01-06 08:04:00.000000+00:00,2014-01-06 08:05:0~ 23149925 10 6 2014-01-06 08:05:00.000000+00:00,2014-01-06 08:06:0~ 23149926 10 7 2014-01-06 08:06:00.000000+00:00,2014-01-06 08:07:0~ 23149927 10 8 2014-01-06 08:07:00.000000+00:00,2014-01-06 08:08:0~ 23149928 10 9 2014-01-06 08:08:00.000000+00:00,2014-01-06 08:09:0~ 23149929 NA 10 2014-01-06 08:09:00.000000+00:00,2014-01-06 08:10:0~ 23149930 99 # ... with more rows