ts.percentile() | Teradata R Package - 17.00 - ts.percentile() - Teradata R Package

Teradata® R Package User Guide

prodname
Teradata R Package
vrm_release
17.00
created_date
November 2020
category
User Guide
featnum
B700-4005-090K

The aggregate function ts.percentile() returns the value which represents the desired percentile from each group.

The result value is determined by the desired index (di) in an ordered list of values. The following equation is for the di:

di = (number of values in group - 1) * percentile/100
  • When di is a whole number, that value is the returned result.
  • When di is a number between two data points, i and j, where i<j, the result is interpolated according to the value specified in the interpolation.type argument.
  • This function is valid only on columns with numeric types.
  • Nulls are not included in the result computation.
Arguments:
  • value.expression: Specifies the column for which percentile value is to be computed.
  • percentile: Specifies the float value that represents the desired percentile.
  • interpolation.type: Specifies the type of interpolation to interpolate the result value when the desired result lies between two data points.

    Permitted values are LINEAR, LOW, HIGH, NEAREST and MIDPOINT.

Example 1: Calculate the 25th percentile of the 'temperature' column of sequenced PTI table

In this example, the desired result value lies between two data points, linear interpolation is used as it is the default interpolation scheme.

  • Calculate the percentile.
    > df_seq_percentile <- df_seq_grp %>% summarise(percent_temp = ts.percentile(temperature, 25))
  • Print the results.
    > df_seq_percentile %>% arrange(TIMECODE_RANGE, buoyid, percent_temp)
    # Source:     lazy query [?? x 4]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, buoyid, percent_temp
      TIMECODE_RANGE                                  `GROUP BY TIME(MINUTES~ buoyid percent_temp
      <chr>                                           <int64>                  <int>        <dbl>
    1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08~ 35345                        0         10 
    2 2014-01-06 09:00:00.000000+00:00,2014-01-06 09~ 35347                        1         71.2
    3 2014-01-06 10:00:00.000000+00:00,2014-01-06 10~ 35349                       44         43 
    4 2014-01-06 10:30:00.000000+00:00,2014-01-06 11~ 35350                       22         23 
    5 2014-01-06 10:30:00.000000+00:00,2014-01-06 11~ 35350                       44         43 
    6 2014-01-06 21:00:00.000000+00:00,2014-01-06 21~ 35371                        2         80.5

Example 2: Calculate the 75th percentile of the distinct values in the 'temperature' column of non-PTI table

In this example, the desired result value lies between two data points, low value interpolation is used.

  • Calculate the percentile.
    > df_nonpti_percentile <- df_nonpti_grp %>% summarise(percent_temp = ts.percentile(distinct(temperature), 75, "LOW"))
  • Print the results.
    > df_nonpti_percentile %>% arrange(TIMECODE_RANGE, percent_temp)
    # Source:     lazy query [?? x 3]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, percent_temp
       TIMECODE_RANGE                                       `GROUP BY TIME(MINUTES(~ percent_temp
       <chr>                                                <int64>                         <dbl>
     1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08:01:0~ 23149921                           10
     2 2014-01-06 08:01:00.000000+00:00,2014-01-06 08:02:0~ 23149922                           10
     3 2014-01-06 08:02:00.000000+00:00,2014-01-06 08:03:0~ 23149923                           10
     4 2014-01-06 08:03:00.000000+00:00,2014-01-06 08:04:0~ 23149924                           10
     5 2014-01-06 08:04:00.000000+00:00,2014-01-06 08:05:0~ 23149925                           10
     6 2014-01-06 08:05:00.000000+00:00,2014-01-06 08:06:0~ 23149926                           10
     7 2014-01-06 08:06:00.000000+00:00,2014-01-06 08:07:0~ 23149927                           10
     8 2014-01-06 08:07:00.000000+00:00,2014-01-06 08:08:0~ 23149928                           10
     9 2014-01-06 08:08:00.000000+00:00,2014-01-06 08:09:0~ 23149929                           NA
    10 2014-01-06 08:09:00.000000+00:00,2014-01-06 08:10:0~ 23149930                           99
    # ... with more rows