ts.describe() | Teradata Package for R - ts.describe() - Teradata Package for R

Teradata® Package for R User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
ft:locale
en-US
ft:lastEdition
2024-04-09
dita:mapPath
efv1707506846369.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
nqx1518630623256
lifecycle
latest
Product Category
Teradata Vantage

The aggregate function ts.describe() returns the column statistics of the values in each group. By default, the function calculates maximum, minimum, average, sample standard deviation, median, mode, 25th percentile, 50th percentile and 75th percentile values of the column.

Unlike other aggregate functions, ts.describe() function generates multiple columns. The column names provided in summarise() call are not used as the resultant column name.

  • Nulls are not included in the result computation.
  • ts.describe() is valid only for numeric data.
  • Unlike other statistical values, mode values are calculated without excluding duplicates even though distinct() is used.
  • ts.describe() must be used with group_by_time().
  • Mode computation can result in duplicate rows with different values.
  • Percentile computations use a linear interpolation process to determine if the percentile lies between two data points.

    Refer to ts.percentile() for additional interpolation options.

Arguments:
  • value.expression: Specify the column for which the statistics are to be computed.

Use ts.describe(distinct(column_name)) to exclude duplicate rows while calculating statistics.

Example 1: Calculate the statistics of the 'temperature' column of sequenced PTI table

  • Calculate the statistics.
    > df_seq_describe <- df_seq_grp %>% summarise(describe_temp = ts.describe(temperature))
  • Print the results.
    > df_seq_describe %>% arrange(TIMECODE_RANGE, buoyid, `MODE(temperature)`)
    # Source:     lazy query [?? x 12]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, buoyid, `MODE(temperature)`
       TIMECODE_RANGE `GROUP BY TIME(~ buoyid `Maximum(temper~ `Minimum(temper~ `Average(temper~
       <chr>          <int64>           <int>            <int>            <int>            <dbl>
     1 2014-01-06 08~ 35345                 0              100               10             54.8
     2 2014-01-06 09~ 35347                 1               79               70             74.5
     3 2014-01-06 09~ 35347                 1               79               70             74.5
     4 2014-01-06 09~ 35347                 1               79               70             74.5
     5 2014-01-06 09~ 35347                 1               79               70             74.5
     6 2014-01-06 09~ 35347                 1               79               70             74.5
     7 2014-01-06 09~ 35347                 1               79               70             74.5
     8 2014-01-06 10~ 35349                44               56               43             49 
     9 2014-01-06 10~ 35350                22               23               23             23 
    10 2014-01-06 10~ 35350                44               43               43             43 
    # ... with more rows, and 6 more variables: `STDDEV_SAMP(temperature)` <dbl>,
    #   `MEDIAN(temperature)` <dbl>, `MODE(temperature)` <int>, `PERCENTILE(temperature, 25,
    #   LINEAR)` <dbl>, `PERCENTILE(temperature, 50, LINEAR)` <dbl>, `PERCENTILE(temperature,
    #   75, LINEAR)` <dbl>

Example 2: Calculate the statistics of the 'temperature' column of non-PTI table

  • Calculate the statistics.
    > df_nonpti_describe <- df_nonpti %>% group_by_time(timebucket.duration = "10m", timecode.column = "TIMECODE") %>% summarise(describe_temp = ts.describe(distinct(temperature)))
  • Print the results.
    > df_nonpti_describe %>% arrange(TIMECODE_RANGE, `Mode(temperature)`)
    # Source:     lazy query [?? x 11]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, `Mode(temperature)`
       TIMECODE_RANGE `GROUP BY TIME(~ `Maximum(Distin~ `Minimum(Distin~ `Average(Distin~
       <chr>          <int64>                     <int>            <int>            <dbl>
     1 2014-01-06 08~ 2314993                        99               10             54.5
     2 2014-01-06 08~ 2314993                        99               10             54.5
     3 2014-01-06 08~ 2314994                       100               10             55 
     4 2014-01-06 08~ 2314994                       100               10             55 
     5 2014-01-06 09~ 2314999                        79               70             74.5
     6 2014-01-06 09~ 2314999                        79               70             74.5
     7 2014-01-06 09~ 2314999                        79               70             74.5
     8 2014-01-06 09~ 2314999                        79               70             74.5
     9 2014-01-06 09~ 2314999                        79               70             74.5
    10 2014-01-06 09~ 2314999                        79               70             74.5
    # ... with more rows, and 6 more variables: `STDDEV_SAMP(Distinct(temperature))` <dbl>,
    #   `MEDIAN(Distinct(temperature))` <dbl>, `MODE(temperature)` <int>,
    #   `PERCENTILE(Distinct(temperature, 25, LINEAR))` <dbl>, `PERCENTILE(Distinct(temperature,
    #   50, LINEAR))` <dbl>, `PERCENTILE(Distinct(temperature, 75, LINEAR))` <dbl>