Teradata R Package Function Reference - Time Series Aggregates - Teradata R Package - Look here for syntax, methods and examples for the functions included in the Teradata R Package.

Teradata® R Package Function Reference

Product
Teradata R Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-28
dita:id
B700-4007
lifecycle
previous
Product Category
Teradata Vantage

Introduction to Time Series Aggregates

The Time Series(TS) aggregate functions help perform aggregate operations on time series data. The goal of this vignette is to provide introductory examples to get the user familiar with the group_by_time API and other TS aggregate functions.

Each time series aggregate function has a section in this vignette. Each section can be read in isolation, except where references are made to other sections or other vignettes. Each section also has a subsection called “Things to Note” for good practices and specific behavior to consider when using the aggregate functions with Teradata Vantage.

Once the connection is established using td_create_context, you can load the example tables ocean_buoys_seq, ocean_buoys_nonseq and ocean_buoys_nonpti and create respective tibbles using the following statements:

loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq", "ocean_buoys_nonpti")
# Create tbl_teradata objects.
df_seq <- tbl(con, "ocean_buoys_seq")
df_nonseq <- tbl(con, "ocean_buoys_nonseq")
df_nonpti <- tbl(con, "ocean_buoys_nonpti")

group_by_time API

Every time series aggregation must be preceded with grouping by time using the API group_by_time. For more information on the API, one can use help(group_by_time) or ?group_by_time in R console. Here, we provide different usages of group_by_time.

# Grouping the sequenced PTI tables based on time with timebucket duration of 30 minutes and the column 'buoyid'. Note the use of shorthand notation for timebucket duration.
df_seq_grp <- df_seq %>% 
  group_by_time(timebucket.duration = "30m", value.expression = "buoyid")

# Grouping the non-PTI tables based on time with timebucket duration of 1 minute and filling the missing timebuckets with previous values. Note the use of formal notation for timebucket duration and timecode.column argument (mandatory for non-PTI table).
df_nonpti_grp <- df_nonpti %>% 
  group_by_time(timebucket.duration = "MINUTES(1)", timecode.column = "TIMECODE", fill = "PREV")

# Grouping the non-PTI tables based on time with timebucket duration of 1 Calendar Year with fill = 10000 (some numeric constant)
df_nonseq_grp <- df_nonseq %>% 
  group_by_time(timebucket.duration = "CAL_YEARS(1)", fill = 10000)

Aggregate Functions

Mode : ts.mode()

The agggregate function ts.mode returns the mode of all values in each group. In the event of a tie between two or more values, a row per result is returned. Mode is a single-threaded function. ts.mode uses one argument, named value.expression which holds the column for which mode is to be computed.

# Mode of the column 'temperature' of sequenced PTI table.
df_seq_mode <- df_seq_grp %>% summarise(mode_temp = ts.mode(temperature))

# Print the results.
df_seq_mode
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                      `GROUP BY TIME(MINU~ buoyid mode_temp
#>   <chr>                               <S3: integer64>       <int>     <int>
#> 1 2014-01-06 09:00:00.000000+00:00,2~ 35347                     1        70
#> 2 2014-01-06 08:00:00.000000+00:00,2~ 35345                     0        10
#> 3 2014-01-06 21:00:00.000000+00:00,2~ 35371                     2        80
#> 4 2014-01-06 21:00:00.000000+00:00,2~ 35371                     2        81
#> 5 2014-01-06 09:00:00.000000+00:00,2~ 35347                     1        71
#> 6 2014-01-06 10:00:00.000000+00:00,2~ 35349                    44        43
#> # ... with more rows

# Mode of the column 'temperature' of non-sequenced PTI table.
df_nonseq_mode <- df_nonseq_grp %>% summarise(mode_temp = ts.mode(temperature))

# Print the results.
df_nonseq_mode
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                           `GROUP BY TIME(CAL_YE~ mode_temp
#>   <chr>                                    <S3: integer64>            <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-0~ 3                             43

# Mode of the column 'temperature' of non-PTI table.
df_nonpti_mode <- df_nonpti_grp %>% summarise(mode_temp = ts.mode(temperature))

# Print the results.
df_nonpti_mode
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                            `GROUP BY TIME(MINUT~ mode_temp
#>   <chr>                                     <S3: integer64>           <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01~ 23149923                     10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01~ 23149925                     10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01~ 23149926                     10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01~ 23149927                     10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01~ 23149924                     10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01~ 23149922                     10
#> # ... with more rows

Notes

  • The data type of the column containing mode values is same as that of the column for which mode is to be computed.
  • NULLs/NAs are not included in the result computation.
  • Mode is valid only for numeric data.
  • Mode must be used with group_by_time().

Bottom : ts.bottom()

The agggregate function ts.bottom returns the smallest number.of.values in the value.expression for each group, with or without ties. Bottom is a single-threaded function.ts.bottom uses three arguments:

  • value.expression which specifies the column from which bottom values are to be returned.
  • number.of.values which specifies the integer value representing the number of values to return.
  • with.ties which specifies a flag to decide whether to run bottom function with ties or not. Default value is FALSE.
# Get the smallest 2 values of the column 'temperature' for each group without ties of sequenced PTI table.
df_seq_bottom <- df_seq_grp %>% summarise(bottom_temp = ts.bottom(temperature, 2))

# Print the results.
df_seq_bottom
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                    `GROUP BY TIME(MINU~ buoyid bottom_temp
#>   <chr>                             <S3: integer64>       <int>       <int>
#> 1 2014-01-06 09:00:00.000000+00:00~ 35347                     1          70
#> 2 2014-01-06 08:00:00.000000+00:00~ 35345                     0          10
#> 3 2014-01-06 08:00:00.000000+00:00~ 35345                     0          10
#> 4 2014-01-06 21:00:00.000000+00:00~ 35371                     2          80
#> 5 2014-01-06 21:00:00.000000+00:00~ 35371                     2          81
#> 6 2014-01-06 09:00:00.000000+00:00~ 35347                     1          71
#> # ... with more rows

# Get the smallest 4 values of the column 'temperature' for each group with ties of non-sequenced PTI table.
df_nonseq_bottom <- df_nonseq_grp %>% summarise(bottom_temp = ts.bottom(temperature, 4, TRUE))

# Print the results.
df_nonseq_bottom
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                         `GROUP BY TIME(CAL_YE~ bottom_temp
#>   <chr>                                  <S3: integer64>              <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015~ 3                               43
#> 2 2014-01-01 00:00:00.000000+00:00,2015~ 3                               43
#> 3 2014-01-01 00:00:00.000000+00:00,2015~ 3                               43
#> 4 2014-01-01 00:00:00.000000+00:00,2015~ 3                               43
#> 5 2014-01-01 00:00:00.000000+00:00,2015~ 3                               43
#> 6 2014-01-01 00:00:00.000000+00:00,2015~ 3                               10
#> # ... with more rows

# Get the smallest 2 values of the column 'temperature' for each group with ties of non-PTI table.
df_nonpti_bottom <- df_nonpti_grp %>% summarise(bottom_temp = ts.bottom(temperature, 2, TRUE))

# Print the results.
df_nonpti_bottom
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                          `GROUP BY TIME(MINUT~ bottom_temp
#>   <chr>                                   <S3: integer64>             <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-~ 23149923                       10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-~ 23149925                       10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-~ 23149926                       10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-~ 23149927                       10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-~ 23149924                       10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-~ 23149922                       10
#> # ... with more rows

Notes

  • The data type of the column containing bottom values is same as that of the column for which bottom is to be computed.
  • NULLs/NAs are not included in the result computation.
  • Bottom is valid only for numeric data.
  • Bottom must be used with group_by_time().
  • Bottom with ties implies that the rows returned include the specified number of rows in the ordered set for each timebucket. It includes any rows where the sort key value is the same as the sort key value in the last row that satisfies the specified number or percentage of rows. If this clause is omitted and ties are found, the earliest value in terms of timecode is returned.

Top : ts.top()

The agggregate function ts.top returns the largest number.of.values in the value.expression for each group, with or without ties. Top is a single-threaded function.ts.top uses three arguments:

  • value.expression which specifies the column from which top values are to be returned.
  • number.of.values which specifies the integer value representing the number of values to return.
  • with.ties which specifies a flag to decide whether to run top function with ties or not. Default value is FALSE.
# Get the largest 2 values of the column 'temperature' for each group without ties of sequenced PTI table.
df_seq_top <- df_seq_grp %>% summarise(top_temp = ts.top(temperature, 2))

# Print the results.
df_seq_top
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                       `GROUP BY TIME(MINU~ buoyid top_temp
#>   <chr>                                <S3: integer64>       <int>    <int>
#> 1 2014-01-06 09:00:00.000000+00:00,20~ 35347                     1       78
#> 2 2014-01-06 08:00:00.000000+00:00,20~ 35345                     0       99
#> 3 2014-01-06 08:00:00.000000+00:00,20~ 35345                     0      100
#> 4 2014-01-06 21:00:00.000000+00:00,20~ 35371                     2       81
#> 5 2014-01-06 21:00:00.000000+00:00,20~ 35371                     2       82
#> 6 2014-01-06 09:00:00.000000+00:00,20~ 35347                     1       79
#> # ... with more rows

# Get the largest 4 values of the column 'temperature' for each group with ties of non-sequenced PTI table.
df_nonseq_top <- df_nonseq_grp %>% summarise(top_temp = ts.top(temperature, 4, TRUE))

# Print the results.
df_nonseq_top
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                            `GROUP BY TIME(CAL_YE~ top_temp
#>   <chr>                                     <S3: integer64>           <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-01~ 3                            99
#> 2 2014-01-01 00:00:00.000000+00:00,2015-01~ 3                           100
#> 3 2014-01-01 00:00:00.000000+00:00,2015-01~ 3                            82
#> 4 2014-01-01 00:00:00.000000+00:00,2015-01~ 3                            81

# Get the largest 2 values of the column 'temperature' for each group with ties of non-PTI table.
df_nonpti_top <- df_nonpti_grp %>% summarise(top_temp = ts.top(temperature, 2, TRUE))

# Print the results.
df_nonpti_top
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                             `GROUP BY TIME(MINUT~ top_temp
#>   <chr>                                      <S3: integer64>          <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01-~ 23149923                    10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01-~ 23149925                    10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01-~ 23149926                    10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01-~ 23149927                    10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01-~ 23149924                    10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01-~ 23149922                    10
#> # ... with more rows

Notes

  • The data type of the column containing top values is same as that of the column for which top is to be computed.
  • NULLs/NAs are not included in the result computation.
  • Top is valid only for numeric data.
  • Top must be used with group_by_time().
  • Top with ties implies that the rows returned include the specified number of rows in the ordered set for each timebucket. It includes any rows where the sort key value is the same as the sort key value in the last row that satisfies the specified number or percentage of rows. If this clause is omitted and ties are found, the earliest value in terms of timecode is returned.

Median : ts.median()

The agggregate function ts.median returns the median of all values in each group. The function returns the average of the two middle values if the argument value.expression contains an even number of values. Median is a single-threaded function. ts.median uses two arguments:

  • value.expression specifies the column for which median is to be computed.
  • use.distinct specifies whether to exclude duplicates specified by value.expression from the computation. Default value is FALSE. By default, all non-null values in the column specified in value.expression, including duplicates, are considered in the computation. If use.distinct = TRUE, then all duplicate values are excluded in the computation.
# Median of the column 'temperature' of sequenced PTI table.
df_seq_mode <- df_seq_grp %>% summarise(median_temp = ts.median(temperature))

# Print the results.
df_seq_mode
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                    `GROUP BY TIME(MINU~ buoyid median_temp
#>   <chr>                             <S3: integer64>       <int>       <dbl>
#> 1 2014-01-06 10:30:00.000000+00:00~ 35350                    22        23  
#> 2 2014-01-06 10:30:00.000000+00:00~ 35350                    44        43  
#> 3 2014-01-06 09:00:00.000000+00:00~ 35347                     1        74.5
#> 4 2014-01-06 10:00:00.000000+00:00~ 35349                    44        53  
#> 5 2014-01-06 21:00:00.000000+00:00~ 35371                     2        81  
#> 6 2014-01-06 08:00:00.000000+00:00~ 35345                     0        54.5

# Median of the column 'temperature' of non-sequenced PTI table, excluding the duplicates in the computation.
df_nonseq_mode <- df_nonseq_grp %>% summarise(median_temp = ts.median(temperature, TRUE))

# Print the results.
df_nonseq_mode
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                         `GROUP BY TIME(CAL_YE~ median_temp
#>   <chr>                                  <S3: integer64>              <dbl>
#> 1 2014-01-01 00:00:00.000000+00:00,2015~ 3                               72

# Median of the column 'temperature' of non-PTI table.
df_nonpti_mode <- df_nonpti_grp %>% summarise(median_temp = ts.median(temperature))

# Print the results.
df_nonpti_mode
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                          `GROUP BY TIME(MINUT~ median_temp
#>   <chr>                                   <S3: integer64>             <dbl>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-~ 23149923                       10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-~ 23149925                       10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-~ 23149926                       10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-~ 23149927                       10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-~ 23149924                       10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-~ 23149922                       10
#> # ... with more rows

Notes

  • The data type of the column containing median values is REAL data type.
  • NULLs/NAs are not included in the result computation.
  • Median is valid only for numeric data.
  • The use of use.distinct = TRUE is valid only when the function ts.median is used with group_by_time. The function ts.median can be used with regular group_by only when the argument use.distinct takes FALSE. Otherwise, an error is thrown.
  • We recommend to use function median(value.expression) for regular group_by operation instead of the function ts.median as shown in the example 1 below.
  • The functions ts.median and median can also be used with only summarise and without group_by as shown in the example 2 below. We recommend to use the function ts.median only with time series aggregate operation.
  • The functionsts.median and median cannot be used with dplyr verbs like mutate, filter as shown in the example 3 below.
# Example 1

# Note the use of median() instead of ts.median() for regular group_by operation.
df_median <- df_seq %>% group_by(buoyid) %>% summarise(median_temp = median(temperature))

# Print the results.
df_median
#> # Source:   lazy query [?? x 2]
#> # Database: Teradata
#>   buoyid median_temp
#>    <int>       <int>
#> 1     44          43
#> 2      2          81
#> 3      1          74
#> 4     22          23
#> 5      0          54
# Example 2

# 2. Using median() without grouping.
df_median <- df_seq %>% summarise(median_temp = median(temperature))

# Print the results.
df_median
#> # Source:   lazy query [?? x 1]
#> # Database: Teradata
#>   median_temp
#>         <int>
#> 1          54
# Example 3

# Incorrect usage of median() with mutate and filter. The same is the case with ts.median().
df_median <- df_seq %>% mutate(median_t = median(temperature))
df_median <- df_seq %>% filter(temperature <= median(temperature))

First : ts.first()

The agggregate function ts.first returns the oldest value, determined by timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. First is a single-threaded function. ts.first uses one argument, named value.expression which holds the column for which oldest value of the group is to be returned.

# Oldest value of the column 'temperature' of sequenced PTI table.
df_seq_first <- df_seq_grp %>% summarise(first_temp = ts.first(temperature))

# Print the results.
df_seq_first
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                     `GROUP BY TIME(MINU~ buoyid first_temp
#>   <chr>                              <S3: integer64>       <int>      <int>
#> 1 2014-01-06 10:30:00.000000+00:00,~ 35350                    22         23
#> 2 2014-01-06 10:30:00.000000+00:00,~ 35350                    44         43
#> 3 2014-01-06 09:00:00.000000+00:00,~ 35347                     1         70
#> 4 2014-01-06 10:00:00.000000+00:00,~ 35349                    44         43
#> 5 2014-01-06 21:00:00.000000+00:00,~ 35371                     2         80
#> 6 2014-01-06 08:00:00.000000+00:00,~ 35345                     0         10

# Oldest value of the column 'temperature' of non-sequenced PTI table.
df_nonseq_first <- df_nonseq_grp %>% summarise(first_temp = ts.first(temperature))

# Print the results.
df_nonseq_first
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                          `GROUP BY TIME(CAL_YE~ first_temp
#>   <chr>                                   <S3: integer64>             <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-~ 3                              10

# Oldest value of the column 'temperature' of non-PTI table.
df_nonpti_first <- df_nonpti_grp %>% summarise(first_temp = ts.first(temperature))

# Print the results.
df_nonpti_first
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                           `GROUP BY TIME(MINUT~ first_temp
#>   <chr>                                    <S3: integer64>            <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-0~ 23149923                      10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-0~ 23149925                      10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-0~ 23149926                      10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-0~ 23149927                      10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-0~ 23149924                      10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-0~ 23149922                      10
#> # ... with more rows

Notes

  • The data type of the column containing oldest values is same as that of the column for which oldest value of the group is to be returned.
  • NULLs/NAs are not included in the result computation.
  • First is valid only for numeric data.
  • First must be used with group_by_time().

Last : ts.last()

The agggregate function ts.last returns the newest value, determined by timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. Last is a single-threaded function. ts.last uses one argument, named value.expression which holds the column for which newest value of the group is to be returned.

# Newest value of the column 'temperature' of sequenced PTI table.
df_seq_last <- df_seq_grp %>% summarise(last_temp = ts.last(temperature))

# Print the results.
df_seq_last
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                      `GROUP BY TIME(MINU~ buoyid last_temp
#>   <chr>                               <S3: integer64>       <int>     <int>
#> 1 2014-01-06 10:30:00.000000+00:00,2~ 35350                    44        43
#> 2 2014-01-06 08:00:00.000000+00:00,2~ 35345                     0       100
#> 3 2014-01-06 21:00:00.000000+00:00,2~ 35371                     2        82
#> 4 2014-01-06 09:00:00.000000+00:00,2~ 35347                     1        79
#> 5 2014-01-06 10:00:00.000000+00:00,2~ 35349                    44        43
#> 6 2014-01-06 10:30:00.000000+00:00,2~ 35350                    22        23

# Newest value of the column 'temperature' of non-sequenced PTI table.
df_nonseq_last <- df_nonseq_grp %>% summarise(last_temp = ts.last(temperature))

# Print the results.
df_nonseq_last
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                           `GROUP BY TIME(CAL_YE~ last_temp
#>   <chr>                                    <S3: integer64>            <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-0~ 3                             82

# Newest value of the column 'temperature' of non-PTI table.
df_nonpti_last <- df_nonpti_grp %>% summarise(last_temp = ts.last(temperature))

# Print the results.
df_nonpti_last
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                            `GROUP BY TIME(MINUT~ last_temp
#>   <chr>                                     <S3: integer64>           <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01~ 23149923                     10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01~ 23149925                     10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01~ 23149926                     10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01~ 23149927                     10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01~ 23149924                     10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01~ 23149922                     10
#> # ... with more rows

Notes

  • The data type of the column containing newest values is same as that of the column for which newest value of the group is to be returned.
  • NULLs/NAs are not included in the result computation.
  • Last is valid only for numeric data.
  • Last must be used with group_by_time(). When ts.last() is used with group_by but not group_by_time, ts.last() throws different exceptions because LAST is also a function which can operate on Period data types, unlike ts.first() which throws an exception “Time Series Aggregate function invoked without an appropriate GROUP BY TIME clause”. The following are such invalid use cases of ts.last() function:
# Invalid example 1: Using ts.last() on `group_by` with aggregate column not as grouping column.
df1 <- df_seq %>% group_by(buoyid) %>% summarise(temp = ts.last(temperature))

# Note that this will not print the result but throws an exception - "Selected non-aggregate values must be part of the associated group.". This is because the aggregate operation "LAST" (considered here) is the Period data type function.
df1

# Invalid example 2: Using ts.last() on `group_by` with aggregate column as one of the grouping columns.
df2 <- df %>% group_by(buoyid, temperature) %>% summarise(temp = ts.last(temperature))

# Note that this will not print the result but throws an exception - "Invalid argument for the LAST function. The argument must have a Period data type.". This is because the aggregate operation "LAST" (considered here) is the Period data type function.
df2

Median Absolute Deviation : ts.mad()

The agggregate function ts.mad returns the median of the set of values defined as the absolute value of the difference between each value and the median of all values in each group. MAD is a single-threaded function. ts.mad uses two arguments:

  • value.expression which specifies the column from which top values are to be returned.
  • multiplier which specifies a literal numeric constant. If this argument is NULL, default multipler 1.4826 is used for MAD computation.

Formula for computing MAD is as follows:

MAD = b * Mi(|Xi - Mj(Xj)|)

Where,
  b       = A numeric constant. Default value is 1.4826.
  Mj(Xj)  = Median of the original set of values.
  Xi      = The original set of values.
  Mi      = Median of absolute value of the difference between each value in Xi and the Median calculated in Mj(Xj).
    
# MAD of the column 'temperature' of sequenced PTI table.
df_seq_mad <- df_seq_grp %>% summarise(mad_temp = ts.mad(temperature))

# Print the results.
df_seq_mad
#> # Source:   lazy query [?? x 4]
#> # Database: Teradata
#>   TIMECODE_RANGE                       `GROUP BY TIME(MINU~ buoyid mad_temp
#>   <chr>                                <S3: integer64>       <int>    <dbl>
#> 1 2014-01-06 10:30:00.000000+00:00,20~ 35350                    22     0   
#> 2 2014-01-06 10:30:00.000000+00:00,20~ 35350                    44     0   
#> 3 2014-01-06 09:00:00.000000+00:00,20~ 35347                     1     5.19
#> 4 2014-01-06 10:00:00.000000+00:00,20~ 35349                    44     4.45
#> 5 2014-01-06 21:00:00.000000+00:00,20~ 35371                     2     1.48
#> 6 2014-01-06 08:00:00.000000+00:00,20~ 35345                     0    66.0

# MAD of the column 'temperature' of non-sequenced PTI table.
df_nonseq_mad <- df_nonseq_grp %>% summarise(mad_temp = ts.mad(temperature, 2))

# Print the results.
df_nonseq_mad
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                            `GROUP BY TIME(CAL_YE~ mad_temp
#>   <chr>                                     <S3: integer64>           <dbl>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-01~ 3                            27

# MAD of the column 'temperature' of non-PTI table.
df_nonpti_mad <- df_nonpti_grp %>% summarise(mad_temp = ts.mad(temperature, 5))

# Print the results.
df_nonpti_mad
#> # Source:   lazy query [?? x 3]
#> # Database: Teradata
#>   TIMECODE_RANGE                             `GROUP BY TIME(MINUT~ mad_temp
#>   <chr>                                      <S3: integer64>          <dbl>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01-~ 23149923                     0
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01-~ 23149925                     0
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01-~ 23149926                     0
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01-~ 23149927                     0
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01-~ 23149924                     0
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01-~ 23149922                     0
#> # ... with more rows

Notes

  • The data type of the column containing MAD values is REAL data type.
  • NULLs/NAs are not included in the result computation.
  • MAD is valid only for numeric data.
  • MAD must be used with group_by_time().