Time Series Aggregates
Introduction to Time Series Aggregates
The Time Series(TS) aggregate functions help perform aggregate operations on time series data. The goal of this vignette is to provide introductory examples to get the user familiar with the group_by_time API and other TS aggregate functions.
Each time series aggregate function has a section in this vignette. Each section can be read in isolation, except where references are made to other sections or other vignettes. Each section also has a subsection called “Things to Note” for good practices and specific behavior to consider when using the aggregate functions with Teradata Vantage.
Once the connection is established using td_create_context, you can load the example tables ocean_buoys_seq
, ocean_buoys_nonseq
and ocean_buoys_nonpti
and create respective tibbles using the following statements:
loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq", "ocean_buoys_nonpti")
# Create tbl_teradata objects.
df_seq <- tbl(con, "ocean_buoys_seq")
df_nonseq <- tbl(con, "ocean_buoys_nonseq")
df_nonpti <- tbl(con, "ocean_buoys_nonpti")
group_by_time API
Every time series aggregation must be preceded with grouping by time using the API group_by_time
. For more information on the API, one can use help(group_by_time)
or ?group_by_time
in R console. Here, we provide different usages of group_by_time
.
# Grouping the sequenced PTI tables based on time with timebucket duration of 30 minutes and the column 'buoyid'. Note the use of shorthand notation for timebucket duration.
df_seq_grp <- df_seq %>%
group_by_time(timebucket.duration = "30m", value.expression = "buoyid")
# Grouping the non-PTI tables based on time with timebucket duration of 1 minute and filling the missing timebuckets with previous values. Note the use of formal notation for timebucket duration and timecode.column argument (mandatory for non-PTI table).
df_nonpti_grp <- df_nonpti %>%
group_by_time(timebucket.duration = "MINUTES(1)", timecode.column = "TIMECODE", fill = "PREV")
# Grouping the non-PTI tables based on time with timebucket duration of 1 Calendar Year with fill = 10000 (some numeric constant)
df_nonseq_grp <- df_nonseq %>%
group_by_time(timebucket.duration = "CAL_YEARS(1)", fill = 10000)
Aggregate Functions
Mode : ts.mode()
The agggregate function ts.mode
returns the mode of all values in each group. In the event of a tie between two or more values, a row per result is returned. Mode is a single-threaded function. ts.mode
uses one argument, named value.expression
which holds the column for which mode is to be computed.
# Mode of the column 'temperature' of sequenced PTI table.
df_seq_mode <- df_seq_grp %>% summarise(mode_temp = ts.mode(temperature))
# Print the results.
df_seq_mode
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid mode_temp
#> <chr> <S3: integer64> <int> <int>
#> 1 2014-01-06 09:00:00.000000+00:00,2~ 35347 1 70
#> 2 2014-01-06 08:00:00.000000+00:00,2~ 35345 0 10
#> 3 2014-01-06 21:00:00.000000+00:00,2~ 35371 2 80
#> 4 2014-01-06 21:00:00.000000+00:00,2~ 35371 2 81
#> 5 2014-01-06 09:00:00.000000+00:00,2~ 35347 1 71
#> 6 2014-01-06 10:00:00.000000+00:00,2~ 35349 44 43
#> # ... with more rows
# Mode of the column 'temperature' of non-sequenced PTI table.
df_nonseq_mode <- df_nonseq_grp %>% summarise(mode_temp = ts.mode(temperature))
# Print the results.
df_nonseq_mode
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ mode_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-0~ 3 43
# Mode of the column 'temperature' of non-PTI table.
df_nonpti_mode <- df_nonpti_grp %>% summarise(mode_temp = ts.mode(temperature))
# Print the results.
df_nonpti_mode
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ mode_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing mode values is same as that of the column for which mode is to be computed.
- NULLs/NAs are not included in the result computation.
- Mode is valid only for numeric data.
- Mode must be used with
group_by_time()
.
Bottom : ts.bottom()
The agggregate function ts.bottom
returns the smallest number.of.values in the value.expression for each group, with or without ties. Bottom is a single-threaded function.ts.bottom
uses three arguments:
value.expression
which specifies the column from which bottom values are to be returned.number.of.values
which specifies the integer value representing the number of values to return.with.ties
which specifies a flag to decide whether to run bottom function with ties or not. Default value is FALSE.
# Get the smallest 2 values of the column 'temperature' for each group without ties of sequenced PTI table.
df_seq_bottom <- df_seq_grp %>% summarise(bottom_temp = ts.bottom(temperature, 2))
# Print the results.
df_seq_bottom
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid bottom_temp
#> <chr> <S3: integer64> <int> <int>
#> 1 2014-01-06 09:00:00.000000+00:00~ 35347 1 70
#> 2 2014-01-06 08:00:00.000000+00:00~ 35345 0 10
#> 3 2014-01-06 08:00:00.000000+00:00~ 35345 0 10
#> 4 2014-01-06 21:00:00.000000+00:00~ 35371 2 80
#> 5 2014-01-06 21:00:00.000000+00:00~ 35371 2 81
#> 6 2014-01-06 09:00:00.000000+00:00~ 35347 1 71
#> # ... with more rows
# Get the smallest 4 values of the column 'temperature' for each group with ties of non-sequenced PTI table.
df_nonseq_bottom <- df_nonseq_grp %>% summarise(bottom_temp = ts.bottom(temperature, 4, TRUE))
# Print the results.
df_nonseq_bottom
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ bottom_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015~ 3 43
#> 2 2014-01-01 00:00:00.000000+00:00,2015~ 3 43
#> 3 2014-01-01 00:00:00.000000+00:00,2015~ 3 43
#> 4 2014-01-01 00:00:00.000000+00:00,2015~ 3 43
#> 5 2014-01-01 00:00:00.000000+00:00,2015~ 3 43
#> 6 2014-01-01 00:00:00.000000+00:00,2015~ 3 10
#> # ... with more rows
# Get the smallest 2 values of the column 'temperature' for each group with ties of non-PTI table.
df_nonpti_bottom <- df_nonpti_grp %>% summarise(bottom_temp = ts.bottom(temperature, 2, TRUE))
# Print the results.
df_nonpti_bottom
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ bottom_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing bottom values is same as that of the column for which bottom is to be computed.
- NULLs/NAs are not included in the result computation.
- Bottom is valid only for numeric data.
- Bottom must be used with
group_by_time()
. - Bottom with ties implies that the rows returned include the specified number of rows in the ordered set for each timebucket. It includes any rows where the sort key value is the same as the sort key value in the last row that satisfies the specified number or percentage of rows. If this clause is omitted and ties are found, the earliest value in terms of timecode is returned.
Top : ts.top()
The agggregate function ts.top
returns the largest number.of.values in the value.expression for each group, with or without ties. Top is a single-threaded function.ts.top
uses three arguments:
value.expression
which specifies the column from which top values are to be returned.number.of.values
which specifies the integer value representing the number of values to return.with.ties
which specifies a flag to decide whether to run top function with ties or not. Default value is FALSE.
# Get the largest 2 values of the column 'temperature' for each group without ties of sequenced PTI table.
df_seq_top <- df_seq_grp %>% summarise(top_temp = ts.top(temperature, 2))
# Print the results.
df_seq_top
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid top_temp
#> <chr> <S3: integer64> <int> <int>
#> 1 2014-01-06 09:00:00.000000+00:00,20~ 35347 1 78
#> 2 2014-01-06 08:00:00.000000+00:00,20~ 35345 0 99
#> 3 2014-01-06 08:00:00.000000+00:00,20~ 35345 0 100
#> 4 2014-01-06 21:00:00.000000+00:00,20~ 35371 2 81
#> 5 2014-01-06 21:00:00.000000+00:00,20~ 35371 2 82
#> 6 2014-01-06 09:00:00.000000+00:00,20~ 35347 1 79
#> # ... with more rows
# Get the largest 4 values of the column 'temperature' for each group with ties of non-sequenced PTI table.
df_nonseq_top <- df_nonseq_grp %>% summarise(top_temp = ts.top(temperature, 4, TRUE))
# Print the results.
df_nonseq_top
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ top_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-01~ 3 99
#> 2 2014-01-01 00:00:00.000000+00:00,2015-01~ 3 100
#> 3 2014-01-01 00:00:00.000000+00:00,2015-01~ 3 82
#> 4 2014-01-01 00:00:00.000000+00:00,2015-01~ 3 81
# Get the largest 2 values of the column 'temperature' for each group with ties of non-PTI table.
df_nonpti_top <- df_nonpti_grp %>% summarise(top_temp = ts.top(temperature, 2, TRUE))
# Print the results.
df_nonpti_top
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ top_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01-~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01-~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01-~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01-~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01-~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01-~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing top values is same as that of the column for which top is to be computed.
- NULLs/NAs are not included in the result computation.
- Top is valid only for numeric data.
- Top must be used with
group_by_time()
. - Top with ties implies that the rows returned include the specified number of rows in the ordered set for each timebucket. It includes any rows where the sort key value is the same as the sort key value in the last row that satisfies the specified number or percentage of rows. If this clause is omitted and ties are found, the earliest value in terms of timecode is returned.
Median : ts.median()
The agggregate function ts.median
returns the median of all values in each group. The function returns the average of the two middle values if the argument value.expression contains an even number of values. Median is a single-threaded function. ts.median
uses two arguments:
value.expression
specifies the column for which median is to be computed.use.distinct
specifies whether to exclude duplicates specified by value.expression from the computation. Default value is FALSE. By default, all non-null values in the column specified in value.expression, including duplicates, are considered in the computation. Ifuse.distinct = TRUE
, then all duplicate values are excluded in the computation.
# Median of the column 'temperature' of sequenced PTI table.
df_seq_mode <- df_seq_grp %>% summarise(median_temp = ts.median(temperature))
# Print the results.
df_seq_mode
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid median_temp
#> <chr> <S3: integer64> <int> <dbl>
#> 1 2014-01-06 10:30:00.000000+00:00~ 35350 22 23
#> 2 2014-01-06 10:30:00.000000+00:00~ 35350 44 43
#> 3 2014-01-06 09:00:00.000000+00:00~ 35347 1 74.5
#> 4 2014-01-06 10:00:00.000000+00:00~ 35349 44 53
#> 5 2014-01-06 21:00:00.000000+00:00~ 35371 2 81
#> 6 2014-01-06 08:00:00.000000+00:00~ 35345 0 54.5
# Median of the column 'temperature' of non-sequenced PTI table, excluding the duplicates in the computation.
df_nonseq_mode <- df_nonseq_grp %>% summarise(median_temp = ts.median(temperature, TRUE))
# Print the results.
df_nonseq_mode
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ median_temp
#> <chr> <S3: integer64> <dbl>
#> 1 2014-01-01 00:00:00.000000+00:00,2015~ 3 72
# Median of the column 'temperature' of non-PTI table.
df_nonpti_mode <- df_nonpti_grp %>% summarise(median_temp = ts.median(temperature))
# Print the results.
df_nonpti_mode
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ median_temp
#> <chr> <S3: integer64> <dbl>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing median values is REAL data type.
- NULLs/NAs are not included in the result computation.
- Median is valid only for numeric data.
- The use of
use.distinct = TRUE
is valid only when the functionts.median
is used withgroup_by_time
. The functionts.median
can be used with regulargroup_by
only when the argumentuse.distinct
takesFALSE
. Otherwise, an error is thrown. - We recommend to use function
median(value.expression)
for regulargroup_by
operation instead of the functionts.median
as shown in the example 1 below. - The functions
ts.median
andmedian
can also be used with onlysummarise
and withoutgroup_by
as shown in the example 2 below. We recommend to use the functionts.median
only with time series aggregate operation. - The functions
ts.median
andmedian
cannot be used with dplyr verbs likemutate
,filter
as shown in the example 3 below.
# Example 1
# Note the use of median() instead of ts.median() for regular group_by operation.
df_median <- df_seq %>% group_by(buoyid) %>% summarise(median_temp = median(temperature))
# Print the results.
df_median
#> # Source: lazy query [?? x 2]
#> # Database: Teradata
#> buoyid median_temp
#> <int> <int>
#> 1 44 43
#> 2 2 81
#> 3 1 74
#> 4 22 23
#> 5 0 54
# Example 2
# 2. Using median() without grouping.
df_median <- df_seq %>% summarise(median_temp = median(temperature))
# Print the results.
df_median
#> # Source: lazy query [?? x 1]
#> # Database: Teradata
#> median_temp
#> <int>
#> 1 54
# Example 3
# Incorrect usage of median() with mutate and filter. The same is the case with ts.median().
df_median <- df_seq %>% mutate(median_t = median(temperature))
df_median <- df_seq %>% filter(temperature <= median(temperature))
First : ts.first()
The agggregate function ts.first
returns the oldest value, determined by timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. First is a single-threaded function. ts.first
uses one argument, named value.expression
which holds the column for which oldest value of the group is to be returned.
# Oldest value of the column 'temperature' of sequenced PTI table.
df_seq_first <- df_seq_grp %>% summarise(first_temp = ts.first(temperature))
# Print the results.
df_seq_first
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid first_temp
#> <chr> <S3: integer64> <int> <int>
#> 1 2014-01-06 10:30:00.000000+00:00,~ 35350 22 23
#> 2 2014-01-06 10:30:00.000000+00:00,~ 35350 44 43
#> 3 2014-01-06 09:00:00.000000+00:00,~ 35347 1 70
#> 4 2014-01-06 10:00:00.000000+00:00,~ 35349 44 43
#> 5 2014-01-06 21:00:00.000000+00:00,~ 35371 2 80
#> 6 2014-01-06 08:00:00.000000+00:00,~ 35345 0 10
# Oldest value of the column 'temperature' of non-sequenced PTI table.
df_nonseq_first <- df_nonseq_grp %>% summarise(first_temp = ts.first(temperature))
# Print the results.
df_nonseq_first
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ first_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-~ 3 10
# Oldest value of the column 'temperature' of non-PTI table.
df_nonpti_first <- df_nonpti_grp %>% summarise(first_temp = ts.first(temperature))
# Print the results.
df_nonpti_first
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ first_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-0~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-0~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-0~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-0~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-0~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-0~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing oldest values is same as that of the column for which oldest value of the group is to be returned.
- NULLs/NAs are not included in the result computation.
- First is valid only for numeric data.
- First must be used with
group_by_time()
.
Last : ts.last()
The agggregate function ts.last
returns the newest value, determined by timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. Last is a single-threaded function. ts.last
uses one argument, named value.expression
which holds the column for which newest value of the group is to be returned.
# Newest value of the column 'temperature' of sequenced PTI table.
df_seq_last <- df_seq_grp %>% summarise(last_temp = ts.last(temperature))
# Print the results.
df_seq_last
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid last_temp
#> <chr> <S3: integer64> <int> <int>
#> 1 2014-01-06 10:30:00.000000+00:00,2~ 35350 44 43
#> 2 2014-01-06 08:00:00.000000+00:00,2~ 35345 0 100
#> 3 2014-01-06 21:00:00.000000+00:00,2~ 35371 2 82
#> 4 2014-01-06 09:00:00.000000+00:00,2~ 35347 1 79
#> 5 2014-01-06 10:00:00.000000+00:00,2~ 35349 44 43
#> 6 2014-01-06 10:30:00.000000+00:00,2~ 35350 22 23
# Newest value of the column 'temperature' of non-sequenced PTI table.
df_nonseq_last <- df_nonseq_grp %>% summarise(last_temp = ts.last(temperature))
# Print the results.
df_nonseq_last
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ last_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-0~ 3 82
# Newest value of the column 'temperature' of non-PTI table.
df_nonpti_last <- df_nonpti_grp %>% summarise(last_temp = ts.last(temperature))
# Print the results.
df_nonpti_last
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ last_temp
#> <chr> <S3: integer64> <int>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01~ 23149923 10
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01~ 23149925 10
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01~ 23149926 10
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01~ 23149927 10
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01~ 23149924 10
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01~ 23149922 10
#> # ... with more rows
Notes
- The data type of the column containing newest values is same as that of the column for which newest value of the group is to be returned.
- NULLs/NAs are not included in the result computation.
- Last is valid only for numeric data.
- Last must be used with
group_by_time()
. Whents.last()
is used withgroup_by
but notgroup_by_time
,ts.last()
throws different exceptions because LAST is also a function which can operate on Period data types, unlikets.first()
which throws an exception “Time Series Aggregate function invoked without an appropriate GROUP BY TIME clause”. The following are such invalid use cases ofts.last()
function:
# Invalid example 1: Using ts.last() on `group_by` with aggregate column not as grouping column.
df1 <- df_seq %>% group_by(buoyid) %>% summarise(temp = ts.last(temperature))
# Note that this will not print the result but throws an exception - "Selected non-aggregate values must be part of the associated group.". This is because the aggregate operation "LAST" (considered here) is the Period data type function.
df1
# Invalid example 2: Using ts.last() on `group_by` with aggregate column as one of the grouping columns.
df2 <- df %>% group_by(buoyid, temperature) %>% summarise(temp = ts.last(temperature))
# Note that this will not print the result but throws an exception - "Invalid argument for the LAST function. The argument must have a Period data type.". This is because the aggregate operation "LAST" (considered here) is the Period data type function.
df2
Median Absolute Deviation : ts.mad()
The agggregate function ts.mad
returns the median of the set of values defined as the absolute value of the difference between each value and the median of all values in each group. MAD is a single-threaded function. ts.mad
uses two arguments:
value.expression
which specifies the column from which top values are to be returned.multiplier
which specifies a literal numeric constant. If this argument is NULL, default multipler 1.4826 is used for MAD computation.
Formula for computing MAD is as follows:
MAD = b * Mi(|Xi - Mj(Xj)|)
Where,
b = A numeric constant. Default value is 1.4826.
Mj(Xj) = Median of the original set of values.
Xi = The original set of values.
Mi = Median of absolute value of the difference between each value in Xi and the Median calculated in Mj(Xj).
# MAD of the column 'temperature' of sequenced PTI table.
df_seq_mad <- df_seq_grp %>% summarise(mad_temp = ts.mad(temperature))
# Print the results.
df_seq_mad
#> # Source: lazy query [?? x 4]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINU~ buoyid mad_temp
#> <chr> <S3: integer64> <int> <dbl>
#> 1 2014-01-06 10:30:00.000000+00:00,20~ 35350 22 0
#> 2 2014-01-06 10:30:00.000000+00:00,20~ 35350 44 0
#> 3 2014-01-06 09:00:00.000000+00:00,20~ 35347 1 5.19
#> 4 2014-01-06 10:00:00.000000+00:00,20~ 35349 44 4.45
#> 5 2014-01-06 21:00:00.000000+00:00,20~ 35371 2 1.48
#> 6 2014-01-06 08:00:00.000000+00:00,20~ 35345 0 66.0
# MAD of the column 'temperature' of non-sequenced PTI table.
df_nonseq_mad <- df_nonseq_grp %>% summarise(mad_temp = ts.mad(temperature, 2))
# Print the results.
df_nonseq_mad
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(CAL_YE~ mad_temp
#> <chr> <S3: integer64> <dbl>
#> 1 2014-01-01 00:00:00.000000+00:00,2015-01~ 3 27
# MAD of the column 'temperature' of non-PTI table.
df_nonpti_mad <- df_nonpti_grp %>% summarise(mad_temp = ts.mad(temperature, 5))
# Print the results.
df_nonpti_mad
#> # Source: lazy query [?? x 3]
#> # Database: Teradata
#> TIMECODE_RANGE `GROUP BY TIME(MINUT~ mad_temp
#> <chr> <S3: integer64> <dbl>
#> 1 2014-01-06 08:02:00.000000+00:00,2014-01-~ 23149923 0
#> 2 2014-01-06 08:04:00.000000+00:00,2014-01-~ 23149925 0
#> 3 2014-01-06 08:05:00.000000+00:00,2014-01-~ 23149926 0
#> 4 2014-01-06 08:06:00.000000+00:00,2014-01-~ 23149927 0
#> 5 2014-01-06 08:03:00.000000+00:00,2014-01-~ 23149924 0
#> 6 2014-01-06 08:01:00.000000+00:00,2014-01-~ 23149922 0
#> # ... with more rows
Notes
- The data type of the column containing MAD values is REAL data type.
- NULLs/NAs are not included in the result computation.
- MAD is valid only for numeric data.
- MAD must be used with
group_by_time()
.