group_by_time() | Teradata Package for R - group_by_time() - Teradata Package for R

Teradata® Package for R User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
Language
English (United States)
Last Update
2024-04-09
dita:mapPath
efv1707506846369.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
nqx1518630623256
Product Category
Teradata Vantage

Every time series aggregation must be preceded with grouping by time using the API group_by_time(). For more information on the API, use help(group_by_time) or ?group_by_time in R console.

Although the grouping is optimized for tbl_teradata created for PTI tables, it is also supported on non-PTI tables when the argument timecode.column is specified.

Example Prerequisites

  • Load the example datasets.
    > loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking_nonpti")
  • Create object(s) of class "tbl_teradata".
    > df_seq <- tbl(con, "ocean_buoys_seq")
    > df_nonseq <- tbl(con, "ocean_buoys_nonseq")
    > df_nonpti <- tbl(con, "ocean_buoys_nonpti")
    > df_pack_pti <- tbl(con, "package_tracking_pti")
    > df_pack_nonpti <- tbl(con, "package_tracking_nonpti")

Example 1: Group sequenced PTI tables based on time and column 'buoyid'

This example groups the sequenced PTI tables based on time with timebucket duration of 30 minutes and the column 'buoyid'.

In the example, shorthand notation for timebucket duration is used.

> df_seq_grp <- df_seq %>% group_by_time(timebucket.duration = "30m", value.expression = "buoyid")
> df_seq_grp
# Source:   lazy query [?? x 6]
# Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Groups:   TIMECODE_RANGE, buoyid
   TD_TIMECODE         TD_SEQNO buoyid salinity temperature dates    
   <dttm>                 <int>  <int>    <int>       <int> <date>   
 1 2014-01-06 10:00:25        6     44       55          43 2014-06-06
 2 2014-01-06 10:01:25        8     44       55          53 2014-08-08
 3 2014-01-06 10:01:25       20     44       55          54 2015-08-20
 4 2014-01-06 10:02:25        9     44       55          53 2014-09-09
 5 2014-01-06 10:03:25       10     44       55          53 2014-10-10
 6 2014-01-06 10:03:25       22     44       55          56 2015-10-22
 7 2014-01-06 10:12:00        3     44       55          43 2014-03-03
 8 2014-01-06 09:01:25       11      1       55          70 2014-11-11
 9 2014-01-06 09:01:25       23      1       55          77 2015-11-23
10 2014-01-06 09:02:25       12      1       55          71 2014-12-12
# ... with more rows

Example 2: Group non-PTI tables based on time

This example groups the non-PTI tables based on time with timebucket duration of 1 minute and filling the missing timebuckets with previous values.

In this example, formal notation for timebucket duration and "timecode.column" argument (mandatory for non-PTI table) is used.

> df_nonpti_grp <- df_nonpti %>% group_by_time(timebucket.duration = "MINUTES(1)", timecode.column = "TIMECODE", fill = "PREV")
> df_nonpti_grp
# Source:   lazy query [?? x 4]
# Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Groups:   TIMECODE_RANGE
   TIMECODE            buoyid salinity temperature
   <dttm>               <int>    <int>       <int>
 1 2014-01-06 21:01:25      2       55          80
 2 2014-01-06 10:00:24     44       55          43
 3 2014-01-06 10:00:25     44       55          43
 4 2014-01-06 09:01:25      1       55          77
 5 2014-01-06 08:08:59      0       55          NA
 6 2014-01-06 21:02:25      2       55          81
 7 2014-01-06 09:02:25      1       55          78
 8 2014-01-06 09:02:25      1       55          71
 9 2014-01-06 10:52:00     44       55          43
10 2014-01-06 10:01:25     44       55          53
# ... with more rows