group_by_time() | Teradata R Package - 17.00 - group_by_time() - Teradata R Package

Teradata® R Package User Guide

prodname
Teradata R Package
vrm_release
17.00
created_date
November 2020
category
User Guide
featnum
B700-4005-090K

Every time series aggregation must be preceded with grouping by time using the API group_by_time(). For more information on the API, use help(group_by_time) or ?group_by_time in R console.

Although the grouping is optimized for tbl_teradata created for PTI tables, it is also supported on non-PTI tables when the argument "timecode.column" is specified.

Example Prerequisites

  • Load the example datasets.
    > loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking_nonpti")
  • Create object(s) of class "tbl_teradata".
    > df_seq <- tbl(con, "ocean_buoys_seq")
    
    > df_nonseq <- tbl(con, "ocean_buoys_nonseq")
    
    > df_nonpti <- tbl(con, "ocean_buoys_nonpti")
    
    > df_pack_pti <- tbl(con, "package_tracking_pti")
    
    > df_pack_nonpti <- tbl(con, "package_tracking_nonpti")

Example 1: Group sequenced PTI tables based on time and column 'buoyid'

This example groups the sequenced PTI tables based on time with timebucket duration of 30 minutes and the column 'buoyid'.

In the example, shorthand notation for timebucket duration is used.

> df_seq_grp <- df_seq %>% group_by_time(timebucket.duration = "30m", value.expression = "buoyid")
> df_seq_grp
# Source:   lazy query [?? x 6]
# Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Groups:   TIMECODE_RANGE, buoyid
   TD_TIMECODE         TD_SEQNO buoyid salinity temperature dates    
   <dttm>                 <int>  <int>    <int>       <int> <date>   
 1 2014-01-06 10:00:25        6     44       55          43 2014-06-06
 2 2014-01-06 10:01:25        8     44       55          53 2014-08-08
 3 2014-01-06 10:01:25       20     44       55          54 2015-08-20
 4 2014-01-06 10:02:25        9     44       55          53 2014-09-09
 5 2014-01-06 10:03:25       10     44       55          53 2014-10-10
 6 2014-01-06 10:03:25       22     44       55          56 2015-10-22
 7 2014-01-06 10:12:00        3     44       55          43 2014-03-03
 8 2014-01-06 09:01:25       11      1       55          70 2014-11-11
 9 2014-01-06 09:01:25       23      1       55          77 2015-11-23
10 2014-01-06 09:02:25       12      1       55          71 2014-12-12
# ... with more rows

Example 2: Group non-PTI tables based on time

This example groups the non-PTI tables based on time with timebucket duration of 1 minute and filling the missing timebuckets with previous values.

In this example, formal notation for timebucket duration and "timecode.column" argument (mandatory for non-PTI table) is used.

> df_nonpti_grp <- df_nonpti %>% group_by_time(timebucket.duration = "MINUTES(1)", timecode.column = "TIMECODE", fill = "PREV")
> df_nonpti_grp
# Source:   lazy query [?? x 4]
# Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Groups:   TIMECODE_RANGE
   TIMECODE            buoyid salinity temperature
   <dttm>               <int>    <int>       <int>
 1 2014-01-06 21:01:25      2       55          80
 2 2014-01-06 10:00:24     44       55          43
 3 2014-01-06 10:00:25     44       55          43
 4 2014-01-06 09:01:25      1       55          77
 5 2014-01-06 08:08:59      0       55          NA
 6 2014-01-06 21:02:25      2       55          81
 7 2014-01-06 09:02:25      1       55          78
 8 2014-01-06 09:02:25      1       55          71
 9 2014-01-06 10:52:00     44       55          43
10 2014-01-06 10:01:25     44       55          53
# ... with more rows