Every time series aggregation must be preceded with grouping by time using the API group_by_time(). For more information on the API, use help(group_by_time) or ?group_by_time in R console.
Although the grouping is optimized for tbl_teradata created for PTI tables, it is also supported on non-PTI tables when the argument timecode.column is specified.
Example Prerequisites
- Load the example datasets.
> loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking_nonpti")
- Create object(s) of class "tbl_teradata".
> df_seq <- tbl(con, "ocean_buoys_seq")
> df_nonseq <- tbl(con, "ocean_buoys_nonseq")
> df_nonpti <- tbl(con, "ocean_buoys_nonpti")
> df_pack_pti <- tbl(con, "package_tracking_pti")
> df_pack_nonpti <- tbl(con, "package_tracking_nonpti")
Example 1: Group sequenced PTI tables based on time and column 'buoyid'
This example groups the sequenced PTI tables based on time with timebucket duration of 30 minutes and the column 'buoyid'.
In the example, shorthand notation for timebucket duration is used.
> df_seq_grp <- df_seq %>% group_by_time(timebucket.duration = "30m", value.expression = "buoyid")
> df_seq_grp # Source: lazy query [?? x 6] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] # [TDAPUSER@<hostname>/TDAPUSERDB] # Groups: TIMECODE_RANGE, buoyid TD_TIMECODE TD_SEQNO buoyid salinity temperature dates <dttm> <int> <int> <int> <int> <date> 1 2014-01-06 10:00:25 6 44 55 43 2014-06-06 2 2014-01-06 10:01:25 8 44 55 53 2014-08-08 3 2014-01-06 10:01:25 20 44 55 54 2015-08-20 4 2014-01-06 10:02:25 9 44 55 53 2014-09-09 5 2014-01-06 10:03:25 10 44 55 53 2014-10-10 6 2014-01-06 10:03:25 22 44 55 56 2015-10-22 7 2014-01-06 10:12:00 3 44 55 43 2014-03-03 8 2014-01-06 09:01:25 11 1 55 70 2014-11-11 9 2014-01-06 09:01:25 23 1 55 77 2015-11-23 10 2014-01-06 09:02:25 12 1 55 71 2014-12-12 # ... with more rows
Example 2: Group non-PTI tables based on time
This example groups the non-PTI tables based on time with timebucket duration of 1 minute and filling the missing timebuckets with previous values.
In this example, formal notation for timebucket duration and "timecode.column" argument (mandatory for non-PTI table) is used.
> df_nonpti_grp <- df_nonpti %>% group_by_time(timebucket.duration = "MINUTES(1)", timecode.column = "TIMECODE", fill = "PREV")
> df_nonpti_grp # Source: lazy query [?? x 4] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] # [TDAPUSER@<hostname>/TDAPUSERDB] # Groups: TIMECODE_RANGE TIMECODE buoyid salinity temperature <dttm> <int> <int> <int> 1 2014-01-06 21:01:25 2 55 80 2 2014-01-06 10:00:24 44 55 43 3 2014-01-06 10:00:25 44 55 43 4 2014-01-06 09:01:25 1 55 77 5 2014-01-06 08:08:59 0 55 NA 6 2014-01-06 21:02:25 2 55 81 7 2014-01-06 09:02:25 1 55 78 8 2014-01-06 09:02:25 1 55 71 9 2014-01-06 10:52:00 44 55 43 10 2014-01-06 10:01:25 44 55 53 # ... with more rows