Description
This function takes an existing tbl_teradata and converts into a grouped tbl_teradata
object that allows a set of aggregate functions to be computed on data grouped in terms of time.
Although the grouping is optimized for objects of class "tbl_teradata" created using PTI tables,
it is also supported on objects of class "tbl_teradata" created using non-PTI tables when the
argument "timecode.column" is specified.
Note: If consecutive groupings are performed, latest grouping operation is considered, i.e. if
groupby
is followed by groupby_time
, groupby_time
grouping is considered
for aggregation. Similarly, if groupby_time
is followed by groupby
, groupby
grouping is considered for aggregation. If consecutive groupby_time
operations are
performed, latest groupby_time
is considered for aggregation.
Usage
group_by_time(
df,
timebucket.duration,
value.expression = c(),
timecode.column = NULL,
sequence.column = NULL,
fill = NULL,
...
)
Arguments
df |
Required Argument. | ||||||||||||||||||||||||||||||||||||||||||
timebucket.duration |
Required Argument.
Where, N is a 16-bit positive integer with a maximum value of
32767.
| ||||||||||||||||||||||||||||||||||||||||||
value.expression |
Optional Argument.
Default Value : c() | ||||||||||||||||||||||||||||||||||||||||||
timecode.column |
Optional Argument. | ||||||||||||||||||||||||||||||||||||||||||
sequence.column |
Optional Argument. | ||||||||||||||||||||||||||||||||||||||||||
fill |
Optional Argument.
Default Value : NULL | ||||||||||||||||||||||||||||||||||||||||||
... |
Optional arguments for future purpose, if needed. |
Value
A 'tbl_teradata' object.
Examples
# Get remote data source connection.
con <- td_get_context()$connection
# Load the required tables.
loadExampleData("time_series_example", "ocean_buoys_seq", "ocean_buoys_nonseq",
"ocean_buoys_nonpti", "package_tracking_pti")
# Create objects of class "tbl_teradata".
df_seq <- tbl(con, "ocean_buoys_seq") # tbl of sequenced PTI table.
df_nonseq <- tbl(con, "ocean_buoys_nonseq") # tbl of non-sequenced PTI table.
df_nonpti <- tbl(con, "ocean_buoys_nonpti") # tbl of non-PTI table.
# Example 1: Grouping the sequenced PTI tables based on time with timebucket duration of
# 30 minutes and the column 'buoyid' and ignoring the aggregation when there is
# no values in the group.
# Grouping by timebucket duration of 30 minutes and 'buoyid'.
# Note the use of shorthand notation for timebucket duration.
seq_group1 <- df_seq %>% group_by_time(timebucket.duration = "30m",
value.expression = "buoyid")
# Applying min() aggregation on grouped tbl object.
seq_group1 %>% summarise(min_temp = min(temperature))
# Example 2: Grouping the non-PTI tables based on time with timebucket duration of 1 minute
# and fills the missing timebuckets with previous values.
# Grouping by timebucket duration of 1 minute.
# Note the use of formal notation for timebucket duration.
# Note the use of timecode.column argument (mandatory for non-PTI table).
seq_group2 <- df_nonpti %>% group_by_time(timebucket.duration = "MINUTES(1)",
timecode.column = "TIMECODE",
fill = "PREV")
# Applying min() aggregation on grouped tbl object.
temp_grouped <- seq_group2 %>% summarise(min_temp = min(temperature))
# Displaying the rows in the increasing order to TIMECODE_RANGE or 'GROUP BY TIME(MINUTES(1))'
# column.
temp_grouped %>% arrange(`GROUP BY TIME(MINUTES(1))`)
# Example 3: Grouping the non-sequenced PTI tables based on time with timebucket duration of
# 2 Calendar years. Note the use of timecode.column and sequence.column arguments.
# The next three grouping give a quick demo of the use of fill argument keeping all other
# arguments same.
# With fill = "PREVIOUS"
grp1 <- df_nonseq %>% group_by_time(timebucket.duration = "CAL_YEARS(1)",
timecode.column = "TD_TIMECODE",
sequence.column = "buoyid",
fill = "PREVIOUS")
min_t <- grp1 %>% summarise(min_temp = min(temperature))
min_t %>% arrange(`GROUP BY TIME(CAL_YEARS(1))`)
# With fill = "NEXT"
grp1 <- df_nonseq %>% group_by_time(timebucket.duration = "CAL_YEARS(1)",
timecode.column = "TD_TIMECODE",
sequence.column = "buoyid",
fill = "NEXT")
min_t <- grp1 %>% summarise(min_temp = min(temperature))
min_t %>% arrange(`GROUP BY TIME(CAL_YEARS(1))`)
# With fill = 10000 (some numeric constant)
grp1 <- df_nonseq %>% group_by_time(timebucket.duration = "CAL_YEARS(1)",
timecode.column = "TD_TIMECODE",
sequence.column = "buoyid",
fill = 10000)
min_t <- grp1 %>% summarise(min_temp = min(temperature))
min_t %>% arrange(`GROUP BY TIME(CAL_YEARS(1))`)
# Example 4: Finding Time Elapsed between Shipping and Receiving an Item. Input data used for
# this example contains information about parcels sent by a delivery service. This
# example also demonstrates the use of unbounded timebucket duration ('*') for
# 'ts.delta_t' time series aggregate function.
df_pack_pti <- tbl(con, "package_tracking_pti")
df_grp <- df_pack_pti %>% group_by_time(timebucket.duration = "*",
value.expression = "parcel_number")
df_out <- df_grp %>% summarise(delta_t = ts.delta_t("Status LIKE 'picked%up%customer'",
"Status LIKE 'delivered%customer'"))
df_out %>% arrange(TIMECODE_RANGE, parcel_number)