Use delta_t() to calculate time differences, or DELTA_T, between a starting and an ending event.
The calculation is performed against a time-ordered time series data set. The function returns a teradataml DataFrame.
This function is the only Time Series Aggregate function that works with timebucket_duration being set to "*" in groupby_time(), i.e., unbounded time.
When using groupby_time() with unbounded time, the following rules apply to the system virtual columns:
- $TD_GROUP_BY_TIME: Always has a value of 1, since there is only one timebucket;
- $TD_TIMECODE_RANGE: Composed of the first and last timecode values read for the group.
The data being evaluated in the filtering conditions (for example, the minimum and maximum temperature observation) must belong to the timecode value present in the same row of data. This is the expected behavior. However, this assumption can be violated when joining multiple tables together. It is possible to construct a query where the result of a join causes specific data points (for example, a temperature reading) to be present in a data row with a timecode that is not indicative of when that data point occurred. In such a scenario, it is highly likely that the results are not as expected, or are misleading. Vantage does not detect these types of queries, so you must make sure to preserve the correlation between data points and timecodes.
This function returns a column of PERIOD(TIMESTAMP WITH TIME ZONE) type composed of the start and end timecode, i.e., timecode column used for aggregation of each start-end pair. One result is returned per complete start-end pair found within the GROUP BY TIME window. The start-end pair process is as follows:
- If the current source data meets the start condition, the current timecode is saved as the start time.
- If the current source data meets the end condition, and a saved start timecode already exists, the start timecode is saved with the end timecode encountered as a result pair.
Any result of delta_t which has a delta less than 1 microsecond (including a delta of 0, in the case of a result which comes from a single point in time) is automatically rounded to 1 microsecond.
This is strictly enforced to match Period data type semantics in Vantage which dictate that a starting and ending bound of a Period type may not be equivalent. The smallest granularity supported in Vantage is the microsecond, so these results are rounded accordingly.
Examples Prerequisite
Load the example datasets
>>> load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking"])
Example: Find Time Elapsed between Shipping and Receiving an Item
Input data used for this example contains information about parcels sent by a delivery service.
- Case 1: Using DataFrame on PTI Table and showcasing usage of unbounded time in grouping.
- Create DataFrame on PTI table
>>> package_tracking_pti = DataFrame("package_tracking_pti")
>>> package_tracking_pti.columns ['TD_TIMECODE', 'parcelnumber', 'status']
>>> package_tracking_pti TD_TIMECODE status parcelnumber 55 2016-10-15 10:00:00.000000 in transit to destination 75 2016-10-15 16:30:00.000000 in transit to customer 75 2016-10-15 08:00:00.000000 picked up from customer 55 2016-10-15 09:10:00.000000 arrived at receiving station 60 2016-10-15 10:45:00.000000 arrived at receiving station 75 2016-10-15 17:00:00.000000 delivered to customer 59 2016-10-15 08:05:00.000000 picked up from customer 79 2016-10-15 08:05:00.000000 picked up from customer 60 2016-10-15 09:20:00.000000 picked up from customer 75 2016-10-15 16:10:00.000000 arrived at destination station
- Execute groupby_time() using unbounded time for timebucket_duration
>>> gbt = package_tracking_pti.groupby_time("*", "parcelnumber")
- Execute delta_t, with start and end conditions specified as String
>>> start_condition = "status LIKE 'picked%up%customer'"
>>> end_condition = "status LIKE 'delivered%customer'"
>>> gbt.delta_t(start_condition, end_condition) TIMECODE_RANGE parcelnumber delta_t_td_timecode 0 ('2012-01-01 00:00:00.000000+00:00', '9999-12-... 75 ('2016-10-15 08:00:00.000000-00:00', '2016-10-... 1 ('2012-01-01 00:00:00.000000+00:00', '9999-12-... 55 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
- Create DataFrame on PTI table
- Case 2: Using DataFrame on Non-PTI Table and showcasing usage of unbounded time in grouping.
- Create DataFrame on Non-PTI table
>>> package_tracking = DataFrame("package_tracking")
>>> package_tracking.columns ['parcelnumber', 'clock_time', 'status']
>>> package_tracking clock_time status parcelnumber 79 2016-10-15 08:05:00.000000 picked up from customer 75 2016-10-15 09:10:00.000000 arrived at receiving station 75 2016-10-15 10:00:00.000000 in transit to destination 75 2016-10-15 16:10:00.000000 arrived at destination station 75 2016-10-15 17:00:00.000000 delivered to customer 80 2016-10-15 09:20:00.000000 picked up from customer 59 2016-10-15 08:05:00.000000 picked up from customer 75 2016-10-15 16:30:00.000000 in transit to customer 75 2016-10-15 08:00:00.000000 picked up from customer 60 2016-10-15 10:45:00.000000 arrived at receiving station
- Execute groupby_time() using unbounded time for timebucket_duration
>>> gbt = package_tracking.groupby_time("*", "parcelnumber", "clock_time")
- Execute delta_t, with start and end conditions specified as String
>>> start_condition = "status LIKE 'picked%up%customer'"
>>> end_condition = "status LIKE 'delivered%customer'"
>>> gbt.delta_t(start_condition, end_condition) TIMECODE_RANGE parcelnumber delta_t_td_timecode 0 ('1970-01-01 00:00:00.000000+00:00', '9999-12-... 75 ('2016-10-15 08:00:00.000000-00:00', '2016-10-... 1 ('1970-01-01 00:00:00.000000+00:00', '9999-12-... 55 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
- Create DataFrame on Non-PTI table
Example: Search the Minimum and Maximum Observed Temperatures
This example measures the time between minimum and maximum observed temperatures every 30 minutes between 8:00 AM and 10:30 AM on each buoy.
- Case 1: DataFrame on Non-sequenced PTI Table. Specify start condition and end condition as string.
- Create DataFrame
>>> ocean_buoys = DataFrame("ocean_buoys")
- Check DataFrame columns and the head of the data
>>> ocean_buoys.columns ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head() TD_TIMECODE temperature salinity buoyid 0 2014-01-06 08:10:00.000000 100.0 55 0 2014-01-06 08:08:59.999999 NaN 55 1 2014-01-06 09:01:25.122200 77.0 55 1 2014-01-06 09:03:25.122200 79.0 55 1 2014-01-06 09:01:25.122200 70.0 55 1 2014-01-06 09:02:25.122200 71.0 55 1 2014-01-06 09:03:25.122200 72.0 55 0 2014-01-06 08:09:59.999999 99.0 55 0 2014-01-06 08:00:00.000000 10.0 55 0 2014-01-06 08:10:00.000000 10.0 55
- Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
>>> ocean_buoys_dt = ocean_buoys[(ocean_buoys.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys.TD_TIMECODE < '2014-01-06 10:30:00')]
- Get the minimum and maximum temperature within time range of 30 minutes
>>> df_min_max_temp1 = ocean_buoys_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
- Join the dataframe with original 'ocean_buoys'
>>> df2_join1 = ocean_buoys.join(df_min_max_temp1, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
>>> gbt3 = df2_join1.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
- Set the start and end conditions
>>> start_condition = "temperature = min_temperature"
>>> end_condition = "temperature = max_temperature"
>>> gbt3.delta_t(start_condition, end_condition) TIMECODE_RANGE GROUP BY TIME(DAYS(1)) t1_buoyid delta_t_td_timecode 0 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 44 ('2014-01-06 10:00:26.122200-00:00', '2014-01-... 1 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 1 ('2014-01-06 09:01:25.122200-00:00', '2014-01-... 2 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 0 ('2014-01-06 08:00:00.000000-00:00', '2014-01-...
- Create DataFrame
- Case 2: DataFrame on Sequenced PTI Table. Specify start condition and end condition as ColumnExpression.
- Create DataFrame
>>> ocean_buoys_seq = DataFrame("ocean_buoys_seq")
- Check DataFrame columns and the head of the data
>>> ocean_buoys_seq.columns ['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head() TD_TIMECODE TD_SEQNO salinity temperature dates buoyid 0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26 0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18 1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24 1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23 1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12 1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13 1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11 0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19 0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17 0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
- Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
>>> ocean_buoys_seq_dt = ocean_buoys_seq[(ocean_buoys_seq.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys_seq.TD_TIMECODE < '2014-01-06 10:30:00')]
- Get the minimum and maximum temperature within time range of 30 minutes
>>> df_min_max_temp2 = ocean_buoys_seq_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
- Join the dataframe with original 'ocean_buoys'
>>> df2_join2 = ocean_buoys_seq.join(df_min_max_temp2, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
>>> gbt4 = df2_join2.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
- Set the start and end conditions
>>> start_condition = gbt4.temperature == gbt4.min_temperature
>>> end_condition = gbt4.temperature == gbt4.max_temperature
>>> gbt4.delta_t(start_condition, end_condition) TIMECODE_RANGE GROUP BY TIME(DAYS(1)) t1_buoyid delta_t_td_timecode 0 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 44 ('2014-01-06 10:00:26.122200-00:00', '2014-01-... 1 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 1 ('2014-01-06 09:01:25.122200-00:00', '2014-01-... 2 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 0 ('2014-01-06 08:00:00.000000-00:00', '2014-01-...
- Create DataFrame