delta_t() - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:mapPath
rkb1531260709148.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

Use delta_t() to calculate time differences, or DELTA_T, between a starting and an ending event.

The calculation is performed against a time-ordered time series data set. The function returns a teradataml DataFrame.

This function is the only Time Series Aggregate function that works with timebucket_duration being set to "*" in groupby_time(), i.e., unbounded time.
When using groupby_time() with unbounded time, the following rules apply to the system virtual columns:
  • $TD_GROUP_BY_TIME: Always has a value of 1, since there is only one timebucket;
  • $TD_TIMECODE_RANGE: Composed of the first and last timecode values read for the group.
The data being evaluated in the filtering conditions (for example, the minimum and maximum temperature observation) must belong to the timecode value present in the same row of data. This is the expected behavior. However, this assumption can be violated when joining multiple tables together. It is possible to construct a query where the result of a join causes specific data points (for example, a temperature reading) to be present in a data row with a timecode that is not indicative of when that data point occurred. In such a scenario, it is highly likely that the results are not as expected, or are misleading. Vantage does not detect these types of queries, so you must make sure to preserve the correlation between data points and timecodes.
This function returns a column of PERIOD(TIMESTAMP WITH TIME ZONE) type composed of the start and end timecode, i.e., timecode column used for aggregation of each start-end pair. One result is returned per complete start-end pair found within the GROUP BY TIME window. The start-end pair process is as follows:
  • If the current source data meets the start condition, the current timecode is saved as the start time.
  • If the current source data meets the end condition, and a saved start timecode already exists, the start timecode is saved with the end timecode encountered as a result pair.
The processing algorithm implies that multiple results may be found in each group. If no start-end pair is encountered, no result row is returned.
Any result of delta_t which has a delta less than 1 microsecond (including a delta of 0, in the case of a result which comes from a single point in time) is automatically rounded to 1 microsecond.

This is strictly enforced to match Period data type semantics in Vantage which dictate that a starting and ending bound of a Period type may not be equivalent. The smallest granularity supported in Vantage is the microsecond, so these results are rounded accordingly.

Examples Prerequisite

Load the example datasets

>>> load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking"])

Example: Find Time Elapsed between Shipping and Receiving an Item

Input data used for this example contains information about parcels sent by a delivery service.

  • Case 1: Using DataFrame on PTI Table and showcasing usage of unbounded time in grouping.
    • Create DataFrame on PTI table
      >>> package_tracking_pti = DataFrame("package_tracking_pti")
      
      >>> package_tracking_pti.columns
      ['TD_TIMECODE', 'parcelnumber', 'status']
      
      >>> package_tracking_pti
                                   TD_TIMECODE                          status
      parcelnumber
      55            2016-10-15 10:00:00.000000       in transit to destination
      75            2016-10-15 16:30:00.000000          in transit to customer
      75            2016-10-15 08:00:00.000000         picked up from customer
      55            2016-10-15 09:10:00.000000    arrived at receiving station
      60            2016-10-15 10:45:00.000000    arrived at receiving station
      75            2016-10-15 17:00:00.000000           delivered to customer
      59            2016-10-15 08:05:00.000000         picked up from customer
      79            2016-10-15 08:05:00.000000         picked up from customer
      60            2016-10-15 09:20:00.000000         picked up from customer
      75            2016-10-15 16:10:00.000000  arrived at destination station
      
    • Execute groupby_time() using unbounded time for timebucket_duration
      >>> gbt = package_tracking_pti.groupby_time("*", "parcelnumber")
      
    • Execute delta_t, with start and end conditions specified as String
      >>> start_condition = "status LIKE 'picked%up%customer'"
      
      >>> end_condition = "status LIKE 'delivered%customer'"
      
      >>> gbt.delta_t(start_condition, end_condition)
                                            TIMECODE_RANGE  parcelnumber                                delta_t_td_timecode
      0  ('2012-01-01 00:00:00.000000+00:00', '9999-12-...            75  ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
      1  ('2012-01-01 00:00:00.000000+00:00', '9999-12-...            55  ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
  • Case 2: Using DataFrame on Non-PTI Table and showcasing usage of unbounded time in grouping.
    • Create DataFrame on Non-PTI table
      >>> package_tracking = DataFrame("package_tracking")
      
      >>> package_tracking.columns
      ['parcelnumber', 'clock_time', 'status']
      
      >>> package_tracking
                                    clock_time                          status
      parcelnumber
      79            2016-10-15 08:05:00.000000         picked up from customer
      75            2016-10-15 09:10:00.000000    arrived at receiving station
      75            2016-10-15 10:00:00.000000       in transit to destination
      75            2016-10-15 16:10:00.000000  arrived at destination station
      75            2016-10-15 17:00:00.000000           delivered to customer
      80            2016-10-15 09:20:00.000000         picked up from customer
      59            2016-10-15 08:05:00.000000         picked up from customer
      75            2016-10-15 16:30:00.000000          in transit to customer
      75            2016-10-15 08:00:00.000000         picked up from customer
      60            2016-10-15 10:45:00.000000    arrived at receiving station
      
    • Execute groupby_time() using unbounded time for timebucket_duration
      >>> gbt = package_tracking.groupby_time("*", "parcelnumber", "clock_time")
      
    • Execute delta_t, with start and end conditions specified as String
      >>> start_condition = "status LIKE 'picked%up%customer'"
      
      >>> end_condition = "status LIKE 'delivered%customer'"
      
      >>> gbt.delta_t(start_condition, end_condition)
                                            TIMECODE_RANGE  parcelnumber                                delta_t_td_timecode
      0  ('1970-01-01 00:00:00.000000+00:00', '9999-12-...            75  ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
      1  ('1970-01-01 00:00:00.000000+00:00', '9999-12-...            55  ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
      

Example: Search the Minimum and Maximum Observed Temperatures

This example measures the time between minimum and maximum observed temperatures every 30 minutes between 8:00 AM and 10:30 AM on each buoy.

  • Case 1: DataFrame on Non-sequenced PTI Table. Specify start condition and end condition as string.
    • Create DataFrame
      >>> ocean_buoys = DataFrame("ocean_buoys")
      
    • Check DataFrame columns and the head of the data
      >>> ocean_buoys.columns
      ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
      
      >>> ocean_buoys.head()
                             TD_TIMECODE  temperature  salinity
      buoyid
      0       2014-01-06 08:10:00.000000        100.0        55
      0       2014-01-06 08:08:59.999999          NaN        55
      1       2014-01-06 09:01:25.122200         77.0        55
      1       2014-01-06 09:03:25.122200         79.0        55
      1       2014-01-06 09:01:25.122200         70.0        55
      1       2014-01-06 09:02:25.122200         71.0        55
      1       2014-01-06 09:03:25.122200         72.0        55
      0       2014-01-06 08:09:59.999999         99.0        55
      0       2014-01-06 08:00:00.000000         10.0        55
      0       2014-01-06 08:10:00.000000         10.0        55
    • Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
      >>> ocean_buoys_dt = ocean_buoys[(ocean_buoys.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys.TD_TIMECODE < '2014-01-06 10:30:00')]
    • Get the minimum and maximum temperature within time range of 30 minutes
      >>> df_min_max_temp1 = ocean_buoys_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
    • Join the dataframe with original 'ocean_buoys'
      >>> df2_join1 = ocean_buoys.join(df_min_max_temp1, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
      
      >>> gbt3 = df2_join1.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
      
    • Set the start and end conditions
      >>> start_condition = "temperature = min_temperature"
      
      >>> end_condition = "temperature = max_temperature"
      
      >>> gbt3.delta_t(start_condition, end_condition)
                                            TIMECODE_RANGE  GROUP BY TIME(DAYS(1))  t1_buoyid                                delta_t_td_timecode
      0  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077         44  ('2014-01-06 10:00:26.122200-00:00', '2014-01-...
      1  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077          1  ('2014-01-06 09:01:25.122200-00:00', '2014-01-...
      2  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077          0  ('2014-01-06 08:00:00.000000-00:00', '2014-01-...
  • Case 2: DataFrame on Sequenced PTI Table. Specify start condition and end condition as ColumnExpression.
    • Create DataFrame
      >>> ocean_buoys_seq = DataFrame("ocean_buoys_seq")
      
    • Check DataFrame columns and the head of the data
      >>> ocean_buoys_seq.columns
      ['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
      >>> ocean_buoys_seq.head()
                             TD_TIMECODE  TD_SEQNO  salinity  temperature       dates
      buoyid
      0       2014-01-06 08:00:00.000000        26        55         10.0  2016-02-26
      0       2014-01-06 08:08:59.999999        18        55          NaN  2015-06-18
      1       2014-01-06 09:02:25.122200        24        55         78.0  2015-12-24
      1       2014-01-06 09:01:25.122200        23        55         77.0  2015-11-23
      1       2014-01-06 09:02:25.122200        12        55         71.0  2014-12-12
      1       2014-01-06 09:03:25.122200        13        55         72.0  2015-01-13
      1       2014-01-06 09:01:25.122200        11        55         70.0  2014-11-11
      0       2014-01-06 08:10:00.000000        19        55         10.0  2015-07-19
      0       2014-01-06 08:09:59.999999        17        55         99.0  2015-05-17
      0       2014-01-06 08:10:00.000000        27        55        100.0  2016-03-27
    • Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
      >>> ocean_buoys_seq_dt = ocean_buoys_seq[(ocean_buoys_seq.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys_seq.TD_TIMECODE < '2014-01-06 10:30:00')]
      
    • Get the minimum and maximum temperature within time range of 30 minutes
      >>> df_min_max_temp2 = ocean_buoys_seq_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
      
    • Join the dataframe with original 'ocean_buoys'
      >>> df2_join2 = ocean_buoys_seq.join(df_min_max_temp2, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
      
      >>> gbt4 = df2_join2.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
      
    • Set the start and end conditions
      >>> start_condition = gbt4.temperature == gbt4.min_temperature
      
      >>> end_condition = gbt4.temperature == gbt4.max_temperature
      
      >>> gbt4.delta_t(start_condition, end_condition)
                                            TIMECODE_RANGE  GROUP BY TIME(DAYS(1))  t1_buoyid                                delta_t_td_timecode
      0  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077         44  ('2014-01-06 10:00:26.122200-00:00', '2014-01-...
      1  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077          1  ('2014-01-06 09:01:25.122200-00:00', '2014-01-...
      2  ('2014-01-06 00:00:00.000000+00:00', '2014-01-...                   16077          0  ('2014-01-06 08:00:00.000000-00:00', '2014-01-...