Teradata Package for Python Function Reference | 20.00 - delta_t - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrameGroupByTime.delta_t = delta_t(self, start_condition, end_condition)
- DESCRIPTION:
Calculates the time difference, or DELTA_T, between a starting and an ending event.
The calculation is performed against a time-ordered time series data set.
Note:
1. This is the only Time Series Aggregate function that works with timebucket_duration as "*"
in groupby_time(), i.e., unbounded time.
2. When using groupby_time() with unbounded time, the following rules apply to
the system virtual columns:
a. $TD_GROUP_BY_TIME: Always has a value of 1, since there is only one timebucket.
b. $TD_TIMECODE_RANGE: Composed of the first and last timecode values read for the group.
Note that the data being evaluated in the filtering conditions (for example, the minimum and
maximum temperature observation) must belong to the timecode value present in the same row
of data. This is the expected behavior. However, this assumption can be violated when
joining multiple tables together. It is possible to construct a query where the result of a
join causes specific data points (for example, a temperature reading) to be present in a
data row with a timecode that is not indicative of when that data point occurred.
In such a scenario, it is highly likely that the results are not as expected, or are misleading.
Vantage does not detect these types of queries, so one must make sure to preserve the
correlation between data points and timecodes.
PARAMETERS:
start_condition:
Required Argument.
Specifies any supported filtering condition that defines the start of the time period for which
you are searching.
Types: str or ColumnExpression
end_condition:
Required Argument.
Specifies any supported filtering condition that defines the end of the time period for which
you are searching.
Types: str or ColumnExpression
RETURNS:
teradataml DataFrame
Note:
1. Function returns a column of PERIOD(TIMESTAMP WITH TIME ZONE) type (Vantage Data type)
composed of the start and end timecode, i.e., timecode column used for aggregation
of each start-end pair.
2. One result is returned per complete start-end pair found within the
GROUP BY TIME window. The start-end pair process is as follows:
a. If the current source data meets the start condition, the current
timecode is saved as the start time.
b. If the current source data meets the end condition, and a saved start
timecode already exists, the start timecode is saved with the end timecode
encountered as a result pair.
3. The processing algorithm implies that multiple results may be found in each group.
4. If no start-end pair is encountered, no result row is returned.
5. Any result of delta_t which has a delta less than 1 microsecond (including a delta of 0,
in the case of a result which comes from a single point in time) is automatically
rounded to 1 microsecond.
This is strictly enforced to match Period data type semantics in Vantage which dictate that a
starting and ending bound of a Period type may not be equivalent. The smallest granularity
supported in Vantage is the microsecond, so these results are rounded accordingly.
RAISES:
TypeError - If incorrect type of values passed to input argument.
ValueError - If invalid value passed to the the argument.
TeradataMLException - In case illegal conditions are passed
EXAMPLES :
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti", "package_tracking_pti", "package_tracking"])
>>>
#
# Example 1: Finding Time Elapsed between Shipping and Receiving an Item.
# Input data used for this example contains information about parcels
# sent by a delivery service.
#
#
# Case 1: Using DataFrame on PTI Table and showcasing usage of unbounded time in grouping.
#
>>> # Create DataFrame on PTI table
... package_tracking_pti = DataFrame("package_tracking_pti")
>>> package_tracking_pti.columns
['TD_TIMECODE', 'parcelnumber', 'status']
>>> package_tracking_pti
TD_TIMECODE status
parcelnumber
55 2016-10-15 10:00:00.000000 in transit to destination
75 2016-10-15 16:30:00.000000 in transit to customer
75 2016-10-15 08:00:00.000000 picked up from customer
55 2016-10-15 09:10:00.000000 arrived at receiving station
60 2016-10-15 10:45:00.000000 arrived at receiving station
75 2016-10-15 17:00:00.000000 delivered to customer
59 2016-10-15 08:05:00.000000 picked up from customer
79 2016-10-15 08:05:00.000000 picked up from customer
60 2016-10-15 09:20:00.000000 picked up from customer
75 2016-10-15 16:10:00.000000 arrived at destination station
>>>
>>> # Execute groupby_time() using unbounded time for timebucket_duration.
... gbt = package_tracking_pti.groupby_time("*", "parcelnumber")
>>> # Execute delta_t, with start and end conditions specified as String.
... start_condition = "status LIKE 'picked%up%customer'"
>>> end_condition = "status LIKE 'delivered%customer'"
>>> gbt.delta_t(start_condition, end_condition)
TIMECODE_RANGE parcelnumber delta_t_td_timecode
0 ('2012-01-01 00:00:00.000000+00:00', '9999-12-... 75 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
1 ('2012-01-01 00:00:00.000000+00:00', '9999-12-... 55 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
>>>
#
# Case 2: Using DataFrame on Non-PTI Table and showcasing usage of unbounded time in grouping.
#
>>> # Create DataFrame on Non-PTI table
... package_tracking = DataFrame("package_tracking")
>>> package_tracking.columns
['parcelnumber', 'clock_time', 'status']
>>> package_tracking
clock_time status
parcelnumber
79 2016-10-15 08:05:00.000000 picked up from customer
75 2016-10-15 09:10:00.000000 arrived at receiving station
75 2016-10-15 10:00:00.000000 in transit to destination
75 2016-10-15 16:10:00.000000 arrived at destination station
75 2016-10-15 17:00:00.000000 delivered to customer
80 2016-10-15 09:20:00.000000 picked up from customer
59 2016-10-15 08:05:00.000000 picked up from customer
75 2016-10-15 16:30:00.000000 in transit to customer
75 2016-10-15 08:00:00.000000 picked up from customer
60 2016-10-15 10:45:00.000000 arrived at receiving station
>>>
>>> # Execute groupby_time() using unbounded time for timebucket_duration.
... gbt = package_tracking.groupby_time("*", "parcelnumber", "clock_time")
>>> # Execute delta_t, with start and end conditions specified as String.
... start_condition = "status LIKE 'picked%up%customer'"
>>> end_condition = "status LIKE 'delivered%customer'"
>>> gbt.delta_t(start_condition, end_condition)
TIMECODE_RANGE parcelnumber delta_t_td_timecode
0 ('1970-01-01 00:00:00.000000+00:00', '9999-12-... 75 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
1 ('1970-01-01 00:00:00.000000+00:00', '9999-12-... 55 ('2016-10-15 08:00:00.000000-00:00', '2016-10-...
>>>
#
# Example 2: Searching the Minimum and Maximum Observed Temperatures
# This example measures the time between minimum and maximum observed temperatures every
# 30 minutes between 8:00 AM and 10:30 AM on each buoy.
#
#
# Case 1: DataFrame on Non-sequenced PTI Table - specifying start condition and end condition as string
#
>>> # Create DataFrame
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
TD_TIMECODE temperature salinity
buoyid
0 2014-01-06 08:10:00.000000 100.0 55
0 2014-01-06 08:08:59.999999 NaN 55
1 2014-01-06 09:01:25.122200 77.0 55
1 2014-01-06 09:03:25.122200 79.0 55
1 2014-01-06 09:01:25.122200 70.0 55
1 2014-01-06 09:02:25.122200 71.0 55
1 2014-01-06 09:03:25.122200 72.0 55
0 2014-01-06 08:09:59.999999 99.0 55
0 2014-01-06 08:00:00.000000 10.0 55
0 2014-01-06 08:10:00.000000 10.0 55
>>> # Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
... ocean_buoys_dt = ocean_buoys[(ocean_buoys.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys.TD_TIMECODE < '2014-01-06 10:30:00')]
>>>
>>> # Let's get the minimum and maximum temperature within time range of 30 minutes
... df_min_max_temp1 = ocean_buoys_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
>>> # Join the dataframe with original 'ocean_buoys'
... df2_join1 = ocean_buoys.join(df_min_max_temp1, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
>>> gbt3 = df2_join1.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
>>>
>>> # Let's set the start and end conditions
... start_condition = "temperature = min_temperature"
>>> end_condition = "temperature = max_temperature"
>>> gbt3.delta_t(start_condition, end_condition)
TIMECODE_RANGE GROUP BY TIME(DAYS(1)) t1_buoyid delta_t_td_timecode
0 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 44 ('2014-01-06 10:00:26.122200-00:00', '2014-01-...
1 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 1 ('2014-01-06 09:01:25.122200-00:00', '2014-01-...
2 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 0 ('2014-01-06 08:00:00.000000-00:00', '2014-01-...
>>>
#
# Case 2: Same example as that of above, just DataFrame on Sequenced PTI Table and
# specifying start condition and end condition as ColumnExpression.
#
>>> # Create DataFrame
... ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head()
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26
0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18
1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24
1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23
1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12
1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13
1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11
0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19
0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17
0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
>>> # Filter the data and grab all rows between timestamp '2014-01-06 08:00:00' and '2014-01-06 10:30:00'
... ocean_buoys_seq_dt = ocean_buoys_seq[(ocean_buoys_seq.TD_TIMECODE >= '2014-01-06 08:00:00') & (ocean_buoys_seq.TD_TIMECODE < '2014-01-06 10:30:00')]
>>>
>>> # Let's get the minimum and maximum temperature within time range of 30 minutes
... df_min_max_temp2 = ocean_buoys_seq_dt.groupby_time("MINUTES(30)", "buoyid", "TD_TIMECODE").agg({"temperature": ["min", "max"]})
>>> # Join the dataframe with original 'ocean_buoys'
... df2_join2 = ocean_buoys_seq.join(df_min_max_temp2, on="buoyid", how="inner", lsuffix="t1", rsuffix="t2")
>>> gbt4 = df2_join2.groupby_time("DAYS(1)", "t1_buoyid", timecode_column="TD_TIMECODE")
>>>
>>> # Let's set the start and end conditions
>>> start_condition = gbt4.temperature == gbt4.min_temperature
>>> end_condition = gbt4.temperature == gbt4.max_temperature
>>> gbt4.delta_t(start_condition, end_condition)
TIMECODE_RANGE GROUP BY TIME(DAYS(1)) t1_buoyid delta_t_td_timecode
0 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 44 ('2014-01-06 10:00:26.122200-00:00', '2014-01-...
1 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 1 ('2014-01-06 09:01:25.122200-00:00', '2014-01-...
2 ('2014-01-06 00:00:00.000000+00:00', '2014-01-... 16077 0 ('2014-01-06 08:00:00.000000-00:00', '2014-01-...
>>>