Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number
- Published: December 2024
- ft:locale: en-US
- ft:lastEdition: 2024-12-19
- dita:id: TeradataPython_FxRef_Lake_2000
- Product Category: Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.sum = sum(self, distinct=False)
Returns the column-wise sum of the values in the DataFrame.
Notes:
1. teradataml does not support the sum operation on columns of str or datetime types.
2. Null values are not included in the result computation.
distinct:
Optional Argument.
Specifies whether to exclude duplicate values while calculating the sum.
Default Value: False
Types: bool
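As a rough illustration of what the distinct argument changes, the following plain-Python sketch models the aggregate over a single column. The helper `column_sum` is hypothetical and not part of teradataml. It assumes SQL-style semantics: NULLs (modeled here as `None`) are skipped, and a column with no non-NULL values sums to NULL rather than 0.

```python
def column_sum(values, distinct=False):
    """Model SUM / SUM(DISTINCT) over one column; None stands in for NULL."""
    # NULLs never take part in the computation.
    non_null = [v for v in values if v is not None]
    if distinct:
        # Keep a single copy of each value before summing.
        non_null = set(non_null)
    # With no non-NULL input the result is NULL (None), not 0.
    return sum(non_null) if non_null else None


marks = [10, 10, None, 5]
print(column_sum(marks))                 # 25
print(column_sum(marks, distinct=True))  # 15
print(column_sum([None, None]))          # None
```

The sketch only mirrors the documented behavior on in-memory values; the actual computation happens in the database.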
Returns a teradataml DataFrame object with the sum() operation performed.
Raises:
1. EXECUTION_FAILED - If the sum() operation fails to generate the column-wise summation value of the dataframe.
Possible error message:
Failed to perform 'sum'. (Followed by error message).
2. TDMLDF_AGGREGATE_COMBINED_ERR - If the sum() operation supports none of the selected columns in the dataframe.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col2 - PERIOD_TIME), (col3 - BLOB)] is/are unsupported for 'sum' operation.
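The second error condition can be pictured with a small pre-check written in plain Python. This is only a sketch under stated assumptions: `summable_columns` and `SUMMABLE_TYPES` are hypothetical names, the set of type names is illustrative rather than teradataml's actual list, and `ValueError` stands in for whatever exception the library raises.

```python
# Illustrative set only; teradataml's real list of summable types may differ.
SUMMABLE_TYPES = {"BYTEINT", "SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "FLOAT"}


def summable_columns(column_types):
    """Return the columns whose type is in SUMMABLE_TYPES; raise if there are none."""
    supported = [name for name, type_name in column_types.items()
                 if type_name in SUMMABLE_TYPES]
    if not supported:
        listing = ", ".join(f"({name} - {type_name})"
                            for name, type_name in column_types.items())
        raise ValueError(
            f"All selected columns [{listing}] is/are unsupported for 'sum' operation.")
    return supported


# A mix of supported and unsupported columns: the unsupported one is left out.
print(summable_columns({"col1": "INTEGER", "col2": "PERIOD_TIME"}))

# No supported column at all: the check fails.
try:
    summable_columns({"col2": "PERIOD_TIME", "col3": "BLOB"})
except ValueError as err:
    print(err)
```

The point of the sketch is the distinction it draws: unsupported columns alone are skipped, and the error arises only when no selected column can be summed.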
# Load the data to run the example.
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("dataframe", ["employee_info"])
# Create teradataml dataframe.
>>> from teradataml import DataFrame
>>> df1 = DataFrame("employee_info")
>>> print(df1)
            first_name marks   dob joined_date
employee_no
101              abcde  None  None    02/12/05
100               abcd  None  None        None
112               None  None  None    18/12/05
# Print the sum of the values of each column (with supported data types).
>>> df1.sum()
  sum_employee_no sum_marks
0             313      None
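The result above can be checked by hand. The sketch below rebuilds the three example rows in plain Python, assuming the leading value of each row is employee_no (as the sum_employee_no column suggests) and assuming only employee_no and marks are numeric. `None` stands in for NULL. No teradataml call is involved.

```python
# Rows of employee_info as printed in the example; None stands in for NULL.
rows = [
    {"employee_no": 101, "first_name": "abcde", "marks": None, "dob": None, "joined_date": "02/12/05"},
    {"employee_no": 100, "first_name": "abcd", "marks": None, "dob": None, "joined_date": None},
    {"employee_no": 112, "first_name": None, "marks": None, "dob": None, "joined_date": "18/12/05"},
]

# Assumption for this sketch: only these columns have a summable type.
numeric_columns = ["employee_no", "marks"]

result = {}
for col in numeric_columns:
    # NULLs are left out of the computation.
    values = [row[col] for row in rows if row[col] is not None]
    # A column holding only NULLs has no sum, hence None.
    result["sum_" + col] = sum(values) if values else None

print(result)  # {'sum_employee_no': 313, 'sum_marks': None}
```

This accounts for both cells of the output: 101 + 100 + 112 = 313, and marks has no non-NULL value to add.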
# Using sum() as Time Series Aggregate.
# Load the example datasets.
>>> load_example_data("dataframe", ["ocean_buoys"])
# Time Series Aggregate Example 1: Executing sum() function on DataFrame created on
# non-sequenced PTI table. We will consider all rows for the
# columns while calculating the sum value.
# Create the required DataFrame on the non-sequenced PTI table.
>>> ocean_buoys = DataFrame("ocean_buoys")
# Check the DataFrame columns and peek at the data.
>>> ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
                       TD_TIMECODE  temperature  salinity
buoyid
0       2014-01-06 08:10:00.000000        100.0        55
0       2014-01-06 08:08:59.999999          NaN        55
1       2014-01-06 09:01:25.122200         77.0        55
1       2014-01-06 09:03:25.122200         79.0        55
1       2014-01-06 09:01:25.122200         70.0        55
1       2014-01-06 09:02:25.122200         71.0        55
1       2014-01-06 09:03:25.122200         72.0        55
0       2014-01-06 08:09:59.999999         99.0        55
0       2014-01-06 08:00:00.000000         10.0        55
0       2014-01-06 08:10:00.000000         10.0        55
# To use sum() as Time Series Aggregate we must run groupby_time() first, followed by sum().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.sum().sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  sum_salinity  sum_temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0           275              219
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1           330              447
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2           165              243
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44           715              625
# Time Series Aggregate Example 2: Executing sum() function on DataFrame created on
# non-sequenced PTI table. We will consider DISTINCT values for the
# columns while calculating the sum value.
# To use sum() as Time Series Aggregate we must run groupby_time() first, followed by sum().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.sum(distinct=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  sum_salinity  sum_temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0            55              209
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1            55              447
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2            55              243
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44            55              261
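The buoyid 0 rows of both time series results can be reproduced by hand from the head() output. The sketch below is plain Python, not a teradataml call. It assumes that the five buoyid 0 rows shown by head() are all the rows for that buoy, which is consistent with sum_salinity = 275 = 5 x 55. `None` stands in for the NaN temperature.

```python
# (temperature, salinity) for the five buoyid 0 rows shown by head().
buoy0 = [(100.0, 55), (None, 55), (99.0, 55), (10.0, 55), (10.0, 55)]

# NULLs are dropped before aggregating.
temperature = [t for t, _ in buoy0 if t is not None]
salinity = [s for _, s in buoy0 if s is not None]

# Example 1: every row counts.
total_salinity = sum(salinity)
total_temperature = sum(temperature)
print(total_salinity, total_temperature)        # 275 219.0

# Example 2: each distinct value counts once.
distinct_salinity = sum(set(salinity))
distinct_temperature = sum(set(temperature))
print(distinct_salinity, distinct_temperature)  # 55 209.0
```

The temperature 10.0 occurs twice, so the distinct sum drops by 10 (219 to 209), while salinity collapses to its single value, 55.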