Teradata Package for Python Function Reference | 17.10 - agg - Teradata Package for Python
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.agg = agg(self, func=None)
- DESCRIPTION:
Perform aggregates using one or more operations.
PARAMETERS:
func:
Required Argument.
Specifies the function(s) to apply on DataFrame columns.
Valid values for func are:
'count', 'sum', 'min', 'max', 'mean', 'std', 'percentile', 'unique',
'median', 'var'
Acceptable formats for function(s) are
string, dictionary or list of strings/functions.
Accepted combinations are:
1. String function name
2. List of string functions
3. Dictionary containing column name as key and
aggregate function name (string or list of
strings) as value
RETURNS:
teradataml DataFrame object with operations
mentioned in parameter 'func' performed on specified
columns.
RAISES:
TeradataMLException
1. TDMLDF_AGGREGATE_FAILED - If operations on given columns
fail to generate aggregate dataframe.
Possible error message:
Unable to perform 'agg()' on the dataframe.
2. TDMLDF_AGGREGATE_COMBINED_ERR - If the provided
aggregate operations do not support specified columns.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col1 - VARCHAR)] is/are
unsupported for 'sum' operation.
3. TDMLDF_INVALID_AGGREGATE_OPERATION - If the aggregate
operation(s) received in parameter 'func' is/are
invalid.
Possible error message:
Invalid aggregate operation(s): minimum, counter.
Valid aggregate operation(s): count, max, mean, min,
std, sum.
4. TDMLDF_AGGREGATE_INVALID_COLUMN - If any of the columns
specified in 'func' is not present in the dataframe.
Possible error message:
Invalid column(s) given in parameter func: col1.
Valid column(s) : A, B, C, D.
5. MISSING_ARGS - If the argument 'func' is missing.
Possible error message:
Following required arguments are missing: func.
6. UNSUPPORTED_DATATYPE - If the argument 'func' is not of
valid datatype.
Possible error message:
Invalid type(s) passed to argument 'func', should be:" "['str', 'list', 'dict'].
EXAMPLES :
# Load the data to run the example.
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("dataframe", ["employee_info", "sales"])
# Create teradataml dataframe.
>>> df = DataFrame("employee_info")
>>> print(df)
first_name marks dob joined_date
employee_no
101 abcde None None 02/12/05
100 abcd None None None
112 None None None 18/12/05
>>>
# Dictionary of column names to string function/list of string functions as parameter.
>>> df.agg({'employee_no' : ['min', 'sum', 'var'], 'first_name' : ['min', 'mean']})
min_employee_no sum_employee_no var_employee_no min_first_name
0 100 313 44.333333 abcd
# List of string functions as parameter.
>>> df.agg(['min', 'sum'])
min_employee_no sum_employee_no min_first_name min_marks sum_marks min_dob min_joined_date
0 100 313 abcd None None None 1902-05-12
# A string function as parameter.
>>> df.agg('mean')
mean_employee_no mean_marks mean_dob mean_joined_date
0 104.333333 None None 60/12/04
# Select only subset of columns from the DataFrame.
>>> df1 = df.select(['employee_no', 'first_name', 'joined_date'])
# List of string functions as parameter.
>>> df1.agg(['mean', 'unique'])
mean_employee_no unique_employee_no unique_first_name mean_joined_date unique_joined_date
0 104.333333 3 2 60/12/04 2
>>> df.agg('percentile')
percentile_employee_no percentile_marks
0 101 None
# Using another table 'sales' (having repeated values) to demonstrate operations
# 'unique' and 'percentile'.
# Create teradataml dataframe.
>>> df = DataFrame('sales')
>>> df
Feb Jan Mar Apr datetime
accounts
Yellow Inc 90.0 None None None 2017-04-01
Alpha Co 210.0 200 215 250 2017-04-01
Jones LLC 200.0 150 140 180 2017-04-01
Orange Inc 210.0 None None 250 2017-04-01
Blue Inc 90.0 50 95 101 2017-04-01
Red Inc 200.0 150 140 None 2017-04-01
>>> df.agg('percentile')
percentile_Feb percentile_Jan percentile_Mar percentile_Apr
0 200.0 150 140 215
>>> df.agg('unique')
unique_accounts unique_Feb unique_Jan unique_Mar unique_Apr unique_datetime
0 6 3 3 3 3 1