Teradata Package for Python Function Reference | 20.00 - agg - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.agg = agg(self, func=None)
- DESCRIPTION:
Perform aggregates using one or more operations.
PARAMETERS:
func:
Required Argument.
Specifies the function(s) to apply on DataFrame columns.
Valid values for func are:
'count', 'sum', 'min', 'max', 'mean', 'std', 'percentile', 'unique',
'median', 'var'
Acceptable formats for function(s) are
string, dictionary, list of strings/functions/ColumnExpression or ColumnExpression.
Accepted combinations are:
1. String function name
2. List of string functions
3. Dictionary containing column name as key and
aggregate function name (string or list of
strings) as value
4. ColumnExpression built using the aggregate functions.
5. List of ColumnExpression built using the aggregate functions.
Note:
* The name of the output columns are generated based on aggregate functions and column names.
For Example,
1. "func" passed as a string.
>>> df.agg('mean')
Assume that the column names of the dataframe are employee_no, first_name, marks, dob, joined_date.
After the above operation, the output column names are:
mean_employee_no, mean_marks, mean_dob, mean_joined_date
2. "func" passed as a list of string functions.
>>> df.agg(['min', 'sum'])
Assume that the column names of the dataframe are employee_no, first_name, marks, dob, joined_date.
After the above operation, the output column names are:
min_employee_no, sum_employee_no, min_first_name, min_marks, sum_marks, min_dob, min_joined_date
3. "func" passed as a dictionary containing column name as key and aggregate function name as value.
>>> df.agg({'employee_no' : ['min', 'sum', 'var'], 'first_name' : ['min']})
Output column names after the above operation are:
min_employee_no, sum_employee_no, var_employee_no, min_first_name
4. "func" passed as a ColumnExpression built using the aggregate functions.
>>> df.agg(df.first_name.count())
Output column name after the above operation is:
count(first_name)
5. "func" passed as a list of ColumnExpression built using the aggregate functions.
>>> df.agg([df.employee_no.min(), df.first_name.count()])
Output column names after the above operation are:
min(employee_no), count(first_name)
* On ColumnExpression or list of ColumnExpression alias() can be used to
return the output columns with aliased name.
For Example,
>>> df.agg(df.first_name.count().alias("total_names"))
Output column name after the above operation is:
total_names
>>> df.agg([df.joined_date.min().alias("min_date"), df.first_name.count().alias("total_names")])
Output column names after the above operation are:
min_date, total_names
RETURNS:
teradataml DataFrame object with operations
mentioned in parameter 'func' performed on specified
columns.
RAISES:
TeradataMLException
1. TDMLDF_AGGREGATE_FAILED - If operations on given columns
fail to generate aggregate dataframe.
Possible error message:
Unable to perform 'agg()' on the dataframe.
2. TDMLDF_AGGREGATE_COMBINED_ERR - If the provided
aggregate operations do not support specified columns.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col1 - VARCHAR)] is/are
unsupported for 'sum' operation.
3. TDMLDF_INVALID_AGGREGATE_OPERATION - If the aggregate
operation(s) received in parameter 'func' is/are
invalid.
Possible error message:
Invalid aggregate operation(s): minimum, counter.
Valid aggregate operation(s): count, max, mean, min,
std, sum.
4. TDMLDF_AGGREGATE_INVALID_COLUMN - If any of the columns
specified in 'func' is not present in the dataframe.
Possible error message:
Invalid column(s) given in parameter func: col1.
Valid column(s) : A, B, C, D.
5. MISSING_ARGS - If the argument 'func' is missing.
Possible error message:
Following required arguments are missing: func.
6. UNSUPPORTED_DATATYPE - If the argument 'func' is not of
valid datatype.
Possible error message:
Invalid type(s) passed to argument 'func', should be:
['str, dict, ColumnExpression or list of values of type(s): str, ColumnExpression'].
EXAMPLES :
# Load the data to run the example.
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("dataframe", ["employee_info", "sales"])
# Create teradataml dataframe.
>>> df = DataFrame("employee_info")
>>> print(df)
first_name marks dob joined_date
employee_no
101 abcde None None 02/12/05
100 abcd None None None
112 None None None 18/12/05
>>>
# Get the minimum, sum and variance of employee number and minimum and mean of name,
# by passing dictionary of column names to string function/list of string functions as parameter.
>>> df.agg({'employee_no' : ['min', 'sum', 'var'], 'first_name' : ['min', 'mean']})
min_employee_no sum_employee_no var_employee_no min_first_name
0 100 313 44.333333 abcd
# Get the minimum and sum of all the columns in the dataframe,
# by passing list of string functions as parameter.
>>> df.agg(['min', 'sum'])
min_employee_no sum_employee_no min_first_name min_marks sum_marks min_dob min_joined_date
0 100 313 abcd None None None 1902-05-12
# Get the mean of all the columns in the dataframe, by passing string function as parameter.
>>> df.agg('mean')
mean_employee_no mean_marks mean_dob mean_joined_date
0 104.333333 None None 60/12/04
# Get the total names in the dataframe, by running count() on the "first_name"
# and passing ColumnExpression as parameter.
>>> df.agg(df.first_name.count())
count(first_name)
0 2
# Get the minimum of joining date and total of names in the dataframe,
# by running min() on joined_date and count() on the "first_name"
# and passing list of ColumnExpression as parameter.
>>> df.agg([df.employee_no.min(), df.first_name.count()])
min(employee_no) count(first_name)
0 100 2
# Get the total names in the dataframe, by running count() on the "first_name" and
# use alias() to have the output column named as "total_names".
>>> df.agg(df.first_name.count().alias("total_names"))
total_names
0 2
# Get the minimum of joining date and total names in the dataframe,
# by running min() on joined_date and count() on the "first_name" and
# use alias() to have the output column named as "min_date" and "total_names".
>>> df.agg([df.joined_date.min().alias("min_date"), df.first_name.count().alias("total_names")])
min_date total_names
0 02/12/05 2
# Select only subset of columns from the DataFrame.
>>> df1 = df.select(['employee_no', 'first_name', 'joined_date'])
# List of string functions as parameter.
>>> df1.agg(['mean', 'unique'])
mean_employee_no unique_employee_no unique_first_name mean_joined_date unique_joined_date
0 104.333333 3 2 60/12/04 2
>>> df.agg('percentile')
percentile_employee_no percentile_marks
0 101 None
# Using another table 'sales' (having repeated values) to demonstrate operations
# 'unique' and 'percentile'.
# Create teradataml dataframe.
>>> df = DataFrame('sales')
>>> df
Feb Jan Mar Apr datetime
accounts
Yellow Inc 90.0 None None None 2017-04-01
Alpha Co 210.0 200 215 250 2017-04-01
Jones LLC 200.0 150 140 180 2017-04-01
Orange Inc 210.0 None None 250 2017-04-01
Blue Inc 90.0 50 95 101 2017-04-01
Red Inc 200.0 150 140 None 2017-04-01
>>> df.agg('percentile')
percentile_Feb percentile_Jan percentile_Mar percentile_Apr
0 200.0 150 140 215
>>> df.agg('unique')
unique_accounts unique_Feb unique_Jan unique_Mar unique_Apr unique_datetime
0 6 3 3 3 3 1