teradataml DataFrameColumn supports the following regular aggregate functions, which can be used with or without DataFrame.groupby().
- You must use DataFrame.assign() when applying the aggregate functions to a ColumnExpression (also known as teradataml DataFrameColumn).
- You should always use "drop_columns=True" in DataFrame.assign() when running an aggregate operation on a teradataml DataFrame.
- The drop_columns argument in DataFrame.assign() is ignored when the aggregate function is applied to the result of DataFrame.groupby().
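The notes above can be sketched as two small helpers. This is a minimal sketch, assuming `df` is an existing teradataml DataFrame created elsewhere and that the object returned by DataFrame.groupby() also accepts assign(); the helper names `column_mean` and `grouped_mean` are hypothetical and not part of teradataml.

```python
def column_mean(df, column, alias):
    """Aggregate one column over the whole DataFrame.

    drop_columns=True keeps only the aggregate column in the result.
    """
    col = getattr(df, column)  # attribute access, e.g. df.gpa, yields a DataFrameColumn
    return df.assign(drop_columns=True, **{alias: col.mean()})


def grouped_mean(df, group_by, column, alias):
    """Aggregate one column per group.

    drop_columns is ignored after groupby(), so it is not passed here.
    """
    col = getattr(df, column)
    return df.groupby(group_by).assign(**{alias: col.mean()})
```

For a DataFrame with hypothetical columns `gpa` and `masters`, `column_mean(df, "gpa", "mean_gpa")` should yield a single aggregated row, and `grouped_mean(df, "masters", "gpa", "mean_gpa")` one row per distinct `masters` value. The same pattern applies to the other functions in the table by swapping `mean()` for the desired aggregate.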
See the DataFrameColumn Aggregate Functions section of the Teradata Package for Python Function Reference (B700-4008) at https://docs.teradata.com/ for detailed descriptions and usage examples of these functions.
Sr. No. | Function Name | Description |
---|---|---|
1 | corr() | Returns the sample Pearson product-moment correlation coefficient of its arguments for all non-null data point pairs. |
2 | count() | Returns the column-wise count of the ColumnExpression (also known as teradataml DataFrameColumn). |
3 | covar_pop() | Returns the column-wise population covariance of its arguments for all non-null data point pairs. Covariance measures whether or not two random variables vary in the same way. It is the average of the products of deviations for each non-null data point pair. |
4 | covar_samp() | Returns the column-wise sample covariance of its arguments for all non-null data point pairs. Covariance measures whether or not two random variables vary in the same way. It is the sum of the products of deviations for each non-null data point pair, divided by one less than the number of pairs. |
5 | kurtosis() | Returns the column-wise kurtosis value of the ColumnExpression (also known as teradataml DataFrameColumn). Kurtosis is the fourth moment of the distribution of the standardized (z) values. It is a measure of the outlier (rare, extreme observation) character of the distribution as compared with the normal (or Gaussian) distribution. |
6 | max() | Returns the column-wise maximum value of the ColumnExpression (also known as teradataml DataFrameColumn). |
7 | mean() | Returns the column-wise mean value of the ColumnExpression (also known as teradataml DataFrameColumn). |
8 | median() | Returns the column-wise median value of the ColumnExpression (also known as teradataml DataFrameColumn). |
9 | min() | Returns the column-wise minimum value of the ColumnExpression (also known as teradataml DataFrameColumn). |
10 | percentile() | Returns the value that represents the desired percentile for the ColumnExpression (also known as teradataml DataFrameColumn). |
11 | regr_avgx() | Returns the column-wise mean of the independent variable for all non-null data pairs of the dependent and independent variable arguments. |
12 | regr_avgy() | Returns the column-wise mean of the dependent variable for all non-null data pairs of the dependent and independent variable arguments. |
13 | regr_count() | Returns the column-wise count of all non-null data pairs of the dependent and independent variable arguments. |
14 | regr_intercept() | Returns the column-wise intercept of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments. The intercept is the point at which the regression line through the non-null data pairs in the sample intersects the ordinate, or y-axis, of the graph. |
15 | regr_r2() | Returns the column-wise coefficient of determination for all non-null data pairs of the dependent and independent variable arguments. |
16 | regr_slope() | Returns the column-wise slope of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments. |
17 | regr_sxx() | Returns the column-wise sum of the squares of the independent variable expression for all non-null data pairs of the dependent and independent variable arguments. |
18 | regr_sxy() | Returns the column-wise sum of the products of the independent variable and the dependent variable for all non-null data pairs of the dependent and independent variable arguments. |
19 | regr_syy() | Returns the column-wise sum of the squares of the dependent variable expression for all non-null data pairs of the dependent and independent variable arguments. |
20 | skew() | Returns the column-wise skewness of the distribution of the ColumnExpression (also known as teradataml DataFrameColumn). Skewness is the third moment of a distribution. It is a measure of the asymmetry of the distribution about its mean compared with the normal (or Gaussian) distribution. |
21 | std() | Returns the column-wise sample or population standard deviation value of the ColumnExpression (also known as teradataml DataFrameColumn). The standard deviation is the second moment of a distribution. The computation is more conservative for the population standard deviation to minimize the effect of outliers on the computed value. |
22 | sum() | Returns the column-wise sum value of the ColumnExpression (also known as teradataml DataFrameColumn). |
23 | var() | Returns the column-wise sample or population variance of the ColumnExpression (also known as teradataml DataFrameColumn). |
teradataml special aggregate functions | ||
24 | csum() | Returns the cumulative sum of the column's values for rows in the partition. |
25 | msum() | Computes the moving sum for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
26 | mavg() | Computes the moving average for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
27 | mdiff() | Computes the moving difference for the current row and the preceding "width" rows in a partition, by sorting the rows according to "sort_columns". |
28 | mlinreg() | Computes the moving linear regression for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
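The window semantics of the special aggregate functions can be illustrated in plain Python. This is a sketch of the arithmetic only, not the teradataml API: it assumes the input list holds one partition's values already ordered by "sort_columns", that leading rows with fewer than "width"-1 predecessors use whatever rows are available, and that the moving difference is the current value minus the value "width" rows earlier (undefined, shown as None, for the first "width" rows). The function names mirror the table for readability; mlinreg() is omitted.

```python
def csum(values):
    """Cumulative sum over the ordered partition."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def msum(values, width):
    """Moving sum of the current row and the preceding width-1 rows."""
    return [sum(values[max(0, i - width + 1): i + 1]) for i in range(len(values))]


def mavg(values, width):
    """Moving average of the current row and the preceding width-1 rows."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def mdiff(values, width):
    """Current value minus the value `width` rows earlier; None where no such row exists."""
    return [values[i] - values[i - width] if i >= width else None
            for i in range(len(values))]


ordered = [1, 2, 3, 4, 5]   # one partition, already sorted
print(csum(ordered))        # [1, 3, 6, 10, 15]
print(msum(ordered, 3))     # [1, 3, 6, 9, 12]
print(mavg(ordered, 2))     # [1.0, 1.5, 2.5, 3.5, 4.5]
print(mdiff(ordered, 2))    # [None, None, 2, 2, 2]
```

In teradataml the equivalent computation runs in the database; the lists here only make the window boundaries visible.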