Use the cube() function to create a multidimensional cube for a teradataml DataFrame from the specified columns. Running an aggregate function on the cube then produces aggregations across the different dimensions.
This method does not support operations on array columns.
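In standard SQL, a cube over n columns aggregates every subset of those columns (2^n grouping sets), including the empty set, which yields the grand total. In each subset, the columns that are rolled up appear as None, which is why rows such as "None None" appear in the example outputs. The following is a minimal pure-Python sketch of that idea, independent of teradataml. The helpers `grouping_sets` and `cube_sum` and the three toy rows are hypothetical. They only illustrate the semantics and are not how teradataml computes the cube.

```python
from collections import defaultdict
from itertools import combinations


def grouping_sets(columns):
    """Return every subset of `columns`, from the full set down to () (the grand total)."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def cube_sum(rows, columns, value):
    """Sum `value` for every grouping set; rolled-up columns are reported as None.

    Hypothetical helper for illustration only. It assumes the data itself has no
    None values, so a None in a key always means "rolled up".
    """
    totals = defaultdict(int)
    for gset in grouping_sets(columns):
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] += row[value]
    return dict(totals)


# Hypothetical toy rows (not taken from admissions_train).
rows = [
    {"masters": "yes", "stats": "Advanced", "admitted": 1},
    {"masters": "yes", "stats": "Beginner", "admitted": 0},
    {"masters": "no", "stats": "Advanced", "admitted": 1},
]

totals = cube_sum(rows, ["masters", "stats"], "admitted")
print(totals[("yes", None)])       # 1: masters='yes' rolled up over stats
print(totals[(None, "Advanced")])  # 2: stats='Advanced' rolled up over masters
print(totals[(None, None)])        # 2: grand total
```

Two grouping columns give four grouping sets: (masters, stats), (masters), (stats), and the empty set.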
Required Parameter
- columns
- Specifies the names of input teradataml DataFrame columns.
Optional Parameter
- include_grouping_columns
- Specifies whether to include aggregations on the grouping columns. When set to True, the resulting DataFrame includes aggregations on the columns specified in "columns". Otherwise, the resulting DataFrame does not include aggregations on the columns specified in "columns".
Default value: False
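The effect of include_grouping_columns is visible in Examples 2 and 3: with True, the numeric grouping column 'admitted' also gets an avg_admitted column, and with False it does not. The sketch below models only that column-selection rule as inferred from those outputs. The function `aggregate_output_columns` is hypothetical and is not teradataml code. The assumption that id, gpa, and admitted are the columns 'avg' can aggregate is likewise inferred from the example headers.

```python
def aggregate_output_columns(func, aggregable_columns, grouping_columns,
                             include_grouping_columns=False):
    """Hypothetical model of which aggregate columns appear in the result.

    Not teradataml code: it only mirrors the behavior visible in the examples.
    """
    names = []
    for col in aggregable_columns:
        # Grouping columns are skipped unless include_grouping_columns is True.
        if col in grouping_columns and not include_grouping_columns:
            continue
        names.append(f"{func}_{col}")
    return names


# Assumed: the admissions_train columns that 'avg' can aggregate.
numeric = ["id", "gpa", "admitted"]
print(aggregate_output_columns("avg", numeric, ["masters", "admitted"], True))
# ['avg_id', 'avg_gpa', 'avg_admitted']
print(aggregate_output_columns("avg", numeric, ["masters", "admitted"], False))
# ['avg_id', 'avg_gpa']
```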
Example setup
In this example, the "admissions_train" dataset is used.
>>> from teradataml import *
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame("admissions_train")
Example 1: Analyze the data by grouping it on the 'masters' and 'stats' dimensions
>>> df1 = df.cube(["masters", "stats"]).sum()
>>> df1
masters stats sum_id sum_gpa sum_admitted
0 no Beginner 8 3.60 1
1 None Advanced 555 84.21 16
2 None Beginner 21 18.31 3
3 yes Beginner 13 14.71 2
4 None None 820 141.67 26
5 yes Advanced 366 49.26 7
6 no None 343 63.96 16
7 None Novice 244 39.15 7
8 no Advanced 189 34.95 9
9 yes Novice 98 13.74 1
>>>
Example 2: Find the average of all valid columns by grouping the DataFrame with columns 'masters' and 'admitted'
Include grouping columns in aggregate function 'avg'.
>>> df1 = df.cube(["masters", "admitted"], include_grouping_columns=True).avg()
>>> df1
masters admitted avg_id avg_gpa avg_admitted
0 yes NaN 21.681818 3.532273 0.454545
1 None 1.0 18.846154 3.533462 1.000000
2 no NaN 19.055556 3.553333 0.888889
3 yes 0.0 24.083333 3.613333 0.000000
4 None NaN 20.500000 3.541750 0.650000
5 None 0.0 23.571429 3.557143 0.000000
6 yes 1.0 18.800000 3.435000 1.000000
7 no 1.0 18.875000 3.595000 1.000000
8 no 0.0 20.500000 3.220000 0.000000
>>>
Example 3: Find the average of all valid columns by grouping the DataFrame with columns 'masters' and 'admitted'
Do not include grouping columns in aggregate function 'avg'.
>>> df1 = df.cube(["masters", "admitted"], include_grouping_columns=False).avg()
>>> df1
masters admitted avg_id avg_gpa
0 no 0.0 20.500000 3.220000
1 None 1.0 18.846154 3.533462
2 no NaN 19.055556 3.553333
3 yes 0.0 24.083333 3.613333
4 None NaN 20.500000 3.541750
5 None 0.0 23.571429 3.557143
6 yes 1.0 18.800000 3.435000
7 yes NaN 21.681818 3.532273
8 no 1.0 18.875000 3.595000
>>>