Use the pivot() function to rotate data from rows into columns to create easy-to-read DataFrames. The function is useful for reporting purposes, as it allows to aggregate and rotate the data.
- columns: Specifies the columns or dictionary of columns with distinct column values used for pivoting the data.
This argument is required when keyword arguments are not specified. Otherwise, it is optional.
- When specified as a column, this function automatically extracts the distinct values for the column.For example: columns = df.qtr #, or columns = [df.qtr, dr.yr]Distinct values 'Q1', 'Q2', 'Q3', and so on, are automatically extracted from column 'qtr' for pivoting.
- When specified as dictionary:
- Key is a column expression representing a column.
- Value is a literal or list of literals, that is, values in a column or a teradataml DataFrame with only one column.
When value is a teradataml DataFrame, the order of output pivoted columns is not guaranteed. In this case, Teradata recommends not specifying returns argument, so column names are generated properly according to the order of records.When dictionary value contains literals,- If limit_combinations is set to False, pivot is done based on each combination of the values specified for each key.
- Otherwise, combinations are restricted based on the index values of the list.
All values in dictionary must have the same length when limit_combinations is set to True.Example 1: columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]}
In this case, pivot is based on all the combinations of values as specified in the following, for columns 'qtr' and 'yr':
- If limit_combinations is set to False: (Q1, 2001), (Q1, 2002), (Q2, 2001), and (Q2, 2002).
- If limit_combinations is set to True: (Q1, 2001) and (Q2, 2002).
Example 2: columns={df.qtr: quarter_df, df.year: [2001, 2002]}, where value passed to df.qtr is a teradataml DataFrame with one column.
In this cases, function extracts the distinct values from the DataFrame and pivot is based on all the combinations of values or limited combination of values based on argument limit_combinations.
String type values in dictionary are not case sensitive. See the following example 2 for more details.
- When specified as a column, this function automatically extracts the distinct values for the column.
- aggfuncs: Specifies the column aggregate functions to be used for pivoting the data.
For example:
To pivot total of 'sales' column and average of 'cogs' + 'sales' columns in the DataFrame "df", specify this argument as:
aggfuncs=[df.sales.sum(), (df.cogs+df.sales).avg()]
- limit_combinations: Specifies whether to limit the number of combinations when columns argument is passed as a dictionary.
Default value is False.
When set to True, function limits the combinations one-to-one based on index of the values in list, hence all dictionary values must be a list of same length or a single literal.
For example: df.pivot(columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]}, limit_combinations=True, ...)
Pivot will be based on columns 'qtr' and 'yr' for values (Q1, 2001) and (Q2, 2002) only.
This argument is ignored when columns is a ColumnExpression or a list of ColumnExpressions. - margins: Specifies the aggregate operation to be performed on output columns.Aggregate operation to be performed must be passed as dictionary where:
- Key is a string specifying aggregate operation to be performed.
- Value is a tuple or list of tuples specifying the output column names as string. Columns specified in the tuple are used in aggregate operation performed.
For example: If for the year 2001, following three aggregate columns are needed in the output:- Sum of sales in Q1 and Q2
- Sum of sales in Q2 and Q3
- Average of cogs for first three quarters
margins can be specified as:
margins={"SUM": [("Q12001_sales", "Q22001_sales"), ("Q22001_sales", "Q32001_sales")],
"AVG": ("Q12001_cogs", "Q22001_cogs", "Q32001_cogs")}
- Supported aggregate functions are SUM, AVG, MIN and MAX.
- If returns is specified, column names for margins are considered from returns clause.
- returns: Specifies the custom column names to be returned in the output DataFrame.
If this argument is not specified, function internally generates the output column names.
The number of column names must be same as columns generated by pivot. For example:- If columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]}, aggfuncs=[df.sales.sum(), df.cogs.sum()] and limit_combination=False, then number of new columns in output will be: len(aggfuncs) * len(columns[df.year]) * len(columns[df.qtr]) or 2 * 2 * 2 = 8.
Hence, returns must be list of string with 8 elements.
- If in the previous example, limit_combination=True, then number of new columns is output will be: len(aggfuncs) * (len(columns[df.year]) OR len(columns[df.qtr])) or or 2 * 2 = 4.
Hence, returns must be list of string with 4 elements.
All columns including columns in DataFrame which do not participate in pivot must be specified if all_columns is set to True. - If columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]}, aggfuncs=[df.sales.sum(), df.cogs.sum()] and limit_combination=False, then number of new columns in output will be: len(aggfuncs) * len(columns[df.year]) * len(columns[df.qtr]) or 2 * 2 * 2 = 8.
- all_columns: Specifies whether returns argument must include the names of only pivot columns or all columns.
- When set to True, all columns including columns in DataFrame which do not participate in pivot must be specified.
- When set to False, only output columns excluding columns in DataFrame which do not participate in pivot must be specified.
This is the default value.
- kwargs: Specifies the keyword argument to accept column name and column values as named arguments.
For example: col1=df.year, col1_value=2001, col2=df.qtr, col2_value=['Q1', 'Q2', 'Q3']
- Either use columns argument or keyword arguments.
- Format for column name arguments must be col1, col2,.. colN with values of type ColumnExpression.
- Format for column value argument must be col1_value, col2_value,.. colN_value.
Example Setup
# Create a teradataml DataFrame. >>> load_example_data("teradataml", "star1") >>> df = DataFrame("star1")
Example 1: Pivot data using all values in column 'qtr' and aggregate using sum of column 'sales'
>>> pivot_df = df.pivot(columns=df.qtr, aggfuncs=df.sales.sum())
>>> print(pivot_df) country state yr cogs sum_sales_q2 sum_sales_q1 sum_sales_q3 0 USA NY 2001 25.0 NaN 45.0 NaN 1 CANADA ON 2001 0.0 10.0 NaN NaN 2 CANADA BC 2001 0.0 NaN NaN 10.0 3 USA CA 2002 20.0 50.0 NaN NaN 4 USA CA 2002 15.0 NaN 30.0 NaN
Example 2: Pivot data using columns 'yr' and 'qtr', aggregate using sum of 'sales' and median of 'cogs' column
Limit combination of 'q1' with '2001' and 'q2' with '2002'.
>>> pivot_df = df.pivot(columns={df.yr: [2001, 2002], df.qtr: ['Q1', 'Q2']}, aggfuncs=[df.sales.sum(), df.cogs.median()], limit_combinations=True)
>>> print(pivot_df) country state sum_sales_2001_q1 median_cogs_2001_q1 sum_sales_2002_q2 median_cogs_2002_q2 0 CANADA ON NaN NaN NaN NaN 1 USA CA NaN NaN 50.0 20.0 2 USA NY 45.0 25.0 NaN NaN 3 CANADA BC NaN NaN NaN NaN
Example 3: Get all possible column combinations
Get all possible column combinations when pivoting using 'yr' and 'qtr', aggregation using 'sales' and 'cogs' column.
>>> pivot_df = df.pivot(columns={df.yr: [2001, 2002], df.qtr: ['q1', 'q2']}, aggfuncs=[df.sales.sum(), df.cogs.max()])
>>> print(pivot_df) country state sum_sales_2001_q1 max_cogs_2001_q1 sum_sales_2001_q2 max_cogs_2001_q2 sum_sales_2002_q1 max_cogs_2002_q1 sum_sales_2002_q2 max_cogs_2002_q2 0 CANADA BC NaN NaN NaN NaN NaN NaN NaN NaN 1 CANADA ON NaN NaN 10.0 0.0 NaN NaN NaN NaN 2 USA NY 45.0 25.0 NaN NaN NaN NaN NaN NaN 3 USA CA NaN NaN NaN NaN 30.0 15.0 50.0 20.0
Example 4: Customize the name of returned columns using returns argument
>>> pivot_df = df.pivot(columns={df.yr:2001, df.qtr:['Q1', 'Q2']}, aggfuncs=[df.sales.sum(), (2*df.sales-1).max()], returns=["Q1_2001_total_sales", "Q1_2001_total_cogs", "Q2_2001_total_sales", "Q2_2001_total_cogs"])
>>> print(pivot_df) country state cogs Q1_2001_total_sales Q1_2001_total_cogs Q2_2001_total_sales Q2_2001_total_cogs 0 USA NY 25.0 45.0 89.0 NaN NaN 1 CANADA ON 0.0 NaN NaN 10.0 19.0 2 CANADA BC 0.0 NaN NaN NaN NaN 3 USA CA 20.0 NaN NaN NaN NaN 4 USA CA 15.0 NaN NaN NaN NaN
Example 5: Customize the name of all output columns using returns and all_columns arguments
>>> pivot_df = df.pivot(columns={df.yr:2001, df.qtr:['Q1', 'Q2']}, aggfuncs=[df.sales.avg(), df.cogs.median()], returns=["con", "st", "Q1_2001_total_sales", "Q1_2001_total_cogs", "Q2_2001_total_sales", "Q2_2001_total_cogs"], all_columns=True)
>>> print(pivot_df) con st Q1_2001_total_sales Q1_2001_total_cogs Q2_2001_total_sales Q2_2001_total_cogs 0 CANADA ON NaN NaN 10.0 0.0 1 USA CA NaN NaN NaN NaN 2 USA NY 45.0 25.0 NaN NaN 3 CANADA BC NaN NaN NaN NaN
Example 6: Use keyword arguments to specify columns and values instead of columns argument
>>> pivot_df = df.pivot(aggfuncs=[df.sales.avg(), ((2*df.sales)+(df.cogs/2)).min()], col1=df.yr, col1_values=2001, col2=df.qtr, col2_values=['Q1', 'Q2'])
>>> print(pivot_df) country state avg_sales_2001_q1 min_sales_2_cogs_2_2001_q1 avg_sales_2001_q2 min_sales_2_cogs_2_2001_q2 0 CANADA BC NaN NaN NaN NaN 1 CANADA ON NaN NaN 10.0 20.0 2 USA NY 45.0 102.5 NaN NaN 3 USA CA NaN NaN NaN NaN
Example 7: Find the median sales and mean cogs for first three quarters of year 2001 using margins
>>> pivot_df = df.pivot(columns={df.qtr: ['q1', 'q2', 'q3'], df.yr: [2001]}, aggfuncs=df.sales.median(), margins={"SUM": [("median_sales_q1_2001", "median_sales_q2_2001"), ("median_sales_q3_2001", "median_sales_q2_2001")], "AVG": [("median_sales_q1_2001", "median_sales_q2_2001"), ("median_sales_q2_2001", "median_sales_q1_2001")]})
>>> print(pivot_df) country state cogs median_sales_q1_2001 median_sales_q2_2001 median_sales_q3_2001 sum_median_sales_q1_2001_median_sales_q2_2001 sum_median_sales_q3_2001_median_sales_q2_2001 avg_median_sales_q1_2001_median_sales_q2_2001 avg_median_sales_q2_2001_median_sales_q1_2001 0 CANADA BC 0.0 NaN NaN 10.0 NaN 10.0 NaN NaN 1 USA NY 25.0 45.0 NaN NaN 45.0 NaN 45.0 45.0 2 CANADA ON 0.0 NaN 10.0 NaN 10.0 10.0 10.0 10.0 3 USA CA 20.0 NaN NaN NaN NaN NaN NaN NaN 4 USA CA 15.0 NaN NaN NaN NaN NaN NaN NaN
Example 8: Find the min of sales and average of cogs for quarter 'Q1', 'Q2' of year '2001'
Name the aggregated column as 'margins'.
>>> pivot_df = df.pivot(columns={df.qtr: ['q1', 'q2'], df.yr: [2001]}, aggfuncs=[df.cogs.min(), df.sales.avg()], margins={"MIN": ("cogs_min_q12001", "sales_var_samp_q22001")}, all_columns=True, returns=["con", "st", "cogs_min_q12001", "sales_var_samp_q12001", "cogs_min_q22001", "sales_var_samp_q22001", "margins"])
>>> print(pivot_df) con st cogs_min_q12001 sales_var_samp_q12001 cogs_min_q22001 sales_var_samp_q22001 margins 0 CANADA BC NaN NaN NaN NaN NaN 1 CANADA ON NaN NaN 0.0 10.0 10.0 2 USA NY 25.0 45.0 NaN NaN 25.0 3 USA CA NaN NaN NaN NaN NaN
Example 9: Specify a teradataml DataFrame with one column for values of 'qtr' column, and then pivot the data
>>> quarters_df = df.drop(columns=['country', 'state', 'yr', 'sales', 'cogs'])
>>> print(quarters_df) qtr 0 Q1 1 Q2 2 Q3 3 Q2 4 Q1
>>> pivot_df = df.pivot(columns={df.qtr: quarters_df}, aggfuncs=[df.sales.sum(), df.cogs.avg()])
>>> print(pivot_df) country state yr sum_sales_q2 avg_cogs_q2 sum_sales_q1 avg_cogs_q1 sum_sales_q3 avg_cogs_q3 0 CANADA ON 2001 10.0 0.0 NaN NaN NaN NaN 1 CANADA BC 2001 NaN NaN NaN NaN 10.0 0.0 2 USA NY 2001 NaN NaN 45.0 25.0 NaN NaN 3 USA CA 2002 50.0 20.0 30.0 15.0 NaN NaN