Teradata Package for Python Function Reference | 20.00 - pivot - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.pivot = pivot(self, columns=None, aggfuncs=None, limit_combinations=False, margins=None, returns=None, all_columns=False, **kwargs)
- DESCRIPTION:
Rotate data from rows into columns to create easy-to-read DataFrames.
The function is useful for reporting purposes, as it allows to
aggregate and rotate the data.
PARAMETERS:
columns:
Required when keyword arguments are not specified, optional otherwise.
Specifies the column(s) or dictionary of column(s) with distinct column
value(s) used for pivoting the data.
* When specified as a column(s), function automatically extracts
the distinct values for the column.
For example:
columns = df.qtr # or columns = [df.qtr, dr.yr]
Note that, distinct values 'Q1', 'Q2', 'Q3', ... are
automatically extracted from column 'qtr' for pivoting.
* When specified as dictionary:
- key is a column expression representing a column
- value is a literal or list of literals, i.e., values in a column
or a teradataml DataFrame with only one column.
Note:
When value is a teradataml DataFrame, the order of output pivoted columns
are not guarenteed. Hence value is a teradataml DataFrame, teradata recommonds
to not specify "returns" argument so column names are generated properly
according to the order of records.
When dictionary value contains literals then pivot is done based on the each
combination for the values specified for each key, if "limit_combinations" is
set to False, otherwise combinations are restricted based on the index values
of the list.
Notes:
* All the values in dictionary should have same length when "limit_combinations"
is set to True.
Take a look at the examples below to understand:
- columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]}
Pivot is based on all the combinations of values as specified
below for columns 'qtr' and 'yr'.
If "limit_combinations" is set to False.
(Q1, 2001), (Q1, 2002), (Q2, 2001), and (Q2, 2002).
If "limit_combinations" is set to True.
(Q1, 2001) and (Q2, 2002).
- columns={df.qtr: quarter_df, df.year: [2001, 2002]}
Note that, value passed to df.qtr is a teradataml DataFrame
with one column. In such cases, function extracts the distinct
values from the dataframe and Pivot is based on all the combinations
of values or limited combination of values based on argument
"limit_combinations".
* String type values in dictionary are not case sensitive. Refer to example 2
for more details.
Types: ColumnExpression OR list of ColumnExpression or dict
aggfuncs:
Required Argument.
Specifies the column aggregate function(s) to be used for pivoting the data.
For example:
To pivot total of 'sales' column and average of 'cogs' + 'sales' columns
in DataFrame (df), specify "aggfuncs" as:
aggfuncs=[df.sales.sum(), (df.cogs+df.sales).avg()]
Types: ColumnExpression OR list of ColumnExpression
limit_combinations:
Optional Argument.
Specifies whether to limit the number of combinations when "columns" argument
is passed as a dictionary.
When set to True, function limits the combinations one-to-one
based on index of the values in list, hence all dictionary values
should be a list of same length or a single literal.
For example:
df.pivot(columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]},
limit_combinations=True,....)
Pivot will be based on columns 'qtr' and 'yr' for values
(Q1, 2001) and (Q2, 2002) only.
Note:
Argument is ignored when "columns" is a ColumnExpression or a list of ColumnExpressions.
Default Value: False
Types: bool
margins:
Optional Argument.
Specifies the aggregate operation to be performed on output columns.
Aggregate operation to be performed should be passed as dictionary where:
* Key is a string specifying aggregate operation to be performed.
* Value is a tuple or list of tuples specifying the output column names as
string. Columns specified in the tuple are used in aggregate operation
performed.
For example, if for the year 2001 following three aggregate columns are needed
in the output:
1. Sum of sales in Q1 and Q2
2. Sum of sales in Q2 and Q3
3. Average of cogs for first three quarters
"margins" can be sepcified as:
margins={"SUM": [("Q12001_sales", "Q22001_sales"),
("Q22001_sales", "Q32001_sales")],
"AVG": ("Q12001_cogs", "Q22001_cogs", "Q32001_cogs")}
Notes:
* Supported aggregate functions are SUM, AVG, MIN and MAX.
* If "returns" is specified, column names for margins are considered
from returns clause.
Types: dict
returns:
Optional Argument.
Specifies the custom column name(s) to be returned in the output DataFrame.
Notes:
* If not specified, function internally generate the output column name(s).
* Number of column names should be same as columns generated by pivot.
For example:
If columns={df.qtr: ['Q1', 'Q2'], df.year: [2001, 2002]},
aggfuncs=[df.sales.sum(), df.cogs.sum()] and
limit_combination=False, then
number of new columns in output will be:
len(aggfuncs) * len(columns[df.year]) * len(columns[df.qtr])
or 2 * 2 * 2 = 8
Hence, "returns" should be list of string with 8 elements.
If limit_combination is set to 'True' in above example,
number of new columns is output will be:
len(aggfuncs) * (len(columns[df.year]) OR len(columns[df.qtr]))
or or 2 * 2 = 4
Hence, "returns" should be list of string with 4 elements.
* All columns including columns in DataFrame which do not participate in pivot
must be specified if "all_columns" is set to True.
Types: str OR list of Strings (str)
all_columns:
Optional Argument.
Specifies whether "returns" argument should include only the names of pivot
columns or all columns.
When set to True, all columns including columns in DataFrame which do not
participate in pivot must be specified.
When set to False, only output columns excluding columns in DataFrame which do not
participate in pivot must be specified.
Default Value: False
Types: bool
**kwargs:
Specifies the keyword argument to accept column name and column value(s)
as named arguments.
For example:
col1=df.year, col1_value=2001, col2=df.qtr, col2_value=['Q1', 'Q2', 'Q3']
Notes:
* Either use "columns" argument or keyword arguments.
* Format for column name arguments should be col1, col2,.. colN with
values of type ColumnExpression.
* Format for column value argument should be col1_value, col2_value,.. colN_value.
RETURNS:
teradataml DataFrame
Notes:
* The columns which are not participating in pivoting are always aligned to
left most in the output DataFrame.
* Order of pivot columns in output DataFrame follows the same order as the
values specified in argument "columns" or the order in keyword arguments.
* The name of the output columns are according to the name specified in
"returns" argument.
If not specified, column names are generated based on aggregate functions
and values in "columns" or keyword arguments for column values.
For Example:
When columns={df.qtr:["Q1", "Q2"]} and aggfuncs=[df.col1.sum(), df.col2.max()]
Column names are:
'sum_col1_q1', 'max_col2_q1', 'sum_col1_Q2', and 'max_col2_Q2'.
RAISES:
TypeError, ValueError, TeradataMLException
EXAMPLES:
# Create a teradataml DataFrame.
>>> load_example_data("teradataml", "star1")
>>> df = DataFrame("star1")
>>> print(df)
state yr qtr sales cogs
country
USA NY 2001 Q1 45.0 25.0
CANADA ON 2001 Q2 10.0 0.0
CANADA BC 2001 Q3 10.0 0.0
USA CA 2002 Q2 50.0 20.0
USA CA 2002 Q1 30.0 15.0
# Example 1: Pivot data using all values in column 'qtr' and aggregate using
# sum of column 'sales'.
>>> pivot_df = df.pivot(columns=df.qtr,
... aggfuncs=df.sales.sum())
>>> print(pivot_df)
country state yr cogs sum_sales_q2 sum_sales_q1 sum_sales_q3
0 USA NY 2001 25.0 NaN 45.0 NaN
1 CANADA ON 2001 0.0 10.0 NaN NaN
2 CANADA BC 2001 0.0 NaN NaN 10.0
3 USA CA 2002 20.0 50.0 NaN NaN
4 USA CA 2002 15.0 NaN 30.0 NaN
# Example 2: Pivot data using columns 'yr' and 'qtr', aggregate using sum of
# 'sales' and median of 'cogs' column. Limit combination of 'Q1' with '2001' and
# 'Q2' with '2002'.
>>> pivot_df = df.pivot(columns={df.yr: [2001, 2002], df.qtr: ['Q1', 'Q2']},
... aggfuncs=[df.sales.sum(), df.cogs.median()],
... limit_combinations=True)
>>> print(pivot_df)
country state sum_sales_2001_q1 median_cogs_2001_q1 sum_sales_2002_q2 median_cogs_2002_q2
0 CANADA ON NaN NaN NaN NaN
1 USA CA NaN NaN 50.0 20.0
2 USA NY 45.0 25.0 NaN NaN
3 CANADA BC NaN NaN NaN NaN
# Example 3: Get all possible column combinations when pivoting using 'yr'and 'qtr',
# aggreagation using 'sales' and 'cogs' column.
>>> pivot_df = df.pivot(columns={df.yr: [2001, 2002], df.qtr: ['q1', 'q2']},
... aggfuncs=[df.sales.sum(), df.cogs.max()])
>>> print(pivot_df)
country state sum_sales_2001_q1 max_cogs_2001_q1 sum_sales_2001_q2 max_cogs_2001_q2 sum_sales_2002_q1 max_cogs_2002_q1 sum_sales_2002_q2 max_cogs_2002_q2
0 CANADA BC NaN NaN NaN NaN NaN NaN NaN NaN
1 CANADA ON NaN NaN 10.0 0.0 NaN NaN NaN NaN
2 USA NY 45.0 25.0 NaN NaN NaN NaN NaN NaN
3 USA CA NaN NaN NaN NaN 30.0 15.0 50.0 20.0
# Example 4: Custom name the returned columns using "returns" argument.
>>> pivot_df = df.pivot(columns={df.yr:2001, df.qtr:['Q1', 'Q2']},
... aggfuncs=[df.sales.sum(), (2*df.sales-1).max()],
... returns=["Q1_2001_total_sales", "Q1_2001_total_cogs",
... "Q2_2001_total_sales", "Q2_2001_total_cogs"])
>>> print(pivot_df)
country state cogs Q1_2001_total_sales Q1_2001_total_cogs Q2_2001_total_sales Q2_2001_total_cogs
0 USA NY 25.0 45.0 89.0 NaN NaN
1 CANADA ON 0.0 NaN NaN 10.0 19.0
2 CANADA BC 0.0 NaN NaN NaN NaN
3 USA CA 20.0 NaN NaN NaN NaN
4 USA CA 15.0 NaN NaN NaN NaN
# Example 5: Custom name all output columns using "returns" and "all_columns" argument.
>>> pivot_df = df.pivot(columns={df.yr:2001, df.qtr:['Q1', 'Q2']},
... aggfuncs=[df.sales.avg(), df.cogs.median()],
... returns=["con", "st", "Q1_2001_total_sales", "Q1_2001_total_cogs",
... "Q2_2001_total_sales", "Q2_2001_total_cogs"],
... all_columns=True)
>>> print(pivot_df)
con st Q1_2001_total_sales Q1_2001_total_cogs Q2_2001_total_sales Q2_2001_total_cogs
0 CANADA ON NaN NaN 10.0 0.0
1 USA CA NaN NaN NaN NaN
2 USA NY 45.0 25.0 NaN NaN
3 CANADA BC NaN NaN NaN NaN
# Example 6: Use keyword arguments to specify columns and values instead of "columns" argument.
# Note since "returns" is not specifies, column names are function generated.
>>> pivot_df = df.pivot(aggfuncs=[df.sales.avg(), ((2*df.sales)+(df.cogs/2)).min()],
... col1=df.yr,
... col1_values=2001,
... col2=df.qtr,
... col2_values=['Q1', 'Q2'])
>>> print(pivot_df)
country state avg_sales_2001_q1 min_sales_2_cogs_2_2001_q1 avg_sales_2001_q2 min_sales_2_cogs_2_2001_q2
0 CANADA BC NaN NaN NaN NaN
1 CANADA ON NaN NaN 10.0 20.0
2 USA NY 45.0 102.5 NaN NaN
3 USA CA NaN NaN NaN NaN
# Example 7: Find the median sales and mean cogs for first three quarters of
# year 2001 using "margins".
>>> pivot_df = df.pivot(columns={df.qtr: ['q1', 'q2', 'q3'], df.yr: [2001]},
... aggfuncs=df.sales.median(),
... margins={"SUM": [("median_sales_q1_2001", "median_sales_q2_2001"),
... ("median_sales_q3_2001", "median_sales_q2_2001")],
... "AVG": [("median_sales_q1_2001", "median_sales_q2_2001"),
... ("median_sales_q2_2001", "median_sales_q1_2001")]})
>>> print(pivot_df)
country state cogs median_sales_q1_2001 median_sales_q2_2001 median_sales_q3_2001 sum_median_sales_q1_2001_median_sales_q2_2001 sum_median_sales_q3_2001_median_sales_q2_2001 avg_median_sales_q1_2001_median_sales_q2_2001 avg_median_sales_q2_2001_median_sales_q1_2001
0 CANADA BC 0.0 NaN NaN 10.0 NaN 10.0 NaN NaN
1 USA NY 25.0 45.0 NaN NaN 45.0 NaN 45.0 45.0
2 CANADA ON 0.0 NaN 10.0 NaN 10.0 10.0 10.0 10.0
3 USA CA 20.0 NaN NaN NaN NaN NaN NaN NaN
4 USA CA 15.0 NaN NaN NaN NaN NaN NaN NaN
# Example 8: Find the min of sales and average of cogs for year '2001' in
# quater 'Q1', 'Q2'. Name the aggregated column as 'margins'.
>>> pivot_df = df.pivot(columns={df.qtr: ['q1', 'q2'], df.yr: [2001]},
... aggfuncs=[df.cogs.min(), df.sales.avg()],
... margins={"MIN": ("cogs_min_q12001", "sales_var_samp_q22001")},
... all_columns=True,
... returns=["con",
... "st",
... "cogs_min_q12001",
... "sales_var_samp_q12001",
... "cogs_min_q22001",
... "sales_var_samp_q22001",
... "margins"])
>>> print(pivot_df)
con st cogs_min_q12001 sales_var_samp_q12001 cogs_min_q22001 sales_var_samp_q22001 margins
0 CANADA BC NaN NaN NaN NaN NaN
1 CANADA ON NaN NaN 0.0 10.0 10.0
2 USA NY 25.0 45.0 NaN NaN 25.0
3 USA CA NaN NaN NaN NaN NaN
# Example 9: Specify teradataml DataFrame with one column for values of 'qtr' column and pivot
# the data. Note this allows user to implicitly use all the distinct values for
# the pivot operation and not specify those explicitly.
>>> quarters_df = df.drop(columns=['country', 'state', 'yr', 'sales', 'cogs'])
>>> print(quarters_df)
qtr
0 Q1
1 Q2
2 Q3
3 Q2
4 Q1
>>>
>>> pivot_df = df.pivot(columns={df.qtr: quarters_df},
... aggfuncs=[df.sales.sum(), df.cogs.avg()])
>>> print(pivot_df)
country state yr sum_sales_q2 avg_cogs_q2 sum_sales_q1 avg_cogs_q1 sum_sales_q3 avg_cogs_q3
0 CANADA ON 2001 10.0 0.0 NaN NaN NaN NaN
1 CANADA BC 2001 NaN NaN NaN NaN 10.0 0.0
2 USA NY 2001 NaN NaN 45.0 25.0 NaN NaN
3 USA CA 2002 50.0 20.0 30.0 15.0 NaN NaN