Teradata® Package for Python Function Reference | 17.10 - corr
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.window.corr = corr(expression)
- DESCRIPTION:
Returns the sample Pearson product-moment correlation coefficient
of its arguments, computed over all non-null data point pairs in a
teradataml DataFrame or ColumnExpression within the specified window.
The sample Pearson product-moment correlation coefficient is a measure of
the linear association between variables. The computed coefficient
ranges from -1.00 to +1.00.
Note that high correlation does not imply a causal relationship between
the variables.
The coefficient of correlation between two variables has the following four notable values:
1. -1.00 : Association between the variables is perfectly linear, but inverse.
As the value in ColumnExpression varies, the value for
"expression" varies identically in the opposite direction.
2. 0 : Association between the variables does not exist and they are considered
to be uncorrelated.
3. +1.00 : Association between the variables is perfectly linear.
As the value in ColumnExpression varies, the value for
"expression" varies identically in the same direction.
4. NULL : Association between the variables cannot be measured because there
are no non-null data point pairs in the data used for the computation.
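The four cases above can be illustrated with a small, self-contained sketch in plain Python. It does not call teradataml; sample_pearson is a hypothetical helper name, and returning None when one input is constant is an assumption made here for the undefined zero-variance case, not documented behavior of corr().

```python
import math

def sample_pearson(xs, ys):
    """Pearson correlation over pairs where both values are non-null.

    Hypothetical helper for illustration only; not part of teradataml.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return None  # no non-null pairs: correlation cannot be measured
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # assumption: treat the undefined (constant input) case as null
    return cov / math.sqrt(var_x * var_y)

inverse = sample_pearson([1, 2, 3], [6, 4, 2])          # perfectly linear, inverse
uncorrelated = sample_pearson([-1, 0, 1], [1, 0, 1])    # no linear association
direct = sample_pearson([1, 2, 3], [2, 4, 6])           # perfectly linear
no_pairs = sample_pearson([None, 1], [2, None])         # no non-null pairs
```

The values of inverse, uncorrelated, direct, and no_pairs correspond to cases 1 through 4 in the list above.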
PARAMETERS:
expression:
Required Argument.
Specifies the ColumnExpression of a numeric column, the name of a column, or
a numeric literal to be correlated with the ColumnExpression.
Types: ColumnExpression OR str OR int OR float
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window created
on a teradataml DataFrame.
* ColumnExpression, also known as teradataml DataFrameColumn - When the aggregate is
executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example 1: Calculate the correlation between 'gpa' and 'admitted'
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns="programming",
...                                      window_start_point=-2,
...                                      window_end_point=0)
>>>
# Execute corr() on the Rolling window and attach it to the teradataml
# DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate operations
# in one single call. In this example, we are executing corr() along with
# count() window aggregate operations.
>>> df = admissions_train.assign(corr_gpa_admitted=window.corr(admissions_train.admitted),
...                              count_gpa=window.count())
>>> df
masters gpa stats programming admitted corr_gpa_admitted count_gpa
id
11 no 3.13 Advanced Advanced 1 0.927146 3
27 yes 3.96 Advanced Advanced 0 -0.793047 3
26 yes 3.57 Advanced Advanced 1 -0.292306 3
6 yes 3.50 Beginner Advanced 1 -0.989980 3
9 no 3.82 Advanced Advanced 1 NaN 3
25 no 3.96 Advanced Advanced 1 NaN 3
39 yes 3.75 Advanced Beginner 0 NaN 1
31 yes 3.50 Advanced Beginner 1 -1.000000 2
29 yes 4.00 Novice Beginner 0 -0.866025 3
21 no 3.87 Novice Beginner 1 -0.701039 3
>>>
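The count_gpa column in Example 1 shows how the Rolling window frame grows to at most three rows: with window_start_point=-2 and window_end_point=0, each frame holds the current row plus up to two preceding rows of the same partition. The plain-Python sketch below mimics that framing for the four 'Beginner' rows shown in the output. The helper rolling_frames is hypothetical, and the sketch assumes the rows are processed in the order displayed, which Example 1 does not guarantee because no order_columns is given.

```python
def rolling_frames(values, start_offset, end_offset):
    """Return, for each row, the slice of `values` covered by its frame.

    Offsets are relative to the current row: negative means preceding rows,
    0 means the current row. Hypothetical helper, not part of teradataml.
    """
    frames = []
    last = len(values) - 1
    for i in range(len(values)):
        lo = max(0, i + start_offset)   # clip at the start of the partition
        hi = min(last, i + end_offset)  # clip at the end of the partition
        frames.append(values[lo:hi + 1])
    return frames

# 'gpa' values of the 'Beginner' partition rows shown in the Example 1 output.
beginner_gpas = [3.75, 3.50, 4.00, 3.87]
frames = rolling_frames(beginner_gpas, -2, 0)
counts = [len(frame) for frame in frames]
print(counts)  # [1, 2, 3, 3]
```

Under that ordering assumption, the counts agree with the count_gpa values 1, 2, 3, 3 reported for ids 39, 31, 29, and 21.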
# Example 2: Calculate the correlation between all the valid columns
# and 'gpa' in teradataml DataFrame, in an Expanding window,
# partitioned over 'masters', and ordered by 'id' in descending
# order.
# Create an Expanding window on teradataml DataFrame.
>>> window = admissions_train.window(partition_columns="masters",
...                                  order_columns="id",
...                                  sort_ascending=False,
...                                  window_start_point=None,
...                                  window_end_point=0)
>>>
# Execute corr() on the Expanding window.
>>> df = window.corr(admissions_train.gpa)
>>> df
masters gpa stats programming admitted admitted_corr gpa_corr id_corr
id
35 no 3.68 Novice Beginner 1 0.974355 1.0 -0.225018
28 no 3.93 Advanced Advanced 1 0.880019 1.0 -0.691040
25 no 3.96 Advanced Advanced 1 0.848441 1.0 -0.768352
24 no 1.87 Advanced Novice 1 0.216553 1.0 0.251225
17 no 3.83 Advanced Advanced 1 0.262403 1.0 -0.084948
16 no 3.70 Advanced Advanced 1 0.271885 1.0 -0.131117
40 yes 3.95 Novice Beginner 0 NaN -1.0 NaN
39 yes 3.75 Advanced Beginner 0 NaN 1.0 1.000000
38 yes 2.65 Advanced Beginner 1 -0.989743 1.0 0.928571
34 yes 3.85 Advanced Beginner 0 -0.990867 1.0 -0.041862
>>>
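In an Expanding window, each row's frame runs from the start of the partition through the current row, so the correlation can be maintained incrementally from running sums. The sketch below shows that idea in plain Python on made-up data. It is only an illustration of the technique: expanding_corr is a hypothetical helper, the input values are not taken from admissions_train, and nothing here is claimed about how the database computes the result.

```python
import math

def expanding_corr(xs, ys):
    """Correlation of xs and ys over an expanding frame, one value per row.

    Uses running sums so each row costs O(1). Pairs with a null member are
    skipped. Hypothetical helper for illustration; not part of teradataml.
    """
    n = sx = sy = sxx = syy = sxy = 0
    results = []
    for x, y in zip(xs, ys):
        if x is not None and y is not None:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            syy += y * y
            sxy += x * y
        var_x = n * sxx - sx * sx
        var_y = n * syy - sy * sy
        if n == 0 or var_x <= 0 or var_y <= 0:
            results.append(None)  # undefined: no pairs yet, or a constant input
        else:
            results.append((n * sxy - sx * sy) / math.sqrt(var_x * var_y))
    return results

# Made-up values, chosen so the arithmetic is easy to follow by hand.
running = expanding_corr([1, 2, 3, 4], [1, 3, 2, 4])
print(running)  # [None, 1.0, 0.5, 0.8]
```

The first row yields None in this sketch because a single pair has no variance. Whether a given system reports that case as NULL, NaN, or another value is outside the scope of this illustration.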
# Example 3: Calculate the correlation between 'gpa' and all the valid columns
# in a teradataml DataFrame grouped by 'masters' and 'gpa', in a
# Contracting window, partitioned over 'masters' and ordered by
# 'masters' with nulls listed last.
# Perform group_by() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns="masters",
...                             order_columns="masters",
...                             nulls_first=False,
...                             window_start_point=-5,
...                             window_end_point=None)
# Execute corr() on the Contracting window.
>>> window.corr(admissions_train.gpa)
masters gpa gpa_corr
0 no 3.71 1.0
1 no 3.52 1.0
2 no 3.68 1.0
3 no 3.83 1.0
4 no 3.55 1.0
5 no 3.96 1.0
6 yes 3.59 1.0
7 yes 3.95 1.0
8 yes 3.46 1.0
9 yes 3.76 1.0
>>>
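The three examples differ only in their window_start_point and window_end_point values: (-2, 0) for Rolling, (None, 0) for Expanding, and (-5, None) for Contracting. One way to read these pairs is as a standard SQL ROWS BETWEEN frame, where a negative integer means rows preceding the current row, 0 means the current row, and None means unbounded. The sketch below renders that reading as text. The mapping is an assumption consistent with the example names, frame_clause is a hypothetical helper, and the output is not claimed to be the SQL that teradataml actually generates.

```python
def frame_clause(window_start_point, window_end_point):
    """Render a (start, end) pair as a SQL-style ROWS BETWEEN frame.

    Assumed reading: negative = PRECEDING, positive = FOLLOWING,
    0 = CURRENT ROW, None = UNBOUNDED. Hypothetical helper for illustration.
    """
    def bound(point, unbounded):
        if point is None:
            return unbounded
        if point == 0:
            return "CURRENT ROW"
        direction = "PRECEDING" if point < 0 else "FOLLOWING"
        return f"{abs(point)} {direction}"

    start = bound(window_start_point, "UNBOUNDED PRECEDING")
    end = bound(window_end_point, "UNBOUNDED FOLLOWING")
    return f"ROWS BETWEEN {start} AND {end}"

rolling = frame_clause(-2, 0)         # Example 1
expanding = frame_clause(None, 0)     # Example 2
contracting = frame_clause(-5, None)  # Example 3
print(rolling)
print(expanding)
print(contracting)
```

Read this way, the Contracting frame in Example 3 starts five rows before the current row and extends to the end of the partition.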