Teradata® Package for Python Function Reference - 20.00 - corr
Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00.00.03
Published: December 2024
Product Category: Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.corr = corr(expression)
DESCRIPTION:
Returns the column-wise sample Pearson product-moment correlation
coefficient between each valid column of the DataFrame and "expression",
computed over all non-null data point pairs.
The sample Pearson product-moment correlation coefficient is a measure of
the linear association between two variables. The computed coefficient
ranges from -1.00 to +1.00.
Note that high correlation does not imply a causal relationship between
the variables.
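For reference, the textbook definition of the sample Pearson coefficient over the n non-null pairs (x_i, y_i) is given below. The text above does not state the exact formula that Vantage evaluates, so treat this as the standard definition rather than a description of the implementation. The means are taken over the same n pairs.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```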
The coefficient of correlation between two variables has the following four notable values:
1. -1.00 : Association between the variables is perfectly linear, but inverse.
As the value in a DataFrame column varies, the value for
"expression" varies linearly in the opposite direction.
2. 0 : No linear association exists between the variables, and they are
considered to be uncorrelated.
3. +1.00 : Association between the variables is perfectly linear.
As the value in a DataFrame column varies, the value for
"expression" varies linearly in the same direction.
4. NULL : Association between the variables cannot be measured because there
are no non-null data point pairs in the data used for the computation.
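The four cases above can be reproduced locally with a minimal pure-Python sketch. The helper name sample_pearson is hypothetical and not part of teradataml. None stands in for NULL. Returning None for zero-variance input is a choice of this sketch, not documented Vantage behavior.

```python
from math import sqrt


def sample_pearson(xs, ys):
    """Pearson r over the pairs in which both values are non-null (not None).

    Hypothetical helper for illustration only; not a teradataml function.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        return None  # mirrors the NULL case: no non-null data point pairs
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # assumption of this sketch: undefined for constant input
    return cov / sqrt(var_x * var_y)


# One small data set per case in the list above.
perfect_inverse = sample_pearson([1, 2, 3, 4], [8, 6, 4, 2])
uncorrelated = sample_pearson([1, 2, 3, 4], [1, -1, -1, 1])
perfect_direct = sample_pearson([1, 2, 3, 4], [10, 20, 30, 40])
no_pairs = sample_pearson([1, None, 3], [None, 2, None])

print(perfect_inverse, uncorrelated, perfect_direct, no_pairs)
```

Pairs are filtered before the means are computed, so a null in either variable drops the whole pair from the calculation.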
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a numeric column, the name of a column, or
a numeric literal to be correlated with each valid column of the DataFrame.
Types: ColumnExpression OR str OR int OR float
RETURNS:
teradataml DataFrame
RAISES:
RuntimeError - If none of the columns support the aggregate operation.
EXAMPLES:
# Import the required functions and load the data to run the example.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
   masters   gpa     stats programming  admitted
id
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0
>>>
# Example 1: Calculate the correlation between all the valid columns
# and 'gpa' in teradataml DataFrame.
>>> df = admissions_train.corr(admissions_train.gpa)
>>> df
    corr_id  corr_gpa  corr_admitted
0 -0.013896       1.0      -0.022265
>>>
# Example 2: Calculate the correlation between all the valid columns
# and 'gpa' in teradataml DataFrame, for each level of 'programming'.
>>> df = admissions_train.groupby("programming").corr(admissions_train.gpa)
>>> df
  programming   corr_id  corr_gpa  corr_admitted
0    Advanced  0.166604       1.0       0.487737
1      Novice  0.022877       1.0      -0.114656
2    Beginner -0.277338       1.0      -0.417565