Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- Lifecycle
- Previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.regr_sxy = regr_sxy(expression)
- DESCRIPTION:
Function returns the sum of the products of the deviations of the independent
variable and the dependent variable from their respective means, computed over
all non-null data pairs of the dependent and independent variable arguments.
When the function is executed, "expression" is treated as the independent
variable, and the ColumnExpression on which the function is called is treated
as the dependent variable.
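For reference, the SQL REGR_SXY aggregate is conventionally defined over the n pairs in which both values are non-null, with x the independent variable and y the dependent variable. The second and third forms are algebraically equivalent rewrites of the first. Treat this as a sketch of the math, not as a description of how teradataml or Vantage evaluates it internally. Because the deviations can have either sign, the result can be negative.

```latex
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
       = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big)
       = n \cdot \operatorname{COVAR\_POP}(y, x)
```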
Note:
When there are fewer than two non-null data point pairs in the
data used for the computation, the function returns None.
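To make the computation concrete, here is a minimal pure-Python sketch of what the aggregate computes for one set of rows. It is not teradataml code and does not contact Vantage. The function name `regr_sxy` and its `(y, x)` argument order are illustrative assumptions chosen to mirror `dependent.regr_sxy(independent)`, and Python `None` stands in for SQL NULL.

```python
from typing import List, Optional, Sequence, Tuple


def regr_sxy(y: Sequence[Optional[float]],
             x: Sequence[Optional[float]]) -> Optional[float]:
    """Illustrative client-side REGR_SXY(y, x); not part of teradataml.

    y is the dependent variable and x the independent variable. None stands
    in for SQL NULL: a pair is used only when both of its values are non-null.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    pairs: List[Tuple[float, float]] = [
        (xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None
    ]
    # Fewer than two non-null pairs: return None, as the Note describes.
    if len(pairs) < 2:
        return None
    n = len(pairs)
    x_mean = sum(xi for xi, _ in pairs) / n
    y_mean = sum(yi for _, yi in pairs) / n
    # Sum of the products of the deviations from the means.
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in pairs)


# Made-up sample values; the pairs containing None drop out of the computation.
gpa = [3.0, 4.0, None, 2.0]      # independent variable (x)
admitted = [0, 1, 1, None]       # dependent variable (y)
print(regr_sxy(admitted, gpa))   # 0.5
```

Only the first two pairs are complete here, so the means are 3.5 and 0.5, and each pair contributes 0.25 to the sum.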
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a column, the name of a column, or a
literal representing the independent variable for the regression.
An independent variable is one that is varied under your control
to test the behavior of another variable.
Types: ColumnExpression OR int OR float OR str
RETURNS:
ColumnExpression, also known as teradataml DataFrameColumn.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
NOTES:
* One must use DataFrame.assign() when using aggregate functions on a
ColumnExpression, also known as teradataml DataFrameColumn.
* One should always use "drop_columns=True" in DataFrame.assign() when
running an aggregate operation on a teradataml DataFrame.
* The "drop_columns" argument in DataFrame.assign() is ignored when the
aggregate function is run on the result of DataFrame.groupby().
EXAMPLES:
# Import the required functions and load the data to run the example.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Example 1: Calculate the sum of the products of the deviations of the "gpa" column
# (independent variable) and the "admitted" column (dependent variable) from their means.
# Execute regr_sxy() using teradataml DataFrameColumn to generate the ColumnExpression.
>>> regr_sxy_column = admissions_train.admitted.regr_sxy(admissions_train.gpa)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.assign(drop_columns=True, regr_sxy_=regr_sxy_column)
>>> df
   regr_sxy_
0    -0.2155
>>>
# Example 2: Calculate the sum of the products of the deviations of the "gpa" column
# (independent variable) and the "admitted" column (dependent variable) from their
# means, for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
# Execute regr_sxy() using teradataml DataFrameColumn to generate the ColumnExpression.
>>> regr_sxy_column = admissions_train.admitted.regr_sxy(admissions_train.gpa)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.groupby("programming").assign(regr_sxy_=regr_sxy_column)
>>> df
   programming  regr_sxy_
0     Advanced   1.456875
1       Novice  -0.346364
2     Beginner  -0.900000
>>>
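Conceptually, the grouped form in Example 2 evaluates the same aggregate once for each distinct value of the grouping column. The sketch below mimics that client-side in plain Python, assuming rows arrive as hypothetical `(group, x, y)` tuples. The helper `grouped_regr_sxy` is illustrative only and is not a teradataml API. The sample values are made up rather than taken from "admissions_train", so they do not reproduce the output above.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical row shape: (group key, x = independent, y = dependent).
Row = Tuple[Hashable, Optional[float], Optional[float]]


def grouped_regr_sxy(rows: Iterable[Row]) -> Dict[Hashable, Optional[float]]:
    """Illustrative per-group REGR_SXY, akin to groupby(key) plus regr_sxy."""
    groups: Dict[Hashable, List[Tuple[float, float]]] = {}
    for key, x, y in rows:
        # Register every group, even one whose pairs are all null.
        pairs = groups.setdefault(key, [])
        if x is not None and y is not None:
            pairs.append((x, y))

    result: Dict[Hashable, Optional[float]] = {}
    for key, pairs in groups.items():
        if len(pairs) < 2:
            result[key] = None  # fewer than two non-null pairs
            continue
        n = len(pairs)
        x_mean = sum(x for x, _ in pairs) / n
        y_mean = sum(y for _, y in pairs) / n
        # Sum of the products of the deviations within this group.
        result[key] = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    return result


rows = [
    ("Advanced", 4.0, 1), ("Advanced", 2.0, 0),
    ("Novice", 3.0, 0), ("Novice", 3.5, 1), ("Novice", None, 1),
    ("Beginner", 3.46, 0),
]
print(grouped_regr_sxy(rows))
# {'Advanced': 1.0, 'Novice': 0.25, 'Beginner': None}
```

The "Beginner" group has only one non-null pair, so, following the Note on fewer than two pairs, its value is None rather than a number.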