Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
- teradataml.dataframe.window.regr_sxy = regr_sxy(expression)
- DESCRIPTION:
Function returns the sum of the products of the independent variable and the
dependent variable, each taken as a deviation from its mean, for all non-null data
pairs of the dependent and independent variable arguments over the specified window.
When the function is executed, "expression" is treated as the independent variable
and the dependent variable is:
* a ColumnExpression, when invoked using a window created on a ColumnExpression.
* all columns of the teradataml DataFrame which are valid for this function,
when executed on a window created on a teradataml DataFrame.
Note:
When there are fewer than two non-null data point pairs in the
data used for the computation, the function returns None.
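The values in the EXAMPLES section are consistent with the usual SQL definition of REGR_SXY, in which each variable is centered on its mean before the products are summed. The following is a minimal pure-Python sketch of that definition, assuming this is what the database computes. The helper name regr_sxy_sketch is hypothetical and is not part of teradataml; the handling of very small windows by the database is not modeled.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]  # (dependent, independent)


def regr_sxy_sketch(pairs: Iterable[Pair]) -> Optional[float]:
    """Hypothetical helper: sum of (x - mean_x) * (y - mean_y) over non-null pairs."""
    # Drop any pair in which either value is null (None).
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    if not clean:
        # Nothing to aggregate; the database's own edge-case handling is not modeled.
        return None
    n = len(clean)
    mean_y = sum(y for y, _ in clean) / n
    mean_x = sum(x for _, x in clean) / n
    return sum((x - mean_x) * (y - mean_y) for y, x in clean)


# (admitted, gpa) pairs; the pair with a null gpa is ignored.
window_rows = [(1, 3.52), (0, 3.00), (1, None)]
print(round(regr_sxy_sketch(window_rows), 6))
```

With the two complete pairs above, the centered sum is 0.26, the same value that Example 2 reports for the window holding the rows with id 37 and 36.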
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a column, the name of a column, or a
literal that represents the independent variable for the regression.
An independent variable is a treatment: something that is varied under
your control to test the behavior of another variable.
Types: ColumnExpression OR int OR float OR str
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window created
on a teradataml DataFrame.
* ColumnExpression (also known as teradataml DataFrameColumn) - When the aggregate
is executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If column does not support the aggregate operation.
EXAMPLES:
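# Load the data to run the example.
# Assumes a connection to Vantage has already been established.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")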
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
   masters   gpa     stats programming  admitted
id
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0
>>>
# Note:
# In the examples here, a ColumnExpression is passed as input. You can
# pass the column name instead of the ColumnExpression.
# Example 1: Calculate the sum of the products for 'admitted' and 'gpa',
# with 'admitted' as dependent variable and 'gpa' as independent
# variable in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'admitted'.
>>> window = admissions_train.admitted.window(partition_columns="programming",
... window_start_point=-2,
... window_end_point=0)
>>>
# Execute regr_sxy() on the Rolling window and attach it to the DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate
# operations in one single call. In this example, we are executing
# regr_sxy() along with count() window aggregate operations.
>>> df = admissions_train.assign(regr_sxy_admitted_gpa=window.regr_sxy(admissions_train.gpa),
... count_gpa=window.count())
>>> df
   masters   gpa     stats programming  admitted  count_gpa  regr_sxy_admitted_gpa
id
15     yes  4.00  Advanced    Advanced         1          3               0.000000
16      no  3.70  Advanced    Advanced         1          3               0.000000
11      no  3.13  Advanced    Advanced         1          3               0.000000
9       no  3.82  Advanced    Advanced         1          3               0.000000
19     yes  1.98  Advanced    Advanced         0          3               1.120000
27     yes  3.96  Advanced    Advanced         0          3               0.353333
1      yes  3.95  Beginner    Beginner         0          1               0.000000
34     yes  3.85  Advanced    Beginner         0          2               0.000000
32     yes  3.46  Advanced    Beginner         0          3               0.000000
40     yes  3.95    Novice    Beginner         0          3               0.000000
>>>
# Example 2: Calculate the sum of the products between 'gpa' as independent
# variable and all other columns as dependent variable,
# in an Expanding window, partitioned over 'masters',
# and ordered by 'id' in descending order.
# Create an Expanding window on DataFrame.
>>> window = admissions_train.window(partition_columns=admissions_train.masters,
... order_columns=admissions_train.id.desc(),
... window_start_point=None,
... window_end_point=0)
>>>
# Execute regr_sxy() on the Expanding window.
>>> df = window.regr_sxy(admissions_train.gpa)
>>> df
   masters   gpa     stats programming  admitted  admitted_regr_sxy   gpa_regr_sxy    id_regr_sxy
id
38     yes  2.65  Advanced    Beginner         1          -0.800000   9.800000e-01   1.300000e+00
32     yes  3.46  Advanced    Beginner         0          -0.882000   1.106480e+00   2.140000e-01
31     yes  3.50  Advanced    Beginner         1          -0.903333   1.107333e+00   3.633333e-01
30     yes  3.79  Advanced      Novice         0          -0.978571   1.166771e+00  -9.157143e-01
27     yes  3.96  Advanced    Advanced         0          -1.163333   1.436400e+00  -5.310000e+00
26     yes  3.57  Advanced    Advanced         1          -1.224000   1.443160e+00  -4.738000e+00
37      no  3.52    Novice      Novice         1           0.000000  -4.891920e-16   8.437695e-15
36      no  3.00  Advanced      Novice         0           0.260000   1.352000e-01   2.600000e-01
35      no  3.68    Novice    Beginner         1           0.400000   2.528000e-01  -1.600000e-01
33      no  3.55    Novice      Novice         1           0.437500   2.696750e-01  -4.975000e-01
>>>
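As a cross-check on the output above, the 'no' partition can be recomputed by hand. The sketch below takes the four 'no' rows shown (ids 37, 36, 35 and 33, already in descending 'id' order), grows the window one row at a time, and computes the centered sum of products of 'gpa' against 'admitted'. It assumes the Expanding window spans the first row of the partition through the current row. The name expanding_regr_sxy is hypothetical and is not a teradataml function.

```python
def expanding_regr_sxy(ys, xs):
    """Hypothetical helper: centered sum of products over rows 0..i, for each i."""
    results = []
    for i in range(len(xs)):
        win_x, win_y = xs[: i + 1], ys[: i + 1]
        mean_x = sum(win_x) / len(win_x)
        mean_y = sum(win_y) / len(win_y)
        results.append(
            sum((x - mean_x) * (y - mean_y) for x, y in zip(win_x, win_y))
        )
    return results


# Rows of the 'no' partition from the output above, ordered by 'id' descending.
ids = [37, 36, 35, 33]
gpa = [3.52, 3.00, 3.68, 3.55]   # independent variable
admitted = [1, 0, 1, 1]          # dependent variable

for row_id, value in zip(ids, expanding_regr_sxy(admitted, gpa)):
    print(row_id, round(value, 6))
```

Up to floating-point rounding, the four results agree with the admitted_regr_sxy values 0.000000, 0.260000, 0.400000 and 0.437500 in the table.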
# Example 3: Calculate the sum of the products between column 'gpa' as
# independent variable and all other columns as dependent variable,
# which are grouped by 'masters', 'gpa' and 'admitted', in a Contracting
# window, partitioned over 'masters' and ordered by 'masters' with nulls
# listed last.
# Perform group_by() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa", "admitted"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns=group_by_df.masters,
... order_columns=group_by_df.masters.nulls_last(),
... window_start_point=-5,
... window_end_point=None)
# Execute regr_sxy() on the Contracting window.
>>> window.regr_sxy(admissions_train.gpa)
  masters   gpa  admitted  admitted_regr_sxy  gpa_regr_sxy
0     yes  3.90         1           1.030000      3.126750
1     yes  3.57         1           0.950000      3.158040
2     yes  3.81         1           0.783636      3.279818
3     yes  4.00         0           0.725000      3.292425
4     yes  3.75         0           0.170000      4.186200
5     yes  3.59         1          -0.021333      4.343093
6      no  3.44         0           0.260000      0.160000
7      no  3.68         1           0.234286      0.187771
8      no  3.96         1           0.218750      0.201287
9      no  3.70         1           0.237778      0.227356
>>>
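The three examples differ only in the values of window_start_point and window_end_point. The sketch below maps those two values to the row positions that fall inside the frame for a given current row. It assumes that a negative value means rows preceding the current row, 0 means the current row, a positive value means rows following it, and None means unbounded on that side. frame_rows is a hypothetical helper for illustration and is not part of teradataml.

```python
from typing import List, Optional


def frame_rows(current: int, n_rows: int,
               start: Optional[int], end: Optional[int]) -> List[int]:
    """Hypothetical helper: positions within one ordered partition inside the frame."""
    # None on either side means the frame is unbounded in that direction;
    # otherwise the offset is applied to the current row and clipped to the partition.
    first = 0 if start is None else max(0, current + start)
    last = n_rows - 1 if end is None else min(n_rows - 1, current + end)
    return list(range(first, last + 1))


n = 8  # rows in one partition, positions 0..7
print(frame_rows(4, n, -2, 0))      # Rolling, as in Example 1
print(frame_rows(4, n, None, 0))    # Expanding, as in Example 2
print(frame_rows(6, n, -5, None))   # Contracting, as in Example 3
```

Under these assumptions the Rolling frame is clipped at the start of a partition, which is consistent with the count_gpa values 1, 2 and 3 for the first rows of the 'Beginner' partition in Example 1.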