Teradata® Package for Python Function Reference | 17.10 - regr_sxx
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.window.regr_sxx = regr_sxx(expression)
- DESCRIPTION:
Returns the sum of the squares of the independent variable expression
for all non-null data pairs of the dependent and independent variable
arguments over the specified window. When the function is executed,
"expression" is treated as the independent variable, and the dependent variable is:
* a ColumnExpression, when invoked using a window created on a ColumnExpression.
* every column of the teradataml DataFrame that is valid for this function,
when executed on a window created on a teradataml DataFrame.
Note:
When there are fewer than two non-null data point pairs in the
data used for the computation, the function returns None.
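As a rough illustration of the computation described above, the following plain-Python sketch keeps only the pairs in which both values are non-null and sums the squared deviations of the independent values from their mean, which is the usual definition of REGR_SXX. The helper name regr_sxx_reference is hypothetical and is not part of teradataml. The three complete pairs reuse 'admitted' and 'gpa' values from the 'Beginner' rows shown in Example 1; the pair with a missing dependent value is made up to show that it is skipped. The edge-case behavior described in the Note is not modeled here, apart from returning None when no complete pair exists.

```python
from typing import Optional, Sequence, Tuple

# Each pair is (dependent, independent); None stands for a SQL NULL.
Pair = Tuple[Optional[float], Optional[float]]


def regr_sxx_reference(pairs: Sequence[Pair]) -> Optional[float]:
    """Hypothetical reference helper, not part of teradataml."""
    # Keep the independent value only when both members of the pair are non-null.
    xs = [x for y, x in pairs if x is not None and y is not None]
    if not xs:
        # Assumption: with no complete pair there is nothing to compute.
        return None
    mean_x = sum(xs) / len(xs)
    # Sum of squared deviations of the independent variable from its mean.
    return sum((x - mean_x) ** 2 for x in xs)


# (admitted, gpa) pairs; the (None, 4.00) pair is invented and gets dropped.
pairs = [(0, 3.95), (0, 3.85), (None, 4.00), (0, 3.46)]
result = regr_sxx_reference(pairs)
print(round(result, 6))  # approximately 0.134067
```

The printed value agrees with the 1.340667e-01 reported in Example 1 for a three-row window over the same 'gpa' values.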
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a column, the name of a column, or a
literal representing the independent variable for the regression.
An independent variable is a treatment: something that is varied under
your control to test the behavior of another variable.
Types: ColumnExpression OR int OR float OR str
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window created
on a teradataml DataFrame.
* ColumnExpression (also known as teradataml DataFrameColumn) - When the aggregate is
executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Note:
# In the examples here, a ColumnExpression is passed as input. You can
# pass the column name instead of the ColumnExpression.
# Example 1: Calculate the sum of the squares of column 'gpa' for all
# non-null data pairs with dependent variable as 'admitted',
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'admitted'.
>>> window = admissions_train.admitted.window(partition_columns="programming",
... window_start_point=-2,
... window_end_point=0)
>>>
# Execute regr_sxx() on the Rolling window and attach it to the DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate
# operations in one single call. In this example, we are executing
# regr_sxx() along with count() window aggregate operations.
>>> df = admissions_train.assign(regr_sxx_admitted_gpa=window.regr_sxx(admissions_train.gpa),
... count_gpa=window.count())
>>> df
    masters   gpa     stats  programming  admitted  count_gpa  regr_sxx_admitted_gpa
id
15      yes  4.00  Advanced     Advanced         1          3           8.006667e-02
16       no  3.70  Advanced     Advanced         1          3           5.306667e-02
11       no  3.13  Advanced     Advanced         1          3           3.604667e-01
9        no  3.82  Advanced     Advanced         1          3           2.718000e-01
19      yes  1.98  Advanced     Advanced         0          3           1.932800e+00
27      yes  3.96  Advanced     Advanced         0          3           2.147467e+00
1       yes  3.95  Beginner     Beginner         0          1          -4.796510e-16
34      yes  3.85  Advanced     Beginner         0          2           5.000000e-03
32      yes  3.46  Advanced     Beginner         0          3           1.340667e-01
40      yes  3.95    Novice     Beginner         0          3           1.340667e-01
>>>
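The 'Beginner' rows in the output above can be cross-checked without a database. The sketch below is a plain-Python illustration, not teradataml code: the helper rolling_regr_sxx is hypothetical, and it assumes the four 'Beginner' rows were processed in the order displayed (ids 1, 34, 32, 40), since the window has no order columns. It applies a window of the current row plus up to two preceding rows and uses the identity that the sum of squared deviations equals the count times the population variance.

```python
from statistics import pvariance


def rolling_regr_sxx(xs, preceding=2):
    """Hypothetical helper: sum of squared deviations over a rolling window.

    Mirrors window_start_point=-2, window_end_point=0 for a single partition.
    """
    out = []
    for i in range(len(xs)):
        # Current row plus at most `preceding` rows before it.
        window = xs[max(0, i - preceding): i + 1]
        # count * population variance == sum of squared deviations from the mean.
        out.append(len(window) * pvariance(window))
    return out


# 'gpa' values of the 'Beginner' rows, in the order shown in the output
# (an assumption about the processing order).
beginner_gpa = [3.95, 3.85, 3.46, 3.95]
values = rolling_regr_sxx(beginner_gpa)
for gpa, value in zip(beginner_gpa, values):
    print(gpa, round(value, 6))
```

Up to floating-point rounding, the four results correspond to the regr_sxx_admitted_gpa values listed for ids 1, 34, 32 and 40 (about 0, 0.005, 0.134067 and 0.134067).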
# Example 2: Calculate the sum of the squares for all columns as
# dependent variable and 'gpa' as independent variable,
# in an Expanding window, partitioned over 'masters',
# and ordered by 'id' in descending order.
# Create an Expanding window on DataFrame.
>>> window = admissions_train.window(partition_columns="masters",
... order_columns="id",
... sort_ascending=False,
... window_start_point=None,
... window_end_point=0)
>>>
# Execute regr_sxx() on the Expanding window.
>>> df = window.regr_sxx(admissions_train.gpa)
>>> df
    masters   gpa     stats  programming  admitted  admitted_regr_sxx   gpa_regr_sxx    id_regr_sxx
id
38      yes  2.65  Advanced     Beginner         1       9.800000e-01   9.800000e-01   9.800000e-01
32      yes  3.46  Advanced     Beginner         0       1.106480e+00   1.106480e+00   1.106480e+00
31      yes  3.50  Advanced     Beginner         1       1.107333e+00   1.107333e+00   1.107333e+00
30      yes  3.79  Advanced       Novice         0       1.166771e+00   1.166771e+00   1.166771e+00
27      yes  3.96  Advanced     Advanced         0       1.436400e+00   1.436400e+00   1.436400e+00
26      yes  3.57  Advanced     Advanced         1       1.443160e+00   1.443160e+00   1.443160e+00
37       no  3.52    Novice       Novice         1      -4.891920e-16  -4.891920e-16  -4.891920e-16
36       no  3.00  Advanced       Novice         0       1.352000e-01   1.352000e-01   1.352000e-01
35       no  3.68    Novice     Beginner         1       2.528000e-01   2.528000e-01   2.528000e-01
33       no  3.55    Novice       Novice         1       2.696750e-01   2.696750e-01   2.696750e-01
>>>
# Example 3: Calculate the sum of the squares for all columns, with 'gpa' as
# the independent variable, on data grouped by 'masters' and 'gpa', in a
# Contracting window, partitioned over 'masters' and ordered by 'masters'
# with nulls listed last.
# Perform group_by() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns="masters",
... order_columns="masters",
... nulls_first=False,
... window_start_point=-5,
... window_end_point=None)
# Execute regr_sxx() on the Contracting window.
>>> window.regr_sxx(admissions_train.gpa)
   masters   gpa  gpa_regr_sxx
0       no  3.71      0.630000
1       no  3.52      0.733410
2       no  3.68      0.734400
3       no  3.83      0.796367
4       no  3.55      1.133143
5       no  3.96      1.133333
6      yes  3.59      1.743133
7      yes  3.95      3.680286
8      yes  3.46      3.976088
9      yes  3.76      3.986600
>>>