Teradata® Package for Python Function Reference on VantageCloud Lake - regr_intercept
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Product Category: Teradata Vantage
teradataml.dataframe.window.regr_intercept = regr_intercept(expression)
DESCRIPTION:
Function returns the intercept of the univariate linear regression line
through all non-null data pairs of the dependent and independent variable
arguments over the specified window. The intercept is the point at which
the regression line through the non-null data pairs in the sample intersects
the ordinate, or y-axis, of the graph. The plot of the linear regression
on the variables is used to predict the behavior of the dependent variable
from the change in the independent variable. Note that the independent and
dependent variables can have a strong nonlinear relationship, which a simple
linear regression between such variable pairs does not reflect. The function
treats the ColumnExpression on which it is executed as the dependent
variable and "expression" as the independent variable.
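The quantity described above can be sketched in plain Python. This is a minimal illustration, not the Vantage implementation: the helper name regr_intercept_sketch and the sample values are hypothetical, and returning None when the independent variable has no variance is an assumption that is consistent with the NaN entries in the example outputs.

```python
from statistics import fmean


def regr_intercept_sketch(dependent, independent):
    """Intercept of the least-squares line y = slope * x + intercept.

    Hypothetical helper for illustration only. Pairs in which either
    value is None are skipped. Returns None when no pair remains or when
    the independent variable is constant (assumed behavior).
    """
    pairs = [(x, y) for y, x in zip(dependent, independent)
             if x is not None and y is not None]
    if not pairs:
        return None
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sum of squared deviations of the independent variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # the slope, and therefore the intercept, is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Dependent variable: gpa. Independent variable: admitted.
gpa = [3.70, 3.13, 1.98]
admitted = [1, 1, 0]
print(regr_intercept_sketch(gpa, admitted))  # approximately 1.98
```

With a 0/1 independent variable, the fitted line passes through the mean of the dependent variable in each group, so the intercept equals the mean 'gpa' of the rows where 'admitted' is 0.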
PARAMETERS:
expression:
Required Argument.
Specifies the independent variable for the regression as a ColumnExpression
of a column, the name of a column, or a literal.
Types: ColumnExpression OR int OR float OR str
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window created
on a teradataml DataFrame.
* ColumnExpression, also known as teradataml DataFrameColumn - When the aggregate is
executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
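# Load the data to run the example.
# Note: Assumes an active Vantage connection, for example one created with create_context().
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")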
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Note:
# In the examples here, a ColumnExpression is passed as input. You can
# pass the column name instead of the ColumnExpression.
# Example 1: Calculate the intercept of the univariate linear regression line
# through all non-null data pairs of 'gpa' and 'admitted',
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns="programming",
...                                      window_start_point=-2,
...                                      window_end_point=0)
>>>
# Execute regr_intercept() on the Rolling window and attach it to the DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate
# operations in a single call. In this example, regr_intercept() is
# executed along with the count() window aggregate operation.
>>> df = admissions_train.assign(regr_intercept_gpa=window.regr_intercept(admissions_train.admitted),
...                              count_gpa=window.count())
>>> df
    masters   gpa     stats  programming  admitted  count_gpa  regr_intercept_gpa
id
15      yes  4.00  Advanced     Advanced         1          3                 NaN
16       no  3.70  Advanced     Advanced         1          3                 NaN
11       no  3.13  Advanced     Advanced         1          3                 NaN
9        no  3.82  Advanced     Advanced         1          3                 NaN
19      yes  1.98  Advanced     Advanced         0          3                1.98
27      yes  3.96  Advanced     Advanced         0          3                2.97
1       yes  3.95  Beginner     Beginner         0          1                 NaN
34      yes  3.85  Advanced     Beginner         0          2                 NaN
32      yes  3.46  Advanced     Beginner         0          3                 NaN
40      yes  3.95    Novice     Beginner         0          3                 NaN
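The NaN entries in the Example 1 output can be understood with a small simulation of a Rolling window with window_start_point=-2 and window_end_point=0. This is a hedged sketch in plain Python rather than teradataml code: the helper names, the row order inside the partition, and the treatment of a constant independent variable as undefined are all assumptions.

```python
def intercept(ys, xs):
    """Least-squares intercept of ys on xs, or None when undefined (hypothetical helper)."""
    n = len(xs)
    if n == 0:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # constant independent variable: no unique regression line
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x


def rolling_intercepts(rows, start=-2, end=0):
    """For each row, fit the frame rows[i + start .. i + end] within one partition."""
    results = []
    for i in range(len(rows)):
        lo = max(0, i + start)
        hi = min(len(rows), i + end + 1)
        frame = rows[lo:hi]
        ys = [gpa for gpa, _ in frame]            # dependent variable: gpa
        xs = [admitted for _, admitted in frame]  # independent variable: admitted
        results.append((len(frame), intercept(ys, xs)))
    return results


# (gpa, admitted) pairs for a few 'Advanced' programming rows, in an assumed order.
partition = [(4.00, 1), (3.70, 1), (3.13, 1), (1.98, 0), (3.96, 0)]
results = rolling_intercepts(partition)
for count, value in results:
    print(count, value)
```

Under these assumptions, the first three frames contain only rows with 'admitted' equal to 1, so the intercept is undefined, while the last two frames yield approximately 1.98 and 2.97.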
>>>
# Example 2: Calculate the intercept of the univariate linear regression
# line through all non-null data pairs for all the applicable
# columns, with 'admitted' as the independent variable,
# in an Expanding window, partitioned over 'masters',
# and ordered by 'id' in descending order.
# Create an Expanding window on DataFrame.
>>> window = admissions_train.window(partition_columns="masters",
...                                  order_columns=admissions_train.id.desc(),
...                                  window_start_point=None,
...                                  window_end_point=0)
>>>
# Execute regr_intercept() on the Expanding window.
>>> df = window.regr_intercept(admissions_train.admitted)
>>> df
    masters   gpa     stats  programming  admitted  admitted_regr_intercept  gpa_regr_intercept  id_regr_intercept
id
38      yes  2.65  Advanced     Beginner         1                      0.0            3.850000              39.50
32      yes  3.46  Advanced     Beginner         0                      0.0            3.752500              36.25
31      yes  3.50  Advanced     Beginner         1                      0.0            3.752500              36.25
30      yes  3.79  Advanced       Novice         0                      0.0            3.760000              35.00
27      yes  3.96  Advanced     Advanced         0                      0.0            3.822857              33.00
26      yes  3.57  Advanced     Advanced         1                      0.0            3.822857              33.00
37       no  3.52    Novice       Novice         1                      NaN                 NaN                NaN
36       no  3.00  Advanced       Novice         0                      0.0            3.000000              36.00
35       no  3.68    Novice     Beginner         1                      0.0            3.000000              36.00
33       no  3.55    Novice       Novice         1                      0.0            3.000000              36.00
>>>
# Example 3: Calculate the intercept of the univariate linear regression
# line through all non-null data pairs for all the applicable
# columns, with 'gpa' as the independent variable. The data is
# grouped by 'masters', 'gpa' and 'id', in a Contracting window,
# partitioned over 'masters' and ordered by 'masters' with nulls listed last.
# Perform group_by() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa", "id"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns=group_by_df.masters,
...                             order_columns=group_by_df.masters.nulls_last(),
...                             nulls_first=False,
...                             window_start_point=-5,
...                             window_end_point=None)
# Execute regr_intercept() on the Contracting window.
>>> window.regr_intercept(admissions_train.gpa)
   masters   gpa  id  gpa_regr_intercept  id_regr_intercept
0      yes  3.50   4                 0.0           6.109916
1      yes  3.79  30                 0.0           5.004477
2      yes  3.45  14                 0.0          22.411673
3      yes  3.50  31                 0.0          30.417318
4      yes  3.50   6                 0.0          30.390058
5      yes  3.59  23                 0.0          32.804411
6       no  3.65  12                 0.0          24.834395
7       no  3.87  21                 0.0          24.222350
8       no  3.44   5                 0.0          24.850737
9       no  1.87  24                 0.0          22.248982
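The Rolling, Expanding and Contracting windows used in the three examples differ only in their frame bounds. The sketch below shows one way to read window_start_point and window_end_point, assuming that a negative value counts preceding rows, 0 is the current row, and None means unbounded. The function frame_bounds is a hypothetical helper written for illustration and is not part of teradataml.

```python
def frame_bounds(i, n, window_start_point, window_end_point):
    """Return (lo, hi) slice bounds, hi exclusive, of the frame for row i.

    Hypothetical helper: i is the 0-based position of the current row among
    the n ordered rows of its partition. None is read as unbounded.
    """
    lo = 0 if window_start_point is None else max(0, i + window_start_point)
    hi = n if window_end_point is None else min(n, i + window_end_point + 1)
    return lo, max(lo, hi)


n = 8  # rows in one partition (illustrative value)
for label, start, end in [("Rolling", -2, 0),
                          ("Expanding", None, 0),
                          ("Contracting", -5, None)]:
    frames = [frame_bounds(i, n, start, end) for i in range(n)]
    print(label, frames)
```

Under this reading, the Rolling frame keeps at most three rows, the Expanding frame grows from the first row of the partition to the current row, and the Contracting frame always runs to the last row of the partition while dropping rows more than five positions before the current row.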
>>>