Teradata Package for Python Function Reference | 17.10 - var
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.window.var = var(distinct=False, population=False)
- DESCRIPTION:
Function returns the variance of the data points in a teradataml
DataFrame or ColumnExpression over the specified window.
By default, the function calculates the sample variance. Variance of a
sample is a measure of dispersion from the mean of that sample.
It is the square of the sample standard deviation. However,
if the parameter "population" is True, the function calculates
the population variance. Variance of a population is a measure
of dispersion from the mean of that population.
The sample variance computation is more conservative than the
population variance computation, to minimize the effect of outliers
on the computed value. When the sample used for the computation has
fewer than two non-null data points, the function returns None.
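The difference between the two modes can be checked with Python's standard library alone. The sketch below is not teradataml code: it uses the statistics module, and the helper name var_or_none is hypothetical. It only illustrates the sample and population formulas and the None result for fewer than two non-null data points.

```python
import statistics

def var_or_none(points, population=False):
    """Hypothetical helper that mirrors the described behaviour.

    This is an illustration only, not the teradataml implementation.
    """
    # Null (None) data points are ignored.
    valid = [p for p in points if p is not None]
    if population:
        # Population variance divides by n.
        return statistics.pvariance(valid) if valid else None
    # Sample variance divides by n - 1, so it needs at least two points.
    return statistics.variance(valid) if len(valid) >= 2 else None

gpas = [3.95, 3.85, 3.46]
sample_var = var_or_none(gpas)
population_var = var_or_none(gpas, population=True)
too_few = var_or_none([3.95, None])

print(sample_var, population_var, too_few)
```

For the same data points, the sample variance is always at least as large as the population variance, which is the sense in which it is the more conservative of the two.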
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values in a column from the
computation. When set to True, only distinct values are considered.
Default Value: False
Types: bool
population:
Optional Argument.
Specifies whether to calculate the population variance instead of the
sample variance.
Set this argument to True only when the data points represent the complete
population. If your data represents only a sample of the entire population
for the columns, set this argument to False, which computes the
sample variance. As the sample size increases, the values for sample
variance and population variance approach the same number. Use the more
conservative sample variance calculation unless you are certain that
your data constitutes the entire population for the columns.
Default Value: False
Types: bool
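As a rough illustration of the two arguments, and not of teradataml itself, the following sketch uses the standard statistics module. It assumes that distinct=True corresponds to dropping duplicate values before the computation. It also shows the relative gap between sample and population variance shrinking as the number of data points grows. The name relative_gap is hypothetical.

```python
import statistics

values = [3.5, 3.5, 4.0, 2.0, 4.0]

# Assumption: distinct=True behaves like removing duplicates first.
distinct_values = sorted(set(values))
var_all = statistics.variance(values)
var_distinct = statistics.variance(distinct_values)

def relative_gap(n):
    """Relative difference between sample and population variance for n points."""
    data = [float(i) for i in range(n)]
    sample = statistics.variance(data)       # divides by n - 1
    population = statistics.pvariance(data)  # divides by n
    return (sample - population) / sample

# The gap equals 1/n, so it shrinks as the sample grows.
gaps = [relative_gap(n) for n in (5, 50, 500)]

print(var_all, var_distinct, gaps)
```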
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window created
on a teradataml DataFrame.
* ColumnExpression, also known as teradataml DataFrameColumn - When the aggregate is
executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If column does not support the aggregate operation.
EXAMPLES:
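# Import the required objects and load the data to run the example.
# Note: An active connection to Vantage is assumed.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")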
>>>
# Create a teradataml DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Example 1: Calculate the sample variance for the column 'gpa'
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns="programming",
... window_start_point=-2,
... window_end_point=0)
>>>
# Execute var() on the Rolling window and attach it to the teradataml DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate operations
# in one single call. In this example, we are executing var() along with
# max() window aggregate operations.
>>> df = admissions_train.assign(var_gpa=window.var(), max_gpa=window.max())
>>> df
    masters   gpa     stats  programming  admitted  max_gpa   var_gpa
id
15      yes  4.00  Advanced     Advanced         1     4.00  0.040033
16       no  3.70  Advanced     Advanced         1     4.00  0.026533
11       no  3.13  Advanced     Advanced         1     3.96  0.180233
9        no  3.82  Advanced     Advanced         1     3.82  0.135900
19      yes  1.98  Advanced     Advanced         0     3.82  0.966400
27      yes  3.96  Advanced     Advanced         0     3.96  1.073733
1       yes  3.95  Beginner     Beginner         0     3.95       NaN
34      yes  3.85  Advanced     Beginner         0     3.95  0.005000
32      yes  3.46  Advanced     Beginner         0     3.95  0.067033
40      yes  3.95    Novice     Beginner         0     3.95  0.067033
>>>
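To make the window bounds concrete, here is a plain-Python sketch (not teradataml code) of a rolling sample variance. It reads window_start_point=-2 and window_end_point=0 as "two preceding rows through the current row". The function name rolling_var is hypothetical. Fed the four 'Beginner' gpa values in the order shown in the output above, it gives None for the first row (displayed as NaN) and values that round to the var_gpa figures shown for the remaining three rows.

```python
import statistics

def rolling_var(values, start=-2, end=0):
    """Sample variance over rows i+start .. i+end for each row i.

    Hypothetical helper for illustration; it assumes the rows are already
    in window order within a single partition.
    """
    results = []
    for i in range(len(values)):
        lo = max(0, i + start)                # clip at the partition start
        hi = min(len(values), i + end + 1)    # clip at the partition end
        frame = values[lo:hi]
        # Fewer than two data points: no sample variance.
        results.append(statistics.variance(frame) if len(frame) >= 2 else None)
    return results

beginner_gpa = [3.95, 3.85, 3.46, 3.95]
rolled = rolling_var(beginner_gpa)
rounded = [None if v is None else round(v, 6) for v in rolled]
print(rounded)
```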
# Example 2: Calculate the population variance for all valid columns
# in teradataml DataFrame, in an Expanding window, partitioned
# over 'programming'.
# Create an Expanding window on teradataml DataFrame.
>>> window = admissions_train.window(partition_columns="programming",
... window_start_point=None,
... window_end_point=0)
>>>
# Execute var() on the Expanding window.
>>> df = window.var(population=True)
>>> df
    masters   gpa     stats  programming  admitted  admitted_var   gpa_var     id_var
id
4       yes  3.50  Beginner       Novice         1      0.222222  0.034022   1.555556
7       yes  2.33    Novice       Novice         1      0.240000  0.319336   5.200000
14      yes  3.45  Advanced     Advanced         0      0.250000  0.266358  18.222222
15      yes  4.00  Advanced     Advanced         1      0.244898  0.270212  26.285714
19      yes  1.98  Advanced     Advanced         0      0.246914  0.459180  43.358025
20      yes  3.90  Advanced     Advanced         1      0.240000  0.439076  48.840000
3        no  3.70    Novice     Beginner         1      0.000000  0.000000   0.000000
5        no  3.44    Novice       Novice         0      0.250000  0.016900   1.000000
8        no  3.60  Beginner     Advanced         1      0.222222  0.011467   4.222222
9        no  3.82  Advanced     Advanced         1      0.187500  0.019400   5.687500
>>>
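An Expanding window (window_start_point=None, window_end_point=0) can be read as "every row from the start of the partition through the current row". The plain-Python sketch below, which is not teradataml code and uses the hypothetical name expanding_pvar, applies the population formula to such a frame. Taking the gpa values of the rows with id 3, 5, 8 and 9 above in that order as one window, it gives values that round to the gpa_var column shown for those rows, including 0.0 for the first row, where the frame holds a single data point.

```python
import statistics

def expanding_pvar(values):
    """Population variance of values[0..i] for each row i.

    Hypothetical helper for illustration; it assumes the rows are already
    in window order within a single partition.
    """
    return [statistics.pvariance(values[: i + 1]) for i in range(len(values))]

# gpa values of the rows with id 3, 5, 8 and 9 in the output above.
gpa = [3.70, 3.44, 3.60, 3.82]
expanded = [round(v, 6) for v in expanding_pvar(gpa)]
print(expanded)
```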
# Example 3: Calculate the sample variance for all the valid columns
# in teradataml DataFrame, which are grouped by 'masters' and
# 'gpa' in a Contracting window, partitioned over 'masters'.
# Perform groupby() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns="masters",
... window_start_point=-5,
... window_end_point=None)
# Execute var() on the Contracting window.
>>> window.var()
   masters   gpa   gpa_var
0      yes  3.79  0.200050
1      yes  3.50  0.340143
2      yes  3.96  0.313307
3      yes  4.00  0.292099
4      yes  3.90  0.429673
5      yes  2.33  0.398983
6       no  3.52  0.557950
7       no  3.83  0.486133
8       no  3.82  0.474486
9       no  3.55  0.425511
>>>
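A Contracting window (a negative window_start_point with window_end_point=None) can be read as "from a fixed number of preceding rows through the end of the partition", so the frame shrinks as the current row advances. The following plain-Python sketch is not teradataml code: the name contracting_var and the input values are made up for illustration, and a start of -1 is used to keep the frames short.

```python
import statistics

def contracting_var(values, start=-5):
    """Sample variance over rows i+start .. last row for each row i.

    Hypothetical helper for illustration; it assumes the rows are already
    in window order within a single partition.
    """
    results = []
    for i in range(len(values)):
        # Frame runs from `start` rows before the current row to the end.
        frame = values[max(0, i + start):]
        results.append(statistics.variance(frame) if len(frame) >= 2 else None)
    return results

# Made-up data points, not taken from admissions_train.
points = [1.0, 2.0, 3.0, 4.0]
contracted = [round(v, 6) for v in contracting_var(points, start=-1)]
print(contracted)
```

The first two rows share the same frame (all four data points), after which each step drops one leading data point.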