Teradata Package for Python Function Reference | 20.00 - var - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- Product Category
- Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.var = var(self, distinct=False, population=False, **kwargs)
- DESCRIPTION:
Returns sample or population variance for values in a column.
* The variance of a population is a measure of dispersion from the
mean of that population.
* The variance of a sample is a measure of dispersion from the mean
of that sample. It is the square of the sample standard deviation.
Note:
1. When the sample used for the computation has fewer than two non-null
data points, var returns None.
2. Null values are not included in the result computation.
3. If the data represents only a sample of the entire population for the
columns, Teradata recommends calculating the sample variance;
otherwise, calculate the population variance.
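As a rough illustration of the notes above, the following plain-Python sketch mimics the described semantics with the standard statistics module. It is not teradataml code: column_var is a hypothetical helper, and applying the fewer-than-two rule to the population case as well is an assumption based on a literal reading of note 1.

```python
from statistics import pvariance, variance


def column_var(values, population=False):
    """Hypothetical helper (not part of teradataml) mirroring the notes above."""
    # Note 2: null values (None here) are left out of the computation.
    non_null = [v for v in values if v is not None]
    # Note 1: fewer than two non-null data points yields None.
    # Assumption: the same rule is applied when population=True.
    if len(non_null) < 2:
        return None
    # Population variance divides by n; sample variance divides by n - 1.
    return pvariance(non_null) if population else variance(non_null)


column = [2.0, None, 4.0, 6.0]
sample_var = column_var(column)                       # squared deviations sum to 8, divided by 2
population_var = column_var(column, population=True)  # same sum, divided by 3
too_few = column_var([3.5, None])                     # only one non-null value

print(sample_var, population_var, too_few)
```

The sample variance is never smaller than the population variance of the same data, which is why note 3 treats it as the safer choice when the data is only a sample.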
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values in the column from the
computation. When set to True, only distinct values are considered.
Default Value: False
Types: bool
population:
Optional Argument.
Specifies whether to calculate the population variance instead of the
sample variance.
Set this argument to True only when the data points represent the complete
population. If your data represents only a sample of the entire population
for the columns, set this argument to False, which computes the
sample variance. Although the sample variance and the population variance
approach the same value as the sample size increases, you should always
use the more conservative sample variance calculation unless you are
absolutely certain that your data constitutes the entire population for
the columns.
Default Value: False
Types: bool
kwargs:
Specifies optional keyword arguments.
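The effect of the two flags can be sketched in plain Python. This is an illustration only, not the library's implementation: it assumes that distinct=True removes duplicate values before aggregating, and it uses the standard identity that the population variance equals the sample variance multiplied by (n - 1) / n.

```python
from statistics import pvariance, variance

values = [2.0, 2.0, 4.0, 6.0]

# distinct=False (default): every row contributes, duplicates included.
all_rows_var = variance(values)                  # 11 / 3

# distinct=True (assumed semantics): duplicates are dropped first.
distinct_var = variance(sorted(set(values)))     # variance of [2.0, 4.0, 6.0]

# population=True divides by n instead of n - 1.
n = len(values)
population_var = pvariance(values)               # 11 / 4
rescaled_sample_var = all_rows_var * (n - 1) / n

# The factor (n - 1) / n approaches 1 as n grows, so both variances converge.
ratios = [(size - 1) / size for size in (2, 10, 40, 1000)]

print(all_rows_var, distinct_var, population_var, ratios)
```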
RETURNS:
ColumnExpression, also known as teradataml DataFrameColumn.
NOTES:
* Use DataFrame.assign() when applying aggregate functions to a
ColumnExpression, also known as teradataml DataFrameColumn.
* Always set "drop_columns=True" in DataFrame.assign() when
running an aggregate operation on a teradataml DataFrame.
* The "drop_columns" argument in DataFrame.assign() is ignored when the
aggregate function is applied to DataFrame.groupby().
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
   masters   gpa     stats programming  admitted
id
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0
>>>
# Example 1: Get the sample variance for values in 'gpa' column.
# Execute var() function using teradataml DataFrameColumn to generate the ColumnExpression.
>>> var_column = admissions_train.gpa.var()
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.assign(drop_columns=True, var_=var_column)
>>> df
       var_
0  0.263953
# Example 2: Get the population variance for values in 'gpa' column.
# Execute var() function on teradataml DataFrameColumn to generate the ColumnExpression.
# To calculate population variance we must set population=True.
>>> var_column = admissions_train.gpa.var(population=True)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.assign(drop_columns=True, var_=var_column)
>>> df
       var_
0  0.257354
>>>
# Example 3: Get the sample variance for distinct values in 'gpa' column
# for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
# Execute var() function using teradataml DataFrameColumn to generate the ColumnExpression.
>>> var_column = admissions_train.gpa.var(distinct=True)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.groupby("programming").assign(var_=var_column)
>>> df
  programming      var_
0    Advanced  0.252421
1      Novice  0.418267
2    Beginner  0.138496
>>>
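The outputs of Example 1 and Example 2 can be cross-checked against each other without a database connection. The sketch below is an assumption-laden sanity check, not part of teradataml: it relies only on the identity population variance = sample variance * (n - 1) / n, takes the two printed values as given, and infers the number of non-null 'gpa' rows rather than reading it from the table.

```python
# Values printed in Example 1 (sample) and Example 2 (population).
sample_var = 0.263953
population_var = 0.257354

# From population = sample * (n - 1) / n it follows that
# n = sample / (sample - population).
n_estimate = sample_var / (sample_var - population_var)
n = round(n_estimate)

# Rescaling the sample variance with the inferred n should reproduce
# the population variance up to the six printed decimal places.
rescaled = sample_var * (n - 1) / n

print(n, round(rescaled, 6))
```

If the inferred row count were far from a whole number, that would suggest the two results were not computed over the same set of non-null values.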