Teradata Package for Python Function Reference | 17.10 - var

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.dataframe.sql.DataFrameColumn.var = var(self, distinct=False, population=False, **kwargs):

DESCRIPTION:
    Returns the sample or population variance for values in a column.
    * The variance of a population is a measure of dispersion from the
      mean of that population.
    * The variance of a sample is a measure of dispersion from the mean
      of that sample. It is the square of the sample standard deviation.
    Note:
        1. When there are fewer than two non-null data points in the
           sample used for the computation, var returns None.
        2. Null values are not included in the result computation.
        3. If the data represents only a sample of the entire population
           for the columns, Teradata recommends calculating the sample
           variance; otherwise, calculate the population variance.

PARAMETERS:
    distinct:
        Optional Argument.
        Specifies whether to exclude duplicate values in the column from
        the computation. When set to True, only distinct values are used.
        Default Value: False
        Types: bool

    population:
        Optional Argument.
        Specifies whether to calculate the variance on the entire
        population. Set this argument to True only when the data points
        represent the complete population. If the data represents only a
        sample of the entire population for the columns, set this
        argument to False to compute the sample variance. Although the
        sample variance and the population variance approach the same
        value as the sample size increases, use the more conservative
        sample variance calculation unless you are certain that the data
        constitutes the entire population for the columns.
        Default Value: False
        Types: bool

    kwargs:
        Specifies optional keyword arguments.

RETURNS:
    ColumnExpression, also known as teradataml DataFrameColumn.

NOTES:
    * Use DataFrame.assign() when applying aggregate functions to a
      ColumnExpression, also known as teradataml DataFrameColumn.
    * Always pass "drop_columns=True" to DataFrame.assign() when running
      an aggregate operation on a teradataml DataFrame.
    * The "drop_columns" argument in DataFrame.assign() is ignored when
      the aggregate function is applied to the result of
      DataFrame.groupby().

RAISES:
    RuntimeError - If column does not support the aggregate operation.

EXAMPLES:
    # Load the data to run the example.
    >>> load_example_data("dataframe", "admissions_train")

    # Create a DataFrame on 'admissions_train' table.
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id
    22     yes  3.46    Novice    Beginner         0
    36      no  3.00  Advanced      Novice         0
    15     yes  4.00  Advanced    Advanced         1
    38     yes  2.65  Advanced    Beginner         1
    5       no  3.44    Novice      Novice         0
    17      no  3.83  Advanced    Advanced         1
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    26     yes  3.57  Advanced    Advanced         1
    19     yes  1.98  Advanced    Advanced         0

    # Example 1: Get the sample variance for values in 'gpa' column.
    # Execute var() function using teradataml DataFrameColumn to generate
    # the ColumnExpression.
    >>> var_column = admissions_train.gpa.var()
    # Pass the generated ColumnExpression to DataFrame.assign(), to run
    # and produce the result.
    >>> df = admissions_train.assign(True, var_=var_column)
    >>> df
           var_
    0  0.263953

    # Example 2: Get the population variance for values in 'gpa' column.
    # Execute var() function on teradataml DataFrameColumn to generate
    # the ColumnExpression.
    # To calculate the population variance, set population=True.
    >>> var_column = admissions_train.gpa.var(population=True)
    # Pass the generated ColumnExpression to DataFrame.assign(), to run
    # and produce the result.
    >>> df = admissions_train.assign(True, var_=var_column)
    >>> df
           var_
    0  0.257354

    # Example 3: Get the sample variance for distinct values in 'gpa'
    #            column for each level of programming.
    # Note:
    #   When assign() is run after DataFrame.groupby(), the function
    #   ignores the "drop_columns" argument.
    # Execute var() function using teradataml DataFrameColumn to generate
    # the ColumnExpression.
    >>> var_column = admissions_train.gpa.var(distinct=True)
    # Pass the generated ColumnExpression to DataFrame.assign(), to run
    # and produce the result.
    >>> df = admissions_train.groupby("programming").assign(var_=var_column)
    >>> df
      programming      var_
    0    Advanced  0.252421
    1      Novice  0.418267
    2    Beginner  0.138496
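The difference between the "distinct" and "population" settings can also be checked locally, without a Vantage connection. The following is a minimal sketch using Python's standard statistics module; it is not how teradataml computes the result. The helper name var_sketch is hypothetical, and applying the fewer-than-two rule in both modes is an assumption made for simplicity. The input holds only the ten 'gpa' values displayed above plus one illustrative null, so its results are not expected to match the outputs of the examples, which run against the full table.

```python
import math
import statistics


def var_sketch(values, distinct=False, population=False):
    """Hypothetical local stand-in for the documented var() semantics."""
    # Null values (None here) are left out of the computation.
    data = [v for v in values if v is not None]
    if distinct:
        # Keep one copy of each value; order does not affect the variance.
        data = list(dict.fromkeys(data))
    # Assumption: the fewer-than-two rule is applied in both modes.
    if len(data) < 2:
        return None
    if population:
        return statistics.pvariance(data)  # divides by n
    return statistics.variance(data)  # divides by n - 1


# The ten 'gpa' values displayed above, plus one null for illustration.
gpa = [3.46, 3.00, 4.00, 2.65, 3.44, 3.83, 3.85, 4.00, 3.57, 1.98, None]

sample_var = var_sketch(gpa)
population_var = var_sketch(gpa, population=True)
distinct_var = var_sketch(gpa, distinct=True)
print(sample_var, population_var, distinct_var)

# With n non-null points, population variance = sample variance * (n - 1) / n.
n = sum(v is not None for v in gpa)
assert math.isclose(population_var, sample_var * (n - 1) / n)

# The documented outputs 0.263953 and 0.257354 fit the same relation for
# n = 40. This is an inference from the printed numbers; the source does
# not state the row count.
assert abs(0.263953 * 39 / 40 - 0.257354) < 1e-6
```

Because the two variances differ only by the factor (n - 1) / n, the choice of "population" matters most for small samples, which is consistent with the recommendation to default to the sample variance.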