
Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2021-11-19
Product Category: Teradata Vantage

teradataml.dataframe.sql.DataFrameColumn.std = std(self, distinct=False, population=False, **kwargs)

DESCRIPTION:
    Function to get the sample or population standard deviation for values in a column.
    The standard deviation is the square root of the variance, which is the second
    central moment of a distribution.
        * For a sample, it is a measure of dispersion from the mean of that sample.
        * For a population, it is a measure of dispersion from the mean of that population.
    The sample computation is the more conservative of the two: it divides the sum of
    squared deviations by n - 1 instead of n, so for the same data it yields a slightly
    larger value than the population computation.

    Note:
        1. When there are fewer than two non-null data points in the sample used for
           the computation, std returns None.
        2. Null values are not included in the computation.
        3. If the data represents only a sample of the entire population for the column,
           Teradata recommends calculating the sample standard deviation; otherwise,
           calculate the population standard deviation.

PARAMETERS:
    distinct:
        Optional Argument.
        Specifies whether to exclude duplicate values in the column from the
        computation. When True, each distinct value is counted once.
        Default Value: False
        Types: bool

    population:
        Optional Argument.
        Specifies whether to calculate the standard deviation over the entire population.
        Set this argument to True only when the data points represent the complete
        population. If the data represents only a sample of the entire population for
        the column, set this argument to False, which computes the sample standard
        deviation.
        As the sample size increases, the sample and population standard deviations
        approach the same value. Even so, use the more conservative sample standard
        deviation unless you are certain that the data constitutes the entire population
        for the column.
        Default Value: False
        Types: bool

    kwargs:
        Specifies optional keyword arguments.

RETURNS:
    ColumnExpression, also known as teradataml DataFrameColumn.
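The rules in DESCRIPTION and PARAMETERS can be illustrated locally with Python's standard `statistics` module. The helper `std_sketch` below is hypothetical and is not part of teradataml; it is a minimal sketch that applies Notes 1 and 2 and the `distinct` and `population` flags as written above, assuming Python `None` stands in for a SQL NULL. It does not run anything in Vantage.

```python
import statistics
from typing import Iterable, List, Optional


def std_sketch(values: Iterable[Optional[float]],
               distinct: bool = False,
               population: bool = False) -> Optional[float]:
    """Hypothetical local stand-in for the documented std() rules (not teradataml)."""
    # Note 2: null values (None here) are excluded from the computation.
    data: List[float] = [v for v in values if v is not None]

    # distinct=True: each value contributes once, like SQL DISTINCT.
    if distinct:
        data = list(set(data))

    # Note 1, applied to both modes as written: fewer than two non-null points -> None.
    if len(data) < 2:
        return None

    # population=True divides the squared deviations by n (pstdev);
    # the default sample formula divides by n - 1 (stdev).
    return statistics.pstdev(data) if population else statistics.stdev(data)


# Usage with a small made-up column that contains duplicates and a null.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]
print(std_sketch(column, population=True))  # prints 2.0 (divides by n = 8)
print(std_sketch(column))                   # sample: divides by n - 1 = 7
print(std_sketch(column, distinct=True))    # sample over {2, 4, 5, 7, 9}
print(std_sketch([3.5, None]))              # prints None (one non-null value)
```

For the same data, the sample result is larger than the population result, which is why the sample form is described as the more conservative choice.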
NOTES:
    * One must use DataFrame.assign() when using the aggregate functions on
      ColumnExpression, also known as teradataml DataFrameColumn.
    * One should always use "drop_columns=True" in DataFrame.assign() while running
      the aggregate operation on a teradataml DataFrame.
    * The "drop_columns" argument in DataFrame.assign() is ignored when the aggregate
      function is operated on DataFrame.groupby().

RAISES:
    RuntimeError - If the column does not support the aggregate operation.

EXAMPLES:
    # Load the data to run the example.
    >>> load_example_data("dataframe", "admissions_train")
    >>>
    # Create a DataFrame on the 'admissions_train' table.
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id
    22     yes  3.46    Novice    Beginner         0
    36      no  3.00  Advanced      Novice         0
    15     yes  4.00  Advanced    Advanced         1
    38     yes  2.65  Advanced    Beginner         1
    5       no  3.44    Novice      Novice         0
    17      no  3.83  Advanced    Advanced         1
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    26     yes  3.57  Advanced    Advanced         1
    19     yes  1.98  Advanced    Advanced         0
    >>>

    # Example 1: Get the sample standard deviation for values in the 'gpa' column.
    # Execute std() on the teradataml DataFrameColumn to generate the ColumnExpression.
    >>> std_column = admissions_train.gpa.std()
    # Pass the generated ColumnExpression to DataFrame.assign() to run it and produce the result.
    >>> df = admissions_train.assign(drop_columns=True, std_=std_column)
    >>> df
           std_
    0  0.513764
    >>>

    # Example 2: Get the population standard deviation for values in the 'gpa' column.
    # Execute std() on the teradataml DataFrameColumn to generate the ColumnExpression.
    # To calculate the population standard deviation, set population=True.
    >>> std_column = admissions_train.gpa.std(population=True)
    # Pass the generated ColumnExpression to DataFrame.assign() to run it and produce the result.
    >>> df = admissions_train.assign(drop_columns=True, std_=std_column)
    >>> df
           std_
    0  0.507301
    >>>

    # Example 3: Get the sample standard deviation for distinct values in the 'gpa'
    #            column for each level of programming.
    # Note:
    #     When assign() is run after DataFrame.groupby(), the function ignores
    #     the "drop_columns" argument.
    # Execute std() on the teradataml DataFrameColumn to generate the ColumnExpression.
    # Only DISTINCT values are considered while calculating the standard deviation.
    >>> std_column = admissions_train.gpa.std(distinct=True)
    # Pass the generated ColumnExpression to DataFrame.assign() to run it and produce the result.
    >>> df = admissions_train.groupby("programming").assign(std_=std_column)
    >>> df
      programming      std_
    0    Beginner  0.372151
    1    Advanced  0.502415
    2      Novice  0.646736
    >>>
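The results of Examples 1 and 2 fit the relationship between the two formulas. Both use the same sum of squared deviations, and only the divisor differs (n - 1 versus n), so the population value equals the sample value times sqrt((n - 1) / n). The check below assumes the 'gpa' column holds n = 40 non-null values. That count is an assumption: the listing above shows only 10 rows, and 40 is simply the count at which the two published results agree.

```python
import math

# Results copied from Example 1 (sample) and Example 2 (population).
sample_std = 0.513764
population_std = 0.507301

# ASSUMPTION: 'gpa' holds n = 40 non-null values; the listing shows only 10 rows.
n = 40

# Sample divides by n - 1 and population divides by n over the same sum of
# squared deviations, so population = sample * sqrt((n - 1) / n).
predicted_population_std = sample_std * math.sqrt((n - 1) / n)

print(round(predicted_population_std, 6))  # prints 0.507301
```

The factor sqrt((n - 1) / n) is always below 1 and approaches 1 as n grows, which matches the PARAMETERS note that the two values converge for large samples.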