Teradata Package for Python Function Reference | 17.10 - std - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.dataframe.window.std = std(distinct=False, population=False)

DESCRIPTION:
    Function returns the standard deviation for the non-null data points
    in a teradataml DataFrame or ColumnExpression over the specified window.
    The standard deviation is the square root of the variance, which is the
    second central moment of either a sample or a population.
    For a population, it is a measure of dispersion from the mean of that
    population. For a sample, it is a measure of dispersion from the mean
    of that sample.
    The computation is more conservative for the sample standard deviation
    than for the population standard deviation, to minimize the effect of
    outliers on the computed value.
    When there are fewer than two non-null data points in the sample used
    for the computation, the function returns None.

PARAMETERS:
    distinct:
        Optional Argument.
        Specifies whether duplicate values in a column are excluded from
        the computation. When set to True, only distinct values are
        considered.
        Default Value: False
        Types: bool

    population:
        Optional Argument.
        Specifies whether to calculate the standard deviation on the
        entire population or not.
        Set this argument to True only when the data points represent the
        complete population. If your data represents only a sample of the
        entire population for the column, then set this argument to False,
        which computes the sample standard deviation. As the sample size
        increases, the values for sample standard deviation and population
        standard deviation approach the same number.
        Use the more conservative sample standard deviation calculation
        unless you are absolutely certain that your data constitutes the
        entire population for the column.
        Default Value: False
        Types: bool

RETURNS:
    * teradataml DataFrame - When the aggregate is executed using a window
      created on a teradataml DataFrame.
    * ColumnExpression, also known as teradataml DataFrameColumn - When the
      aggregate is executed using a window created on a ColumnExpression.

RAISES:
    RuntimeError - If the column does not support the aggregate operation.

EXAMPLES:
    # Load the data to run the example.
    # Note: The examples assume that a connection to Vantage has already
    #       been created, for example with create_context().
    >>> from teradataml import DataFrame, load_example_data
    >>> load_example_data("dataframe", "admissions_train")
    >>>

    # Create a teradataml DataFrame on 'admissions_train' table.
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id
    22     yes  3.46    Novice    Beginner         0
    36      no  3.00  Advanced      Novice         0
    15     yes  4.00  Advanced    Advanced         1
    38     yes  2.65  Advanced    Beginner         1
    5       no  3.44    Novice      Novice         0
    17      no  3.83  Advanced    Advanced         1
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    26     yes  3.57  Advanced    Advanced         1
    19     yes  1.98  Advanced    Advanced         0
    >>>

    # Example 1: Calculate the sample standard deviation for
    #            the column 'gpa' in a Rolling window, partitioned over
    #            'programming'.
    # Create a Rolling window on 'gpa'.
    >>> window = admissions_train.gpa.window(partition_columns="programming",
    ...                                      window_start_point=-2,
    ...                                      window_end_point=0)
    >>>
    # Execute std() on the Rolling window and attach it to the teradataml DataFrame.
    # Note: DataFrame.assign() allows combining multiple window aggregate operations
    #       in one single call. In this example, we are executing std() along with
    #       count() window aggregate operations.
    >>> df = admissions_train.assign(std_gpa=window.std(), count_gpa=window.count())
    >>> df
       masters   gpa     stats programming  admitted  count_gpa  std_gpa
    id
    15     yes  4.00  Advanced    Advanced         1          3     3.60
    16      no  3.70  Advanced    Advanced         1          3     3.70
    11      no  3.13  Advanced    Advanced         1          3     3.13
    9       no  3.82  Advanced    Advanced         1          3     3.13
    19     yes  1.98  Advanced    Advanced         0          3     1.98
    27     yes  3.96  Advanced    Advanced         0          3     1.98
    1      yes  3.95  Beginner    Beginner         0          1     3.95
    34     yes  3.85  Advanced    Beginner         0          2     3.85
    32     yes  3.46  Advanced    Beginner         0          3     3.46
    40     yes  3.95    Novice    Beginner         0          3     3.46
    >>>

    # Example 2: Calculate the population standard deviation for all
    #            the valid columns in teradataml DataFrame, in an Expanding
    #            window, partitioned over 'masters', and ordered by 'id' in
    #            descending order.
    # Create an Expanding window on teradataml DataFrame.
    >>> window = admissions_train.window(partition_columns="masters",
    ...                                  order_columns="id",
    ...                                  sort_ascending=False,
    ...                                  window_start_point=None,
    ...                                  window_end_point=0)
    >>>
    # Execute std() on the Expanding window.
    >>> df = window.std(population=True)
    >>> df
       masters   gpa     stats programming  admitted  admitted_std   gpa_std    id_std
    id
    38     yes  2.65  Advanced    Beginner         1      0.471405  0.571548  0.816497
    32     yes  3.46  Advanced    Beginner         0      0.400000  0.470421  3.072458
    31     yes  3.50  Advanced    Beginner         1      0.471405  0.429599  3.496029
    30     yes  3.79  Advanced      Novice         0      0.451754  0.408267  3.795809
    27     yes  3.96  Advanced    Advanced         0      0.415740  0.399500  4.422166
    26     yes  3.57  Advanced    Advanced         1      0.458258  0.379889  4.737088
    37      no  3.52    Novice      Novice         1      0.000000  0.000000  0.000000
    36      no  3.00  Advanced      Novice         0      0.500000  0.260000  0.500000
    35      no  3.68    Novice    Beginner         1      0.471405  0.290287  0.816497
    33      no  3.55    Novice      Novice         1      0.433013  0.259651  1.479020
    >>>

    # Example 3: Calculate the sample standard deviation for all the valid
    #            columns in teradataml DataFrame, which are grouped by 'masters'
    #            and 'gpa' in a Contracting window, partitioned over 'masters' and
    #            ordered by 'masters' with nulls listed last.
    # Perform group_by() operation on teradataml DataFrame.
    >>> group_by_df = admissions_train.groupby(["masters", "gpa"])
    # Create a Contracting window on teradataml DataFrameGroupBy object.
    >>> window = group_by_df.window(partition_columns="masters",
    ...                             order_columns="masters",
    ...                             nulls_first=False,
    ...                             window_start_point=-5,
    ...                             window_end_point=None)
    # Execute std() on the Contracting window.
    >>> window.std()
      masters   gpa   gpa_std
    0     yes  3.79  0.447269
    1     yes  3.50  0.583218
    2     yes  3.96  0.559739
    3     yes  4.00  0.540462
    4     yes  3.90  0.655494
    5     yes  2.33  0.631651
    6      no  3.52  0.746961
    7      no  3.83  0.697233
    8      no  3.82  0.688829
    9      no  3.55  0.652312
    >>>
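
The difference between the sample and population calculations, and the way the window bounds select rows, can be illustrated without a Vantage connection. The following is a minimal pure-Python sketch, not the teradataml implementation: the helper names `std_sketch` and `window_std_sketch` are hypothetical, and the sketch assumes that a negative start point counts preceding rows, that 0 denotes the current row, and that None denotes an unbounded side. It also assumes that the population calculation returns 0 for a single data point, which is what the first row of the 'no' partition in Example 2 shows.

```python
import math


def std_sketch(values, distinct=False, population=False):
    """Hypothetical helper: standard deviation over non-null values."""
    data = [v for v in values if v is not None]  # ignore nulls
    if distinct:
        data = list(set(data))  # drop duplicate values
    n = len(data)
    # The sample calculation needs at least two data points.
    # Assumption: the population calculation needs at least one.
    if n < (1 if population else 2):
        return None
    mean = sum(data) / n
    squared_dev = sum((v - mean) ** 2 for v in data)
    # Population divides by n; sample divides by n - 1, which gives
    # the larger (more conservative) estimate.
    divisor = n if population else n - 1
    return math.sqrt(squared_dev / divisor)


def window_std_sketch(values, start, end, **kwargs):
    """Hypothetical helper: apply std_sketch over a row-based window.

    'values' must already be in window order for a single partition.
    'start' and 'end' are offsets relative to the current row;
    None means the window is unbounded on that side.
    """
    n = len(values)
    results = []
    for i in range(n):
        lo = 0 if start is None else max(0, i + start)
        hi = n if end is None else max(0, min(n, i + end + 1))
        results.append(std_sketch(values[lo:hi], **kwargs))
    return results


if __name__ == "__main__":
    # 'gpa' for the masters='no' rows of Example 2, ordered by 'id' descending.
    gpa_no = [3.52, 3.00, 3.68, 3.55]
    # Expanding window: unbounded start, ends at the current row.
    expanding = window_std_sketch(gpa_no, None, 0, population=True)
    print([round(x, 6) for x in expanding])
    # → [0.0, 0.26, 0.290287, 0.259651]
```

The printed values agree with the gpa_std column for the masters='no' rows in Example 2. Passing `start=-2, end=0` to the same helper would mimic the Rolling window of Example 1, and `start=-5, end=None` the Contracting window of Example 3, under the assumptions stated above.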