Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.window.std = std(distinct=False, population=False)
- DESCRIPTION:
Function returns the standard deviation for the non-null data points in
a teradataml DataFrame or ColumnExpression over the specified window.
The standard deviation is the square root of the variance, which is the second
central moment of either a sample or a population.
For a population, it is a measure of dispersion from the mean of that population.
For a sample, it is a measure of dispersion from the mean of that sample. The
computation for the sample standard deviation is more conservative than that for
the population standard deviation (it divides by n - 1 rather than n), to minimize
the effect of outliers on the computed value.
When the sample standard deviation is computed and there are fewer than two
non-null data points in the window, the function returns None.
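The difference between the sample and population variants can be illustrated outside of Vantage. The following is a minimal sketch that uses Python's standard `statistics` module instead of teradataml, so it only demonstrates the arithmetic. The helper name `sample_std_or_none` is hypothetical; it mimics the documented None result for fewer than two non-null data points.

```python
import statistics

def sample_std_or_none(values):
    """Sample standard deviation of the non-null values, or None if fewer than two."""
    non_null = [v for v in values if v is not None]
    if len(non_null) < 2:
        return None
    return statistics.stdev(non_null)  # sample variant: divides by n - 1

# Hypothetical data points, not taken from 'admissions_train'.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = sample_std_or_none(data)
population_sd = statistics.pstdev(data)  # population variant: divides by n

print("sample:", sample_sd)
print("population:", population_sd)
print("single non-null point:", sample_std_or_none([3.5, None]))
```

The sample value is always at least as large as the population value for the same data, which is why it is described as the more conservative estimate.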
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values in a column from the
computation. When set to True, each distinct value is considered only once.
Default Value: False
Types: bool
population:
Optional Argument.
Specifies whether to calculate the standard deviation on the entire population
or not. Set this argument to True only when the data points represent
the complete population. If your data represents only a sample of the
entire population for the column, then set this argument to False,
which computes the sample standard deviation. As the sample size increases,
the values for sample standard deviation and population standard deviation
approach the same number. You should always use the more conservative sample
standard deviation calculation, unless you are absolutely certain that your
data constitutes the entire population for the column.
Default Value: False
Types: bool
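The effect of the two arguments can be sketched with plain Python. This example uses the standard `statistics` module, not teradataml, and the data is hypothetical. It assumes that distinct=True corresponds to computing the statistic over the distinct values only, and it shows how the sample and population results converge as the number of data points grows.

```python
import statistics

# Hypothetical column values with duplicates.
values = [3.0, 3.0, 3.0, 4.0, 5.0]

# Analogue of distinct=False: every value counts, duplicates included.
sd_all = statistics.stdev(values)
# Analogue of distinct=True (assumed semantics): each distinct value counts once.
sd_distinct = statistics.stdev(sorted(set(values)))

# Ratio of sample to population standard deviation for growing data sizes.
# The ratio equals sqrt(n / (n - 1)) and approaches 1 as n increases.
ratios = {}
for n in (5, 50, 5000):
    data = list(range(n))
    ratios[n] = statistics.stdev(data) / statistics.pstdev(data)

print("all values:", sd_all)
print("distinct values:", sd_distinct)
for n, ratio in ratios.items():
    print(n, ratio)
```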
RETURNS:
* teradataml DataFrame - When aggregate is executed using window created
on teradataml DataFrame.
* ColumnExpression, also known as teradataml DataFrameColumn - When aggregate is
executed using window created on ColumnExpression.
RAISES:
RuntimeError - If column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a teradataml DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
 5       no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Example 1: Calculate the sample standard deviation for
# the column 'gpa' in a Rolling window, partitioned over
# 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns="programming",
... window_start_point=-2,
... window_end_point=0)
>>>
# Execute std() on the Rolling window and attach it to the teradataml DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate operations
# in one single call. In this example, we are executing std() along with
# count() window aggregate operations.
>>> df = admissions_train.assign(std_gpa=window.std(), count_gpa=window.count())
>>> df
    masters   gpa     stats  programming  admitted  count_gpa  std_gpa
id
15      yes  4.00  Advanced     Advanced         1          3     3.60
16       no  3.70  Advanced     Advanced         1          3     3.70
11       no  3.13  Advanced     Advanced         1          3     3.13
 9       no  3.82  Advanced     Advanced         1          3     3.13
19      yes  1.98  Advanced     Advanced         0          3     1.98
27      yes  3.96  Advanced     Advanced         0          3     1.98
 1      yes  3.95  Beginner     Beginner         0          1     3.95
34      yes  3.85  Advanced     Beginner         0          2     3.85
32      yes  3.46  Advanced     Beginner         0          3     3.46
40      yes  3.95    Novice     Beginner         0          3     3.46
>>>
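The Rolling window in Example 1 (window_start_point=-2, window_end_point=0) covers the current row and up to two rows before it within the partition. The following is a minimal sketch of that frame logic in plain Python, using the standard `statistics` module rather than teradataml. The function name `rolling_sample_std` and the input values are hypothetical.

```python
import statistics

def rolling_sample_std(values, preceding=2):
    """Sample std over the current row and up to `preceding` rows before it."""
    result = []
    for i in range(len(values)):
        # Frame: from `preceding` rows back (clamped at the partition start)
        # through the current row.
        frame = values[max(0, i - preceding): i + 1]
        # Fewer than two data points: no sample standard deviation.
        result.append(statistics.stdev(frame) if len(frame) >= 2 else None)
    return result

# Hypothetical values for a single partition, in window order.
gpa = [1.0, 2.0, 4.0, 7.0]
rolling = rolling_sample_std(gpa)

for value, std in zip(gpa, rolling):
    print(value, std)
```

The first row of a partition has only one data point in its frame, so the sketch yields None there; the frame reaches its full size of three rows from the third row onward.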
# Example 2: Calculate the population standard deviation for all
# the valid columns in teradataml DataFrame, in an Expanding
# window, partitioned over 'masters', and ordered by 'id' in
# descending order.
# Create an Expanding window on teradataml DataFrame.
>>> window = admissions_train.window(partition_columns="masters",
... order_columns="id",
... sort_ascending=False,
... window_start_point=None,
... window_end_point=0)
>>>
# Execute std() on the Expanding window.
>>> df = window.std(population=True)
>>> df
    masters   gpa     stats  programming  admitted  admitted_std   gpa_std    id_std
id
38      yes  2.65  Advanced     Beginner         1      0.471405  0.571548  0.816497
32      yes  3.46  Advanced     Beginner         0      0.400000  0.470421  3.072458
31      yes  3.50  Advanced     Beginner         1      0.471405  0.429599  3.496029
30      yes  3.79  Advanced       Novice         0      0.451754  0.408267  3.795809
27      yes  3.96  Advanced     Advanced         0      0.415740  0.399500  4.422166
26      yes  3.57  Advanced     Advanced         1      0.458258  0.379889  4.737088
37       no  3.52    Novice       Novice         1      0.000000  0.000000  0.000000
36       no  3.00  Advanced       Novice         0      0.500000  0.260000  0.500000
35       no  3.68    Novice     Beginner         1      0.471405  0.290287  0.816497
33       no  3.55    Novice       Novice         1      0.433013  0.259651  1.479020
>>>
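The 'gpa_std' values for the rows with masters equal to 'no' in Example 2 can be reproduced by hand. The sketch below uses Python's standard `statistics` module, not teradataml, and takes the 'id' and 'gpa' values from the four 'no' rows in the output above. It assumes these are the first four rows of that partition in descending 'id' order, which is consistent with the 0.000000 result on the row with id 37.

```python
import statistics

# (id, gpa) pairs of the masters == "no" rows shown above, in descending id order.
rows = [(37, 3.52), (36, 3.00), (35, 3.68), (33, 3.55)]

expanding = []
for i in range(len(rows)):
    # Expanding frame: from the start of the partition through the current row.
    frame = [gpa for _, gpa in rows[: i + 1]]
    # Population standard deviation (divides by n), as with population=True.
    expanding.append(round(statistics.pstdev(frame), 6))

print(expanding)  # [0.0, 0.26, 0.290287, 0.259651]
```

With population=True, a frame holding a single data point yields 0 rather than None, which matches the first 'no' row in the output.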
# Example 3: Calculate the sample standard deviation for all the valid
# columns in a teradataml DataFrame grouped by 'masters' and 'gpa',
# in a Contracting window, partitioned over 'masters' and ordered
# by 'masters' with nulls listed last.
# Perform group_by() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns="masters",
... order_columns="masters",
... nulls_first=False,
... window_start_point=-5,
... window_end_point=None)
# Execute std() on the Contracting window.
>>> window.std()
   masters   gpa   gpa_std
0      yes  3.79  0.447269
1      yes  3.50  0.583218
2      yes  3.96  0.559739
3      yes  4.00  0.540462
4      yes  3.90  0.655494
5      yes  2.33  0.631651
6       no  3.52  0.746961
7       no  3.83  0.697233
8       no  3.82  0.688829
9       no  3.55  0.652312
>>>
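The Contracting window in Example 3 (window_start_point=-5, window_end_point=None) can be sketched in plain Python as well. This example assumes that a window_end_point of None means the frame extends to the last row of the partition, so the frame runs from five rows before the current row through the end. It uses the standard `statistics` module rather than teradataml; the function name `contracting_sample_std` and the input values are hypothetical.

```python
import statistics

def contracting_sample_std(values, preceding=5):
    """Sample std from `preceding` rows before the current row to the partition end."""
    result = []
    for i in range(len(values)):
        # Frame start moves forward with the current row; frame end stays fixed
        # at the last row of the partition (assumed meaning of None).
        frame = values[max(0, i - preceding):]
        result.append(statistics.stdev(frame) if len(frame) >= 2 else None)
    return result

# Hypothetical values for a single partition, in window order.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
contracting = contracting_sample_std(values)

for value, std in zip(values, contracting):
    print(value, std)
```

For the first six rows the frame still covers the whole partition, so the result is constant; after that the frame shrinks by one row at a time.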