Teradata® Package for Python Function Reference
Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Lifecycle: previous
Product Category: Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.std = std(self, distinct=False, population=False, **kwargs)
- DESCRIPTION:
Function to get the sample or population standard deviation for values in a column.
The standard deviation is the square root of the variance, which is the second central moment of a distribution.
* For a sample, it is a measure of dispersion from the mean of that sample.
* For a population, it is a measure of dispersion from the mean of that population.
For a sample, the computation is more conservative than for the population
standard deviation, to minimize the effect of outliers on the computed value.
Note:
1. When there are fewer than two non-null data points in the sample used
for the computation, then std returns None.
2. Null values are not included in the result computation.
3. If the data represents only a sample of the entire population for the
column, Teradata recommends calculating the sample standard deviation;
otherwise, calculate the population standard deviation.
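The difference between the two calculations, and the null handling described in the notes, can be sketched in plain Python. This is an illustrative model only, not teradataml code: `std_model` is a hypothetical helper, the n - 1 versus n divisors are the standard textbook definitions, and returning None for a population with no data points is an assumption made here to avoid dividing by zero.

```python
import math


def std_model(values, population=False):
    """Illustrative model of sample vs. population standard deviation."""
    # Null (None) values are left out of the computation.
    data = [v for v in values if v is not None]
    n = len(data)
    # Sample case: fewer than two non-null points gives None (per the notes).
    # Population case: None for no data is an assumption of this sketch.
    if n < (1 if population else 2):
        return None
    mean = sum(data) / n
    squared_deviations = sum((v - mean) ** 2 for v in data)
    # The sample form divides by n - 1, the population form by n.
    divisor = n if population else n - 1
    return math.sqrt(squared_deviations / divisor)


values = [2.0, 4.0, 4.0, None, 4.0, 5.0, 5.0, 7.0, 9.0]
sample_std = std_model(values)
population_std = std_model(values, population=True)
```

Because the sample form divides by n - 1, `sample_std` is never smaller than `population_std` for the same data, which is why the sample calculation is described as the more conservative one.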
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values in the column from the
computation. When True, only distinct values are considered.
Default Value: False
Types: bool
population:
Optional Argument.
Specifies whether to calculate the standard deviation over the entire population.
Set this argument to True only when the data points represent the complete
population. If the data represents only a sample of the entire population for the
column, set this argument to False to compute the sample standard deviation.
The sample and population values converge as the sample size increases, but
the more conservative sample standard deviation remains the safer choice
unless you are certain that the data constitutes the entire population
for the column.
Default Value: False
Types: bool
kwargs:
Specifies optional keyword arguments.
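The effect of the "distinct" argument can be illustrated with a small plain-Python sketch. It is a model of the idea only, not teradataml code: `std_distinct_model` is a hypothetical helper, and the input values are made up for illustration.

```python
import statistics


def std_distinct_model(values, distinct=False):
    """Illustrative sample standard deviation with optional de-duplication."""
    # Null (None) values are left out of the computation.
    data = [v for v in values if v is not None]
    if distinct:
        # Keep each value once, mirroring a DISTINCT aggregate.
        data = sorted(set(data))
    if len(data) < 2:
        return None
    return statistics.stdev(data)


values = [4.0, 4.0, 3.0, 2.0]
with_duplicates = std_distinct_model(values)
distinct_only = std_distinct_model(values, distinct=True)
```

With duplicates kept, the repeated 4.0 pulls the mean upward and changes the spread; with distinct=True the computation sees only 2.0, 3.0 and 4.0.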
RETURNS:
ColumnExpression, also known as teradataml DataFrameColumn.
NOTES:
* Use DataFrame.assign() when applying aggregate functions to a
ColumnExpression, also known as teradataml DataFrameColumn.
* Always pass "drop_columns=True" to DataFrame.assign() when running an
aggregate operation on a teradataml DataFrame.
* The "drop_columns" argument of DataFrame.assign() is ignored when the
aggregate function is applied to the result of DataFrame.groupby().
RAISES:
RuntimeError - If column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Example 1: Get the sample standard deviation for values in 'gpa' column.
# Execute std() function on teradataml DataFrameColumn to generate the ColumnExpression.
>>> std_column = admissions_train.gpa.std()
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.assign(drop_columns=True, std_=std_column)
>>> df
std_
0 0.513764
>>>
# Example 2: Get the population standard deviation for values in 'gpa' column.
# Execute std() function on teradataml DataFrameColumn to generate the ColumnExpression.
# To calculate population standard deviation we must set population=True.
>>> std_column = admissions_train.gpa.std(population=True)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.assign(drop_columns=True, std_=std_column)
>>> df
std_
0 0.507301
>>>
# Example 3: Get the sample standard deviation for distinct values in 'gpa' column
# for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
# Execute std() function on teradataml DataFrameColumn to generate the ColumnExpression.
# We will consider DISTINCT values for the columns while calculating the standard deviation value.
>>> std_column = admissions_train.gpa.std(distinct=True)
# Pass the generated ColumnExpression to DataFrame.assign(), to run and produce the result.
>>> df = admissions_train.groupby("programming").assign(std_=std_column)
>>> df
  programming      std_
0    Beginner  0.372151
1    Advanced  0.502415
2      Novice  0.646736
>>>
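The grouped, distinct calculation in Example 3 can be modeled in plain Python as a rough cross-check of the logic. This sketch is not teradataml code: `grouped_distinct_std` is a hypothetical helper, and `rows` holds only the ten rows displayed above rather than the full 'admissions_train' table, so its results are not expected to match the output of Example 3.

```python
import statistics
from collections import defaultdict

# (programming, gpa) pairs copied from the ten rows displayed above.
rows = [
    ("Beginner", 3.46), ("Novice", 3.00), ("Advanced", 4.00),
    ("Beginner", 2.65), ("Novice", 3.44), ("Advanced", 3.83),
    ("Beginner", 3.85), ("Novice", 4.00), ("Advanced", 3.57),
    ("Advanced", 1.98),
]


def grouped_distinct_std(pairs):
    """Sample standard deviation of distinct values per group."""
    groups = defaultdict(set)
    for level, gpa in pairs:
        if gpa is not None:  # null values are skipped
            groups[level].add(gpa)  # a set keeps each value once
    return {
        level: statistics.stdev(sorted(vals)) if len(vals) >= 2 else None
        for level, vals in groups.items()
    }


result = grouped_distinct_std(rows)
```

Each key of `result` is a programming level and each value is the sample standard deviation of that level's distinct 'gpa' values, or None when a group has fewer than two distinct values.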