Methods defined here:
- __init__(self, data=None, target_columns=None, exclude_columns=None, statistics=None, partition_columns=None, data_sequence_column=None)
- DESCRIPTION:
The UnivariateStatistics function calculates descriptive statistics
for a set of target columns.
PARAMETERS:
data:
Required Argument.
Specifies the input teradataml DataFrame that contains the columns
for which to calculate descriptive statistics.
Types: teradataml DataFrame
target_columns:
Optional Argument.
Specifies the numeric columns of the input teradataml DataFrame
for which to calculate statistics.
Types: str OR list of Strings (str)
exclude_columns:
Optional Argument.
Specifies the teradataml DataFrame columns to ignore. The remaining
numeric columns in the teradataml DataFrame are used as target
variables.
Types: str OR list of Strings (str)
statistics:
Optional Argument.
Specifies the group of statistical measures to include in the
output.
Permitted Values: MOMENTS, BASIC, QUANTILES
Types: str
partition_columns:
Optional Argument.
Specifies the columns that define the groups for which statistics
are calculated.
Types: str OR list of Strings (str)
data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of the
input argument "data". The argument is used to ensure
deterministic results for functions that produce results that
vary from run to run.
Types: str OR list of Strings (str)
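The way "target_columns" and "exclude_columns" narrow down the columns can be sketched locally. The helper resolve_target_columns below is hypothetical and not part of teradataml; it models column types as a plain Python dict, and the column types used for finance_data3 are assumed for illustration. It also assumes that explicit "target_columns" are used as given and that, with neither argument set, every numeric column is used; the documentation above does not state either rule.

```python
# Hypothetical helper, not part of teradataml: a local sketch of how the
# target_columns / exclude_columns arguments narrow down the columns.


def _as_list(value):
    """Normalize a 'str OR list of str' argument into a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def resolve_target_columns(column_types, target_columns=None, exclude_columns=None):
    """Return the names of the columns statistics would be computed for.

    column_types maps each column name to the Python type of its values.
    """
    if target_columns is not None:
        # Assumption: explicit target columns are used as given.
        return _as_list(target_columns)
    excluded = set(_as_list(exclude_columns))
    # Every remaining numeric column becomes a target variable.
    return [
        name
        for name, col_type in column_types.items()
        if col_type in (int, float) and name not in excluded
    ]


# Assumed schema for illustration only; the real table types may differ.
finance_columns = {"id": int, "period": int, "expenditure": float,
                   "income": float, "investment": float}
print(resolve_target_columns(finance_columns, exclude_columns=["id", "period"]))
# ['expenditure', 'income', 'investment']
```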
RETURNS:
Instance of UnivariateStatistics.
Output teradataml DataFrames can be accessed using attribute
references, such as UnivariateStatisticsObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. moments_table
2. basic_table
3. quantiles_table
4. output
When the argument 'statistics' is None, all four output
teradataml DataFrames are generated. When the argument 'statistics'
is set to one of the permitted values, the UnivariateStatistics
instance has only the corresponding output teradataml DataFrame
along with the 'output' teradataml DataFrame.
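The rule for which output attributes exist can be written down as a small lookup. The function expected_output_attributes below is a hypothetical sketch of the behaviour described above, not teradataml code; it assumes the 'statistics' value is matched exactly against the permitted values.

```python
# Hypothetical sketch, not part of teradataml: which output attributes a
# UnivariateStatistics instance is described as having for a 'statistics' value.
STAT_GROUP_TO_ATTRIBUTE = {
    "MOMENTS": "moments_table",
    "BASIC": "basic_table",
    "QUANTILES": "quantiles_table",
}


def expected_output_attributes(statistics=None):
    """Return the output attribute names for the given 'statistics' argument."""
    if statistics is None:
        # All four output DataFrames are generated.
        return [*STAT_GROUP_TO_ATTRIBUTE.values(), "output"]
    if statistics not in STAT_GROUP_TO_ATTRIBUTE:
        raise ValueError(
            f"statistics must be one of {sorted(STAT_GROUP_TO_ATTRIBUTE)}, got {statistics!r}"
        )
    # The matching table plus the 'output' DataFrame.
    return [STAT_GROUP_TO_ATTRIBUTE[statistics], "output"]


print(expected_output_attributes())         # ['moments_table', 'basic_table', 'quantiles_table', 'output']
print(expected_output_attributes("BASIC"))  # ['basic_table', 'output']
```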
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("univariatestatistics", "finance_data3")
# The example table 'finance_data3' contains the columns
# 'expenditure', 'income' and 'investment', for which the
# examples below generate descriptive statistics.
# Create a teradataml DataFrame object.
finance_data3 = DataFrame.from_table("finance_data3")
# Example 1 : UnivariateStatistics for all the numeric columns except 'id' and 'period'.
US_out1 = UnivariateStatistics(data = finance_data3, exclude_columns = ["id","period"])
# Print the results
print(US_out1.moments_table) # Prints 'moments_table' teradataml DataFrame.
print(US_out1.basic_table) # Prints 'basic_table' teradataml DataFrame.
print(US_out1.quantiles_table) # Prints 'quantiles_table' teradataml DataFrame.
print(US_out1.output) # Prints 'output' teradataml DataFrame.
# Example 2 : UnivariateStatistics for columns 'expenditure', 'income' and
# 'investment' partitioned by the column 'id'.
US_out2 = UnivariateStatistics(data = finance_data3, partition_columns = ["id"],
                               target_columns = ["expenditure", "income", "investment"])
# Print the results
print(US_out2.moments_table) # Prints 'moments_table' teradataml DataFrame.
print(US_out2.basic_table) # Prints 'basic_table' teradataml DataFrame.
print(US_out2.quantiles_table) # Prints 'quantiles_table' teradataml DataFrame.
print(US_out2.output) # Prints 'output' teradataml DataFrame.
# Example 3 : UnivariateStatistics for generating only BASIC statistics for all the
# numeric columns except 'id' and 'period'.
US_out3 = UnivariateStatistics(data = finance_data3, exclude_columns = ["id", "period"],
                               statistics = "BASIC")
# US_out3 doesn't have the teradataml DataFrames 'moments_table' and
# 'quantiles_table', as the 'statistics' argument is set to 'BASIC' only.
# Print the results
print(US_out3.basic_table) # Prints 'basic_table' teradataml DataFrame.
print(US_out3.output) # Prints 'output' teradataml DataFrame.
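Example 2 partitions the input by 'id', so statistics are computed separately for each group of rows sharing an 'id' value. The pure-Python analogue below shows that grouping idea for a single measure (the mean); the rows are made up for illustration, and per_partition_mean is a hypothetical helper, not how Vantage computes the statistics.

```python
# Local, pure-Python analogue of partition_columns=["id"]: one result per
# distinct 'id' value. Rows and helper are hypothetical, for illustration only.
from collections import defaultdict
from statistics import mean


def per_partition_mean(rows, partition_column, target_column):
    """Group rows by partition_column and average target_column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_column]].append(row[target_column])
    # One result per partition value, in first-seen order.
    return {key: mean(values) for key, values in groups.items()}


rows = [
    {"id": 1, "income": 100.0},
    {"id": 1, "income": 300.0},
    {"id": 2, "income": 50.0},
]
print(per_partition_mean(rows, "id", "income"))  # {1: 200.0, 2: 50.0}
```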
- __repr__(self)
- Returns the string representation for a UnivariateStatistics class instance.