- ParametricTest(data, columns=None, dependent_column=None, equal_variance=False, fallback=False, first_column=None, first_column_values=None, group_columns=None, allow_duplicates=False, paired=False, second_column=None, second_column_values=None, stats_database=None, style='t', probability_threshold=0.05, with_indicator=False, gen_sql_only=False)
- DESCRIPTION:
Parametric tests make assumptions about the data, such as the observations being
normally distributed. This can be verified with a test of normality prior to executing
a parametric test. Both T-Tests and F-Tests are provided. T-Tests can be paired or
unpaired, and unpaired T-Tests can be run with or without an indicator variable.
F-Tests can be 1-way, 2-way or 3-way. 2-way tests can have equal or unequal cell
counts (count of rows having a combination of distinct column values), while the 3-way
test must have equal cell counts. A 1-way test has 1 independent input column, a 2-way
test has 2 independent columns and a 3-way test has 3 independent columns in addition
to a dependent "column of interest".
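The notion of a cell count used above can be illustrated with a small, self-contained sketch. It runs on plain Python tuples rather than a teradataml DataFrame; the rows and the helper names (cell_counts, has_equal_cell_counts) are hypothetical and are not part of teradataml.

```python
from collections import Counter

# Hypothetical rows: (gender, marital_status, income).
rows = [
    ("F", 1, 52000), ("F", 1, 48000),
    ("F", 2, 61000), ("F", 2, 58000),
    ("M", 1, 50000), ("M", 1, 47000),
    ("M", 2, 63000), ("M", 2, 60000),
]


def cell_counts(rows, key_indexes):
    """Count the rows for each combination of values at the given column positions."""
    return Counter(tuple(row[i] for i in key_indexes) for row in rows)


def has_equal_cell_counts(rows, key_indexes):
    """Return True when every observed combination has the same number of rows.

    Only combinations that actually occur are compared; a combination that is
    missing entirely is not detected by this simple check.
    """
    counts = cell_counts(rows, key_indexes)
    return len(set(counts.values())) <= 1


counts = cell_counts(rows, (0, 1))                     # cells keyed by (gender, marital_status)
balanced = has_equal_cell_counts(rows, (0, 1))         # all four cells hold two rows
unbalanced = has_equal_cell_counts(rows[:-1], (0, 1))  # dropping one row unbalances a cell
print(counts)
print(balanced, unbalanced)
```

A check of this kind may help when deciding between an equal cell count analysis and the 2-way test with unequal cell counts.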
PARAMETERS:
data:
Required Argument.
Specifies the input data to run statistical tests.
Types: teradataml DataFrame
columns:
Optional Argument.
Specifies the name(s) of the column(s) representing independent variables to be
analyzed in an F-Test N-Way with Equal Cell Counts analysis. There can be 1, 2 or
3 columns listed in this parameter. If 2 or 3 columns are listed, the cell counts
(the count of rows having a combination of distinct column values) must be the same.
Types: str OR List of Strings (str)
dependent_column:
Optional Argument.
Specifies the name of the column representing the dependent variable in an F-Test.
Types: str
equal_variance:
Optional Argument.
Specifies whether the variance of the two samples (columns) is assumed to be equal.
The default assumption is that the variances are not equal.
Note:
This argument is available for use with the T-Test.
Default Value: False
Types: bool
fallback:
Optional Argument.
Specifies whether FALLBACK is requested for the output result.
Default Value: False (Not requested)
Types: bool
first_column:
Optional Argument.
Specifies the name of the column representing the first variable to analyze for a
T-test. For an F-Test, specifies the name of the column representing the first
independent variable in the analysis.
Types: str
first_column_values:
Optional Argument. Required for a 2-way F-Test with Unequal Cell Counts.
Specifies a list of the "first_column" values to be included in the analysis.
Types: int, float, str OR List of Integers, Floats or Strings
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) for grouping so that a separate result
is produced for each value or combination of values in the specified column or
columns.
Note:
This option is not available for an F 2-way analysis.
Types: str OR list of Strings (str)
allow_duplicates:
Optional Argument.
Specifies whether duplicates are allowed in the output or not.
Default Value: False
Types: bool
paired:
Optional Argument.
Specifies whether the first and second column values are matched with each other.
When set to True, the mean difference is also analyzed.
Note:
This is an option for T-Test.
Default Value: False
Types: bool
second_column:
Optional Argument.
Specifies the name of the column representing the second variable to analyze.
If the "with_indicator" argument is set to True, the second column is used to
define two analysis categories, one where the second column is negative or zero,
and another where the second column is positive.
For an F-Test, specifies the name of the column representing the second independent
variable in the analysis.
Note:
Columns of DATE type cannot be used for the paired T-Test.
Types: str
second_column_values:
Optional Argument. Required for a 2-way F-Test with Unequal Cell Counts.
Specifies a list of the "second_column" values to be included in the analysis.
Types: int, float, str OR List of Integers, Floats or Strings
stats_database:
Optional Argument.
Specifies the database where the statistical test metadata tables are installed.
If not specified, the source database is searched for these metadata tables.
Types: str
style:
Optional Argument.
Specifies the test style.
Permitted Values:
* 't' - T-Test paired, unpaired or unpaired with indicator variable
(second column).
* 'fnway' - F-Test N-Way with Equal Cell Counts (1, 2, or 3 columns with same
number of cell counts). A cell count is the count of rows having a
combination of distinct column values.
* 'f2way' - F-Test 2-Way with Unequal Cell Counts (2 columns with possibly
different numbers of cell counts). A cell count is the count of rows
having a combination of distinct column values.
Default Value: 't'
Types: str
probability_threshold:
Optional Argument.
Specifies the threshold probability, i.e., "alpha" probability, below which the
null hypothesis is rejected.
Default Value: 0.05
Types: float
with_indicator:
Optional Argument.
Specifies whether the second column is used as an indicator variable that defines
two analysis categories: one where the second column is negative or zero, and
another where the second column is positive.
Note:
This argument can be used with an unpaired T-Test, i.e., when style is set to
't' and paired is set to False.
Default Value: False
Types: bool
gen_sql_only:
Optional Argument.
Specifies whether to generate only the SQL for the function.
When set to True, the function SQL is generated but not executed, and can be
accessed using the show_query() method. Otherwise, the SQL is executed but not returned.
Default Value: False
Types: bool
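The argument combinations described in the notes above can be summarized in a short sketch. The helper below (check_parametric_test_args, a hypothetical name) is not teradataml's own validation; it only encodes the constraints as read from this section, on the assumption that "paired", "equal_variance" and "with_indicator" are meaningful only when style is 't'.

```python
def check_parametric_test_args(style="t", paired=False, equal_variance=False,
                               with_indicator=False, first_column_values=None,
                               second_column_values=None):
    """Return a list of problems found in a hypothetical argument combination.

    Illustrative only: the rules are taken from the parameter notes, not from
    the library's source code.
    """
    if style not in ("t", "fnway", "f2way"):
        return [f"style must be 't', 'fnway' or 'f2way', got {style!r}"]

    problems = []
    if style == "f2way":
        # Both value lists are required for the F-Test with unequal cell counts.
        if first_column_values is None:
            problems.append("first_column_values is required for style='f2way'")
        if second_column_values is None:
            problems.append("second_column_values is required for style='f2way'")

    t_only = {"paired": paired, "equal_variance": equal_variance,
              "with_indicator": with_indicator}
    if style != "t":
        # Assumption: these three options are relevant to the T-Test only.
        problems.extend(f"{name} applies only to style='t'"
                        for name, value in t_only.items() if value)
    elif with_indicator and paired:
        # The indicator variable is described for the unpaired T-Test.
        problems.append("with_indicator requires paired=False")
    return problems


print(check_parametric_test_args(style="f2way"))
print(check_parametric_test_args(style="t", paired=True, with_indicator=True))
```

An empty list means the combination is consistent with the notes in this section; the library itself may apply further checks.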
RETURNS:
An instance of ParametricTest.
Output teradataml DataFrames can be accessed using attribute references, such as
ParametricTestObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# 3. The Statistical Test metadata tables must be loaded into the database where
# Analytics Library is installed.
# Import the valib object to execute this function, and DataFrame to load the input tables.
from teradataml import valib, DataFrame
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrames.
custanly = DataFrame("customer_analysis")
print(custanly)
cust = DataFrame("customer")
print(cust)
# Example 1: Perform a paired T-Test, assuming equal variance, grouped by age and gender.
obj = valib.ParametricTest(data=custanly,
                           first_column="avg_cc_bal",
                           second_column="avg_sv_bal",
                           paired=True,
                           equal_variance=True,
                           group_columns=["age", "gender"])
# Print the results.
print(obj.result)
# Example 2: Perform One way F-Test.
obj = valib.ParametricTest(data=cust,
                           style="fnway",
                           dependent_column="income",
                           columns="gender",
                           probability_threshold=0.01,
                           group_columns=["years_with_bank", "nbr_children"])
# Print the results.
print(obj.result)
# Example 3: Perform a 2-way F-Test with Unequal Cell Counts.
obj = valib.ParametricTest(data=cust,
                           style="f2way",
                           dependent_column="income",
                           first_column="years_with_bank",
                           first_column_values=[0, 1, 2, 3, 4, 5, 6, 7],
                           second_column="marital_status",
                           second_column_values=[1, 2, 3, 4],
                           probability_threshold=0.05)
# Print the results.
print(obj.result)
# Example 4: Generate only the SQL for the function, without executing it.
obj = valib.ParametricTest(data=cust,
                           first_column="years_with_bank",
                           second_column="marital_status",
                           paired=True,
                           equal_variance=True,
                           group_columns=["years_with_bank", "nbr_children"],
                           gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))
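The effect of the "paired" and "with_indicator" arguments on the samples being compared can be illustrated without a Vantage connection. The sketch below uses only the Python standard library and hypothetical data; the helper names are not part of teradataml, and the statistic is the textbook paired t statistic, not necessarily the exact computation performed by the Vantage Analytic Library.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Textbook t statistic for the mean of the row-wise differences.

    Raises ZeroDivisionError when all differences are identical.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err


def split_by_indicator(values, indicator):
    """Split values into two samples: indicator <= 0 and indicator > 0."""
    non_positive = [v for v, flag in zip(values, indicator) if flag <= 0]
    positive = [v for v, flag in zip(values, indicator) if flag > 0]
    return non_positive, positive


# Hypothetical matched observations for a paired test.
t_stat = paired_t_statistic([10, 12, 9, 11], [8, 11, 7, 10])
print(t_stat)

# Hypothetical indicator column: negative or zero in one category, positive in the other.
group_a, group_b = split_by_indicator([100, 200, 300, 400], [-1, 0, 5, 2])
print(group_a, group_b)
```

In the paired case each row contributes one difference, whereas with an indicator the first column is divided into two independent samples that are then compared.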