- ChiSquareTest(data, dependent_column=None, columns=None, fallback=False, first_columns=None, group_columns=None, allow_duplicates=False, second_columns=None, stats_database=None, style='chisq', probability_threshold=0.05, gen_sql_only=False)
- DESCRIPTION:
Statistical tests of this type operate on a matrix of frequencies (counts) and
look for a non-random pattern of frequencies in that matrix. The following tests
of this type are supported:
* Chi Square Test - Besides a Chi Square value, other measures are computed
in a Chi Square Test, including a Phi Coefficient, Cramer's V,
Likelihood Ratio Chi Square, Continuity-Adjusted Chi Square,
and Contingency Coefficient.
* Median Test - A Median Test is a variation of Chi Square Test wherein samples
are tested to see if their populations have the same median value.
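The measures listed for the Chi Square Test can be illustrated with their textbook definitions. The following is a minimal pure-Python sketch, not the Vantage Analytic Library implementation: the helper name chi_square_measures and the sample counts are hypothetical, and the exact formulas used by the library are not stated here.

```python
import math

def chi_square_measures(table):
    """Compute textbook association measures for a contingency table.

    `table` is a list of rows of non-negative counts. This sketch assumes
    every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    rows, cols = len(row_totals), len(col_totals)

    chi2 = 0.0
    likelihood_ratio = 0.0
    continuity_adjusted = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = observed - expected
            chi2 += diff ** 2 / expected
            # Yates continuity correction, conventionally applied to 2x2 tables.
            continuity_adjusted += max(abs(diff) - 0.5, 0.0) ** 2 / expected
            if observed > 0:
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)

    return {
        "chi_square": chi2,
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * min(rows - 1, cols - 1))),
        "likelihood_ratio_chi_square": likelihood_ratio,
        "continuity_adjusted_chi_square": continuity_adjusted,
        "contingency_coefficient": math.sqrt(chi2 / (chi2 + n)),
    }

# Hypothetical 2x2 table of counts, for illustration only.
measures = chi_square_measures([[10, 20], [20, 10]])
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For a 2x2 table, Phi and Cramer's V coincide because min(rows - 1, cols - 1) is 1.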
PARAMETERS:
data:
Required Argument.
Specifies the input data to run statistical tests.
Types: teradataml DataFrame
dependent_column:
Optional Argument.
Specifies the name of the numeric column representing dependent variable.
Note:
Used only by the Median Test.
Types: str
columns:
Optional Argument.
Specifies the name(s) of the categorical column(s) representing independent variables.
Note:
Used only by the Median Test.
Types: str OR list of Strings (str)
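To show how a numeric dependent column and a categorical column fit together in a Median Test, here is a minimal pure-Python sketch of the classical procedure: split the values at the grand median, count them per category, and apply a Chi Square statistic to the resulting table. The helper names and sample data are hypothetical, and counting values equal to the median as "not above" is an assumption; the library's handling of ties is not described here.

```python
from statistics import median

def median_test_table(values, groups):
    """Build a 2 x k table: counts above / not above the grand median."""
    grand_median = median(values)
    labels = sorted(set(groups))
    above = dict.fromkeys(labels, 0)
    not_above = dict.fromkeys(labels, 0)
    for value, label in zip(values, groups):
        # Assumption: values equal to the grand median count as "not above".
        if value > grand_median:
            above[label] += 1
        else:
            not_above[label] += 1
    table = [[above[l] for l in labels], [not_above[l] for l in labels]]
    return labels, table

def pearson_chi_square(table):
    """Pearson Chi Square statistic; assumes all row/column totals are positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Hypothetical data: a numeric dependent variable and a categorical variable.
income = [10, 12, 14, 30, 32, 34, 11, 33]
marital_status = ["single", "single", "single", "married",
                  "married", "married", "single", "married"]

labels, table = median_test_table(income, marital_status)
statistic = pearson_chi_square(table)
print(labels, table, statistic)
```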
fallback:
Optional Argument.
Specifies whether FALLBACK is requested for the output result.
Default Value: False (Not requested)
Types: bool
first_columns:
Optional Argument.
Specifies the name(s) of the column(s) representing the first of variable pairs
for analysis.
Notes:
1. Used only by the Chi Square Test.
2. The number of combinations of "first_columns" and "second_columns" may
not exceed 100.
3. If the product of the number of distinct values in these column pairs
exceeds 2000, the analysis of that combination is skipped.
Types: str OR list of Strings (str)
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) for grouping so that a separate result
is produced for each value or combination of values in the specified column or
columns.
Note:
Used only by the Median Test.
Types: str OR list of Strings (str)
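The grouping behaviour described for "group_columns" (one result per value or combination of values) can be pictured as partitioning the rows by a key before running the test on each partition. The sketch below is pure Python with a hypothetical helper name and made-up rows; it only illustrates the partitioning idea and does not reflect how the library performs grouping in the database.

```python
from collections import defaultdict

def split_by_groups(rows, group_columns):
    """Partition rows (dicts) by the values of one or more grouping columns."""
    if isinstance(group_columns, str):
        # Mirror the "str OR list of str" argument type.
        group_columns = [group_columns]
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in group_columns)
        partitions[key].append(row)
    return dict(partitions)

# Hypothetical rows, for illustration only.
rows = [
    {"years_with_bank": 1, "marital_status": "single", "income": 20},
    {"years_with_bank": 1, "marital_status": "married", "income": 40},
    {"years_with_bank": 2, "marital_status": "single", "income": 30},
    {"years_with_bank": 2, "marital_status": "married", "income": 50},
    {"years_with_bank": 2, "marital_status": "single", "income": 10},
]

partitions = split_by_groups(rows, "years_with_bank")
for key, members in sorted(partitions.items()):
    # A separate test result would be produced for each partition.
    print(key, len(members))
```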
allow_duplicates:
Optional Argument.
Specifies whether duplicates are allowed in the output or not.
Default Value: False
Types: bool
second_columns:
Optional Argument.
Specifies the name(s) of the column(s) representing the second of variable pairs
for analysis.
Notes:
1. Used only by the Chi Square Test.
2. The number of combinations of "first_columns" and "second_columns" may not
exceed 100.
3. If the product of the number of distinct values in these column pairs exceeds
2000, the analysis of that combination is skipped.
Types: str OR list of Strings (str)
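The two limits in the notes for "first_columns" and "second_columns" can be checked ahead of time. The sketch below is pure Python and rests on two assumptions: that "combinations" means every pairing of a first column with a second column, and that the caller already knows the number of distinct values per column. The helper name, the column names "zip_code" and "age", all distinct-value counts, and the choice to raise ValueError for more than 100 pairs are hypothetical.

```python
from itertools import product

MAX_PAIRS = 100    # limit on column combinations, from the notes
MAX_CELLS = 2000   # limit on the product of distinct-value counts

def plan_pairs(first_columns, second_columns, distinct_counts):
    """Return (analyzed, skipped) column pairs under the documented limits."""
    pairs = list(product(first_columns, second_columns))
    if len(pairs) > MAX_PAIRS:
        # Assumption: this sketch treats exceeding the pair limit as an error.
        raise ValueError(f"{len(pairs)} column pairs exceed the limit of {MAX_PAIRS}")
    analyzed, skipped = [], []
    for first, second in pairs:
        cells = distinct_counts[first] * distinct_counts[second]
        if cells > MAX_CELLS:
            skipped.append((first, second))
        else:
            analyzed.append((first, second))
    return analyzed, skipped

# Hypothetical distinct-value counts, for illustration only.
distinct_counts = {
    "female": 2, "single": 2, "zip_code": 50,
    "svacct": 2, "ccacct": 2, "ckacct": 2, "age": 45,
}

analyzed, skipped = plan_pairs(
    ["female", "single", "zip_code"],
    ["svacct", "ccacct", "ckacct", "age"],
    distinct_counts,
)
print(len(analyzed), "pairs analyzed;", "skipped:", skipped)
```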
stats_database:
Optional Argument.
Specifies the database where the statistical test metadata tables are installed.
If not specified, the source database is searched for these metadata tables.
Types: str
style:
Optional Argument.
Specifies the test style.
Permitted Values:
* 'chisq' - Chi Square test.
* 'median' - Median test.
Default Value: 'chisq'
Types: str
probability_threshold:
Optional Argument.
Specifies the threshold probability, i.e., "alpha" probability, below which the
null hypothesis is rejected.
Default Value: 0.05
Types: float
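The role of "probability_threshold" can be shown with a small decision rule: the null hypothesis is rejected when the p-value falls below the threshold. The sketch below is pure Python with hypothetical helper names. It computes the p-value only for one degree of freedom (a 2x2 table), using the identity that a Chi Square variable with one degree of freedom is a squared standard normal; the statistic value 5.4 is made up for illustration.

```python
import math

def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a Chi Square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def reject_null(p_value, probability_threshold=0.05):
    """Reject the null hypothesis when the p-value is below the threshold."""
    return p_value < probability_threshold

p_value = chi_square_p_value_1df(5.4)  # hypothetical statistic
print(f"p-value: {p_value:.4f}")
print("reject at 0.05:", reject_null(p_value, 0.05))
print("reject at 0.01:", reject_null(p_value, 0.01))
```

The same statistic can lead to different conclusions under different thresholds, which is why the threshold is exposed as an argument.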
gen_sql_only:
Optional Argument.
Specifies whether to generate only SQL for the function.
When set to True, the function SQL is generated but not executed; it can be
accessed using the show_query() method. Otherwise, the SQL is executed but not returned.
Default Value: False
Types: bool
RETURNS:
An instance of ChiSquareTest.
Output teradataml DataFrames can be accessed using attribute references, such as
ChiSquareTestObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# 3. The Statistical Test metadata tables must be loaded into the database where
# Analytics Library is installed.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrame.
from teradataml import DataFrame
custanly = DataFrame("customer_analysis")
print(custanly)
# Example 1: Shows a Chi Square test execution.
obj = valib.ChiSquareTest(data=custanly,
first_columns=["female", "single"],
second_columns=["svacct", "ccacct", "ckacct"],
style="chisq")
# Print the results.
print(obj.result)
# Example 2: Shows a Median test execution with group-by option.
obj = valib.ChiSquareTest(data=custanly,
dependent_column="income",
columns="marital_status",
group_columns="years_with_bank",
style="median",
probability_threshold=0.01)
# Print the results.
print(obj.result)
# Example 3: Generate only SQL for the function, but do not execute the same.
obj = valib.ChiSquareTest(data=custanly,
first_columns=["female", "single"],
second_columns=["svacct", "ccacct", "ckacct"],
style="chisq",
gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))