- RankTest(data, block_column=None, columns=None, dependent_column=None, fallback=False, first_column=None, group_columns=None, include_zero=False, independent=False, allow_duplicates=False, second_column=None, single_tail=False, stats_database=None, style='mw', probability_threshold=0.05, treatment_column=None, gen_sql_only=False)
- DESCRIPTION:
Statistical tests of this type calculate statistics from the ranks of variable
values rather than from the values themselves. In general, ranked and ordinal data
may be analyzed by these tests. Within certain constraints, either numeric or
non-numeric data may be analyzed. Supported rank tests include the following:
* Mann-Whitney/Kruskal-Wallis Test
* Wilcoxon Signed Ranks Test
* Friedman Test with Kendall's Coefficient of Concordance & Spearman's Rho
The choice between the Mann-Whitney and Kruskal-Wallis tests is made automatically,
based on the number of distinct values of the independent variable. A variation of
the Mann-Whitney test considers each requested variable individually, rather than
in combination, and performs a series of independent tests.
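To make "based on the rank" concrete, the following is a minimal pure-Python sketch of how a Mann-Whitney U statistic can be derived from pooled ranks. It is an illustration only, not the Vantage Analytic Library implementation; the helper names average_ranks and mann_whitney_u and the sample values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the pooled ranks of both samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical values for two groups of the independent variable.
group_one = [12, 15, 20, 31]
group_two = [14, 22, 25, 40, 41]
print(mann_whitney_u(group_one, group_two))
```

Because only the ordering of values matters, the same ranking helper also works on strings, which is consistent with the statement that non-numeric data can be ranked alphanumerically.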
PARAMETERS:
data:
Required Argument.
Specifies the input data to run statistical tests.
Types: teradataml DataFrame
block_column:
Optional Argument.
Specifies the name of the column representing blocks.
Notes:
1. Used only by the Friedman test.
2. When pairing treatment and block column values, a division by zero error
can occur if unequal cell counts are found.
Types: str
dependent_column:
Optional Argument.
Specifies the name of the column representing the dependent variable. If
non-numeric, it will be ranked alphanumerically.
Note:
Used only by the Mann-Whitney and Friedman tests.
Types: str
columns:
Optional Argument.
Specifies the name(s) of the categorical column(s) representing independent variables.
Note:
Used only by the Mann-Whitney test.
Types: str OR list of Strings (str)
fallback:
Optional Argument.
Specifies whether FALLBACK is requested for the output result.
Default Value: False (Not requested)
Types: bool
first_column:
Optional Argument.
Specifies the name of the column that represents the first sample variable.
Note:
Used only by the Wilcoxon test.
Types: str
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) for grouping so that a separate result
is produced for each value or combination of values in the specified column or
columns.
Types: str OR list of Strings (str)
include_zero:
Optional Argument.
Specifies whether to keep cases with zero differences. Ordinarily, the
Wilcoxon test discards cases with zero differences. When set to True,
these cases are included with the positive count.
Note:
Used only by the Wilcoxon test.
Default Value: False
Types: bool
independent:
Optional Argument.
Specifies whether to perform the variation of the Mann-Whitney test.
When set to True, each requested variable is considered individually,
rather than in combination, and a series of independent tests is
performed.
Note:
Used only by the Mann-Whitney test.
Default Value: False
Types: bool
allow_duplicates:
Optional Argument.
Specifies whether duplicates are allowed in the output or not.
Default Value: False
Types: bool
second_column:
Optional Argument.
Specifies the name of the column that represents the second sample variable.
Note:
Used only by the Wilcoxon test.
Types: str
single_tail:
Optional Argument.
Specifies whether to request a single-tailed test. When set to True, a
single-tailed test is requested. Otherwise, a two-tailed test is requested.
Notes:
1. Used only by the Mann-Whitney and Wilcoxon tests.
2. If the Mann-Whitney test becomes a Kruskal-Wallis test, the single_tail
option is invalid.
Default Value: False
Types: bool
stats_database:
Optional Argument.
Specifies the database where the statistical test metadata tables are installed.
If not specified, the source database is searched for these metadata tables.
Types: str
style:
Optional Argument.
Specifies the test style.
Permitted Values:
* 'mw' - Mann-Whitney test.
* 'friedman' - Friedman test.
* 'wilcoxon' - Wilcoxon test.
Default Value: 'mw'
Types: str
probability_threshold:
Optional Argument.
Specifies the threshold probability, i.e., "alpha" probability, below which
the null hypothesis is rejected.
Default Value: 0.05
Types: float
treatment_column:
Optional Argument.
Specifies the name of the column representing the independent categorical
variable.
Notes:
1. Used only by the Friedman test.
2. When pairing treatment and block column values, a division by zero error
can occur if unequal cell counts are found.
Types: str
gen_sql_only:
Optional Argument.
Specifies whether to generate only the SQL for the function.
When set to True, the function SQL is generated but not executed, and can be
accessed using the show_query() method. Otherwise, the SQL is executed but not
returned.
Default Value: False
Types: bool
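The "Used only by" notes above can be summarized as a lookup from style to the arguments that style reads. The sketch below is a hypothetical pre-flight helper, not part of teradataml; the names STYLE_SPECIFIC_ARGS and ignored_arguments are assumptions introduced for illustration. This section does not state whether RankTest itself silently ignores or rejects arguments that do not apply to the chosen style.

```python
# Style-specific arguments, transcribed from the "Used only by" notes above.
STYLE_SPECIFIC_ARGS = {
    "mw": {"dependent_column", "columns", "independent", "single_tail"},
    "wilcoxon": {"first_column", "second_column", "include_zero", "single_tail"},
    "friedman": {"dependent_column", "block_column", "treatment_column"},
}


def ignored_arguments(style, **kwargs):
    """Return sorted names of style-specific arguments that `style` does not use."""
    if style not in STYLE_SPECIFIC_ARGS:
        raise ValueError(f"Unknown style: {style!r}")
    all_specific = set().union(*STYLE_SPECIFIC_ARGS.values())
    used = STYLE_SPECIFIC_ARGS[style]
    return sorted(name for name in kwargs if name in all_specific and name not in used)


# "columns" applies only to the Mann-Whitney style, so it is reported here.
print(ignored_arguments("wilcoxon", first_column="avg_ck_bal", columns="gender"))
```

Arguments that are not tied to one style, such as group_columns or probability_threshold, are never reported by this helper.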
RETURNS:
An instance of RankTest.
Output teradataml DataFrames can be accessed using attribute references, such as
RankTestObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# 3. The Statistical Test metadata tables must be loaded into the database where
# Analytics Library is installed.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrames.
from teradataml import DataFrame
custanly = DataFrame("customer_analysis")
print(custanly)
cust = DataFrame("customer")
print(cust)
# Example 1: Shows the parameters for a Mann-Whitney test with a threshold
# probability of 0.01.
obj = valib.RankTest(data=cust,
                     dependent_column="income",
                     columns="gender",
                     group_columns="years_with_bank",
                     probability_threshold=0.01,
                     style="mw")
# Print the results.
print(obj.result)
# Example 2: Shows the parameters for a set of Mann-Whitney independent tests.
# The threshold probability assumes the default value of 0.05.
obj = valib.RankTest(data=custanly,
                     dependent_column="income",
                     columns=["gender", "ccacct", "svacct"],
                     independent=True,
                     style="mw")
# Print the results.
print(obj.result)
# Example 3: Shows the parameters for a Wilcoxon Test.
obj = valib.RankTest(data=custanly,
                     first_column="avg_ck_bal",
                     second_column="avg_sv_bal",
                     group_columns="years_with_bank",
                     style="wilcoxon")
# Print the results.
print(obj.result)
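For readers unfamiliar with the statistic behind the Wilcoxon style, the following is a minimal pure-Python sketch of signed-rank sums on paired samples. It is not the Vantage Analytic Library implementation. The function name signed_rank_sums and the sample values are hypothetical, and the handling of include_zero here (zero differences are ranked and counted on the positive side) is one assumed reading of "includes these cases with the positive count".

```python
def signed_rank_sums(first, second, include_zero=False):
    """Return (W_plus, W_minus, n) for paired samples.

    Zero differences are discarded unless include_zero is True, in which
    case they are ranked and added to the positive side (an assumption).
    """
    diffs = [a - b for a, b in zip(first, second)]
    if not include_zero:
        diffs = [d for d in diffs if d != 0]
    abs_diffs = [abs(d) for d in diffs]

    # Rank absolute differences; ties share the mean of their positions.
    order = sorted(range(len(diffs)), key=abs_diffs.__getitem__)
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and abs_diffs[order[end + 1]] == abs_diffs[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1

    w_plus = sum(r for d, r in zip(diffs, ranks) if d >= 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical paired balances; two of the five pairs have a zero difference.
first_sample = [10, 8, 7, 5, 6]
second_sample = [7, 8, 9, 4, 6]
print(signed_rank_sums(first_sample, second_sample))
print(signed_rank_sums(first_sample, second_sample, include_zero=True))
```

Comparing the two calls shows how keeping zero differences changes both the number of ranked cases and the positive rank sum.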
# Example 4: Shows the parameters for a Friedman Test using a specially prepared
# input table called Val_Friedman_WorkTable.
# Prepare data for test style "friedman" as per example in VAL user guide.
# The "friedman" style needs the same number of rows for each combination of
# "treatment_column" and "block_column" values.
# Let's get the smallest count of value combinations in the gender and marital_status
# columns from custanly DataFrame.
min_val = custanly.groupby(["marital_status", "gender"])\
    .agg({'cust_id': "count"}).select("count_cust_id").min().squeeze().item()
df_cr = custanly.select(["cust_id", "gender", "marital_status", "income", "ckacct",
                         "svacct"])
case_when_then_dict = {(df_cr.gender=="F") & (df_cr.marital_status=="1") : min_val,
                       (df_cr.gender=="F") & (df_cr.marital_status=="2") : min_val,
                       (df_cr.gender=="F") & (df_cr.marital_status=="3") : min_val,
                       (df_cr.gender=="F") & (df_cr.marital_status=="4") : min_val,
                       (df_cr.gender=="M") & (df_cr.marital_status=="1") : min_val,
                       (df_cr.gender=="M") & (df_cr.marital_status=="2") : min_val,
                       (df_cr.gender=="M") & (df_cr.marital_status=="3") : min_val,
                       (df_cr.gender=="M") & (df_cr.marital_status=="4") : min_val}
df_fried = df_cr.sample(case_when_then=case_when_then_dict)
# Execute the RankTest() function.
obj = valib.RankTest(data=df_fried,
                     style="friedman",
                     dependent_column="income",
                     block_column="marital_status",
                     treatment_column="gender")
# Print the results.
print(obj.result)
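The equal-cell-count requirement behind Example 4 can be checked locally before calling the Friedman style. The sketch below is a hypothetical pure-Python check on rows already fetched to the client; cell_counts and is_balanced are illustrative names, not teradataml functions, and the sample rows are made up.

```python
from collections import Counter


def cell_counts(rows, block_key, treatment_key):
    """Count rows for each (block, treatment) combination."""
    return Counter((row[block_key], row[treatment_key]) for row in rows)


def is_balanced(rows, block_key, treatment_key):
    """Return True when no combination is missing and all counts are equal."""
    counts = cell_counts(rows, block_key, treatment_key)
    blocks = {block for block, _ in counts}
    treatments = {treatment for _, treatment in counts}
    complete = len(counts) == len(blocks) * len(treatments)
    return complete and len(set(counts.values())) == 1


# Hypothetical rows: two blocks (marital_status) by two treatments (gender).
rows = [
    {"marital_status": "1", "gender": "F", "income": 100},
    {"marital_status": "1", "gender": "M", "income": 120},
    {"marital_status": "2", "gender": "F", "income": 90},
    {"marital_status": "2", "gender": "M", "income": 110},
]
print(is_balanced(rows, "marital_status", "gender"))
# Dropping one row leaves a missing cell, so the data is no longer balanced.
print(is_balanced(rows[:-1], "marital_status", "gender"))
```

A check like this is one way to catch the unequal cell counts that the block_column and treatment_column notes warn can lead to a division by zero error.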
# Example 5: Generate only the SQL for the function without executing it.
obj = valib.RankTest(data=custanly,
                     probability_threshold=0.01,
                     first_column="avg_ck_bal",
                     second_column="avg_sv_bal",
                     group_columns="years_with_bank",
                     style="wilcoxon",
                     gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))