Teradata Package for Python Function Reference | 17.10 - RankTest - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Product Category: Teradata Vantage

RankTest

Functions
RankTest(data, block_column=None, columns=None, dependent_column=None,
         fallback=False, first_column=None, group_columns=None,
         include_zero=False, independent=False, allow_duplicates=False,
         second_column=None, single_tail=False, stats_database=None,
         style='mw', probability_threshold=0.05, treatment_column=None,
         gen_sql_only=False)

    DESCRIPTION:
        Rank tests calculate statistics based on the ranks of variable values
        rather than on the values themselves. In general, ranked and ordinal
        data can be analyzed by these tests. Within some constraints, either
        numeric or non-numeric data may be analyzed.
        Supported rank tests include the following:
            * Mann-Whitney/Kruskal-Wallis Test
            * Wilcoxon Signed Ranks Test
            * Friedman Test with Kendall's Coefficient of Concordance and
              Spearman's Rho
        The choice between the Mann-Whitney and Kruskal-Wallis tests is made
        automatically, based on the number of distinct values of the
        independent variable. A variation of the Mann-Whitney test considers
        each requested variable individually, rather than combined, performing
        a series of independent tests.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the input data on which to run the statistical tests.
            Types: teradataml DataFrame

        block_column:
            Optional Argument.
            Specifies the name of the column representing blocks.
            Notes:
                1. Used only by the Friedman test.
                2. When pairing treatment and block column values, a division
                   by zero error can occur if unequal cell counts are found.
            Types: str

        dependent_column:
            Optional Argument.
            Specifies the name of the column representing the dependent
            variable. If non-numeric, it is ranked alphanumerically.
            Note:
                Used only by the Mann-Whitney and Friedman tests.
            Types: str

        columns:
            Optional Argument.
            Specifies the name(s) of the categorical column(s) representing
            independent variables.
            Note:
                Used only by the Mann-Whitney test.
            Types: str OR list of Strings (str)

        fallback:
            Optional Argument.
            Specifies whether FALLBACK is requested for the output result.
            Default Value: False (Not requested)
            Types: bool

        first_column:
            Optional Argument.
            Specifies the name of the column that represents the first sample
            variable.
            Note:
                Used only by the Wilcoxon test.
            Types: str

        group_columns:
            Optional Argument.
            Specifies the name(s) of the column(s) for grouping, so that a
            separate result is produced for each value or combination of
            values in the specified column or columns.
            Types: str OR list of Strings (str)

        include_zero:
            Optional Argument.
            Specifies whether to keep cases with zero differences.
            Ordinarily, the Wilcoxon test discards cases with zero
            differences. When set to True, these cases are included with the
            positive count.
            Note:
                Used only by the Wilcoxon test.
            Default Value: False
            Types: bool

        independent:
            Optional Argument.
            Specifies whether to perform the variation of the Mann-Whitney
            test. When set to True, the test considers each requested variable
            individually, rather than in combination, performing a series of
            independent tests.
            Note:
                Used only by the Mann-Whitney test.
            Default Value: False
            Types: bool

        allow_duplicates:
            Optional Argument.
            Specifies whether duplicates are allowed in the output.
            Default Value: False
            Types: bool

        second_column:
            Optional Argument.
            Specifies the name of the column that represents the second sample
            variable.
            Note:
                Used only by the Wilcoxon test.
            Types: str

        single_tail:
            Optional Argument.
            Specifies whether to request a single-tailed test. When set to
            True, a single-tailed test is requested. Otherwise, a two-tailed
            test is requested.
            Notes:
                1. Used only by the Mann-Whitney and Wilcoxon tests.
                2. If the Mann-Whitney test becomes a Kruskal-Wallis test,
                   the single_tail option is invalid.
            Default Value: False
            Types: bool

        stats_database:
            Optional Argument.
            Specifies the database where the statistical test metadata tables
            are installed. If not specified, the source database is searched
            for these metadata tables.
            Types: str

        style:
            Optional Argument.
            Specifies the test style.
            Permitted Values:
                * 'mw' - Mann-Whitney test.
                * 'friedman' - Friedman test.
                * 'wilcoxon' - Wilcoxon test.
            Default Value: 'mw'
            Types: str

        probability_threshold:
            Optional Argument.
            Specifies the threshold probability, i.e., the "alpha"
            probability, below which the null hypothesis is rejected.
            Default Value: 0.05
            Types: float

        treatment_column:
            Optional Argument.
            Specifies the name of the column representing the independent
            categorical variable.
            Notes:
                1. Used only by the Friedman test.
                2. When pairing treatment and block column values, a division
                   by zero error can occur if unequal cell counts are found.
            Types: str

        gen_sql_only:
            Optional Argument.
            Specifies whether to generate only the SQL for the function.
            When set to True, the function SQL is generated but not executed,
            and can be accessed using the show_query() method. Otherwise, the
            SQL is executed.
            Default Value: False
            Types: bool

    RETURNS:
        An instance of RankTest.
        Output teradataml DataFrames can be accessed using attribute
        references, such as RankTestObj.<attribute_name>.
        Output teradataml DataFrame attribute name is: result.

    RAISES:
        TeradataMlException, TypeError, ValueError

    EXAMPLES:
        # Notes:
        #   1. To execute Vantage Analytic Library functions,
        #       a. import "valib" object from teradataml.
        #       b. set 'configure.val_install_location' to the database name
        #          where Vantage Analytic Library functions are installed.
        #   2. Datasets used in these examples can be loaded using the Vantage
        #      Analytic Library installer.
        #   3. The Statistical Test metadata tables must be loaded into the
        #      database where the Analytic Library is installed.

        # Import valib object from teradataml to execute this function.
        from teradataml import valib

        # Set the 'configure.val_install_location' variable.
        from teradataml import configure
        configure.val_install_location = "SYSLIB"

        # Create required teradataml DataFrames.
        from teradataml import DataFrame
        custanly = DataFrame("customer_analysis")
        print(custanly)
        cust = DataFrame("customer")
        print(cust)

        # Example 1: Shows the parameters for a Mann-Whitney test with a
        #            threshold probability of 0.01.
        obj = valib.RankTest(data=cust,
                             dependent_column="income",
                             columns="gender",
                             group_columns="years_with_bank",
                             probability_threshold=0.01,
                             style="mw")

        # Print the results.
        print(obj.result)

        # Example 2: Shows the parameters for a set of Mann-Whitney independent
        #            tests. The threshold probability assumes the default
        #            value of 0.05.
        obj = valib.RankTest(data=custanly,
                             dependent_column="income",
                             columns=["gender", "ccacct", "svacct"],
                             independent=True,
                             style="mw")

        # Print the results.
        print(obj.result)

        # Example 3: Shows the parameters for a Wilcoxon test.
        obj = valib.RankTest(data=custanly,
                             first_column="avg_ck_bal",
                             second_column="avg_sv_bal",
                             group_columns="years_with_bank",
                             style="wilcoxon")

        # Print the results.
        print(obj.result)

        # Example 4: Shows the parameters for a Friedman test on specially
        #            prepared input, following the Val_Friedman_WorkTable
        #            example in the VAL user guide.
        # The "friedman" style needs the same number of rows for each
        # combination of "treatment_column" and "block_column" values.
        # Get the smallest count of value combinations in the gender and
        # marital_status columns of the custanly DataFrame.
        min_val = custanly.groupby(["marital_status", "gender"]) \
                          .agg({'cust_id': "count"}) \
                          .select("count_cust_id").min().squeeze().item()

        # Sample min_val rows from every gender/marital_status combination
        # so that all cells have equal counts.
        df_cr = custanly.select(["cust_id", "gender", "marital_status",
                                 "income", "ckacct", "svacct"])
        case_when_then_dict = {
            (df_cr.gender == "F") & (df_cr.marital_status == "1"): min_val,
            (df_cr.gender == "F") & (df_cr.marital_status == "2"): min_val,
            (df_cr.gender == "F") & (df_cr.marital_status == "3"): min_val,
            (df_cr.gender == "F") & (df_cr.marital_status == "4"): min_val,
            (df_cr.gender == "M") & (df_cr.marital_status == "1"): min_val,
            (df_cr.gender == "M") & (df_cr.marital_status == "2"): min_val,
            (df_cr.gender == "M") & (df_cr.marital_status == "3"): min_val,
            (df_cr.gender == "M") & (df_cr.marital_status == "4"): min_val}
        df_fried = df_cr.sample(case_when_then=case_when_then_dict)

        # Execute the RankTest() function.
        obj = valib.RankTest(data=df_fried,
                             style="friedman",
                             dependent_column="income",
                             block_column="marital_status",
                             treatment_column="gender")

        # Print the results.
        print(obj.result)

        # Example 5: Generate only the SQL for the function, without
        #            executing it.
        obj = valib.RankTest(data=custanly,
                             probability_threshold=0.01,
                             first_column="avg_ck_bal",
                             second_column="avg_sv_bal",
                             group_columns="years_with_bank",
                             style="wilcoxon",
                             gen_sql_only=True)

        # Print the generated SQL.
        print(obj.show_query("sql"))

        # Print both the generated SQL and the stored procedure call.
        print(obj.show_query("both"))

        # Print the stored procedure call.
        print(obj.show_query())
        print(obj.show_query("sp"))
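
The description says these tests work on the ranks of values rather than on the values themselves. The following is a minimal, self-contained sketch of that idea for the Mann-Whitney case, using the textbook U statistic. It is not the Vantage Analytic Library implementation and does not call teradataml. The helper names average_ranks and mann_whitney_u and the sample values are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def average_ranks(values):
    """Return 1-based ranks for values; tied values share their average rank."""
    # Indices of values in ascending order (strings sort alphanumerically).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    for _, run in groupby(order, key=lambda i: values[i]):
        tied = list(run)
        # Positions pos+1 .. pos+len(tied) are averaged into one rank.
        avg = pos + (len(tied) + 1) / 2
        for i in tied:
            ranks[i] = avg
        pos += len(tied)
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the pooled ranks of two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    # Textbook formula: U_a = R_a - n_a (n_a + 1) / 2, and U_a + U_b = n_a n_b.
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical dependent-variable values for two groups (illustrative only).
group_f = [10, 20, 30]
group_m = [15, 25, 35, 45]
u_f, u_m = mann_whitney_u(group_f, group_m)
print(u_f, u_m)  # prints: 3.0 9.0
```

Because only the ordering of values matters, the same ranking helper also accepts strings, which mirrors the note that a non-numeric dependent column is ranked alphanumerically. Significance testing against probability_threshold is left out of this sketch.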