Teradata Package for Python Function Reference | 17.10 - ParametricTest - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

ParametricTest

Functions
ParametricTest(data, columns=None, dependent_column=None, equal_variance=False,
               fallback=False, first_column=None, first_column_values=None,
               group_columns=None, allow_duplicates=False, paired=False,
               second_column=None, second_column_values=None, stats_database=None,
               style='t', probability_threshold=0.05, with_indicator=False,
               gen_sql_only=False)

    DESCRIPTION:
        Parametric tests make assumptions about the data, such as the observations
        being normally distributed. This can be verified with a test of normality
        prior to executing a parametric test. Both T-Tests and F-Tests are provided.
        T-Tests can be either paired or unpaired, and unpaired T-Tests can be run
        with or without an indicator variable. F-Tests can be 1-way, 2-way or 3-way.
        2-way tests can have equal or unequal cell counts (a cell count is the count
        of rows having a combination of distinct column values), while the 3-way
        test must have equal cell counts. A 1-way test has 1 independent input
        column, a 2-way test has 2 independent columns and a 3-way test has
        3 independent columns, in addition to a dependent "column of interest".

    PARAMETERS:
        data:
            Required Argument.
            Specifies the input data to run statistical tests.
            Types: teradataml DataFrame

        columns:
            Optional Argument.
            Specifies the name(s) of the column(s) representing independent
            variables to be analyzed in an F-Test N-Way with Equal Cell Counts
            analysis. 1, 2 or 3 columns can be listed in this parameter. If 2 or
            3 columns are listed, the cell counts (the count of rows having a
            combination of distinct column values) should be the same.
            Types: str OR list of Strings (str)

        dependent_column:
            Optional Argument.
            Specifies the name of the column representing the dependent variable
            in an F-Test.
            Types: str

        equal_variance:
            Optional Argument.
            Specifies whether the variance of the two samples (columns) is assumed
            to be equal. The default assumption is that the variances are not equal.
            Note:
                This argument is available for use with a T-Test.
            Default Value: False
            Types: bool

        fallback:
            Optional Argument.
            Specifies whether FALLBACK is requested for the output result.
            Default Value: False (Not requested)
            Types: bool

        first_column:
            Optional Argument.
            Specifies the name of the column representing the first variable to
            analyze for a T-Test. For an F-Test, specifies the name of the column
            representing the first independent variable in the analysis.
            Types: str

        first_column_values:
            Optional Argument. Required for a 2-way F-Test with Unequal Cell Counts.
            Specifies a list of the "first_column" values to be included in the
            analysis.
            Types: int, float, str OR list of Integers, Floats or Strings

        group_columns:
            Optional Argument.
            Specifies the name(s) of the column(s) for grouping so that a separate
            result is produced for each value or combination of values in the
            specified column or columns.
            Note:
                This option is not available for an F 2-way analysis.
            Types: str OR list of Strings (str)

        allow_duplicates:
            Optional Argument.
            Specifies whether duplicates are allowed in the output.
            Default Value: False
            Types: bool

        paired:
            Optional Argument.
            Specifies whether the first and second column values are matched with
            each other. When set to True, the mean difference is also analyzed.
            Note:
                This is an option for a T-Test.
            Default Value: False
            Types: bool

        second_column:
            Optional Argument.
            Specifies the name of the column representing the second variable to
            analyze. If the "with_indicator" argument is set to True, the second
            column is used to define two analysis categories: one where the second
            column is negative or zero, and another where the second column is
            positive. For an F-Test, specifies the name of the column representing
            the second independent variable in the analysis.
            Note:
                Date Type is not allowed to be used for the paired T-Test.
            Types: str

        second_column_values:
            Optional Argument. Required for a 2-way F-Test with Unequal Cell Counts.
            Specifies a list of the "second_column" values to be included in the
            analysis.
            Types: int, float, str OR list of Integers, Floats or Strings

        stats_database:
            Optional Argument.
            Specifies the database where the statistical test metadata tables are
            installed. If not specified, the source database is searched for these
            metadata tables.
            Types: str

        style:
            Optional Argument.
            Specifies the test style.
            Permitted Values:
                * 't'     - T-Test: paired, unpaired, or unpaired with an indicator
                            variable (second column).
                * 'fnway' - F-Test N-Way with Equal Cell Counts (1, 2 or 3 columns
                            with the same cell counts).
                * 'f2way' - F-Test 2-Way with Unequal Cell Counts (2 columns with
                            possibly different cell counts).
            A cell count is the count of rows having a combination of distinct
            column values.
            Default Value: 't'
            Types: str

        probability_threshold:
            Optional Argument.
            Specifies the threshold probability, i.e., the "alpha" probability,
            below which the null hypothesis is rejected.
            Default Value: 0.05
            Types: float

        with_indicator:
            Optional Argument.
            Specifies whether the second column is used to indicate two analysis
            categories: one where the second column is negative or zero, and
            another where the second column is positive.
            Note:
                This argument can be used with an unpaired T-Test, i.e., when
                "style" is set to 't' and "paired" is set to False.
            Default Value: False
            Types: bool

        gen_sql_only:
            Optional Argument.
            Specifies whether to generate only the SQL for the function. When set
            to True, the function SQL is generated but not executed, and can be
            accessed using the show_query() method. Otherwise, the SQL is executed
            but not returned.
            Default Value: False
            Types: bool

    RETURNS:
        An instance of ParametricTest.
        Output teradataml DataFrames can be accessed using attribute references,
        such as ParametricTestObj.<attribute_name>.
        Output teradataml DataFrame attribute name is: result.

    RAISES:
        TeradataMlException, TypeError, ValueError

    EXAMPLES:
        # Notes:
        #   1. To execute Vantage Analytic Library functions,
        #       a. import "valib" object from teradataml.
        #       b. set 'configure.val_install_location' to the database name where
        #          Vantage Analytic Library functions are installed.
        #   2. Datasets used in these examples can be loaded using Vantage Analytic
        #      Library installer.
        #   3. The Statistical Test metadata tables must be loaded into the database
        #      where Analytics Library is installed.

        # Import valib object from teradataml to execute this function.
        from teradataml import valib

        # Import DataFrame to create the input teradataml DataFrames.
        from teradataml import DataFrame

        # Set the 'configure.val_install_location' variable.
        from teradataml import configure
        configure.val_install_location = "SYSLIB"

        # Create required teradataml DataFrames.
        custanly = DataFrame("customer_analysis")
        print(custanly)
        cust = DataFrame("customer")
        print(cust)

        # Example 1: Perform a paired T-Test assuming equal variance, grouped by
        #            age and gender.
        obj = valib.ParametricTest(data=custanly,
                                   first_column="avg_cc_bal",
                                   second_column="avg_sv_bal",
                                   paired=True,
                                   equal_variance=True,
                                   group_columns=["age", "gender"])

        # Print the results.
        print(obj.result)

        # Example 2: Perform a 1-way F-Test.
        obj = valib.ParametricTest(data=cust,
                                   style="fnway",
                                   dependent_column="income",
                                   columns="gender",
                                   probability_threshold=0.01,
                                   group_columns=["years_with_bank", "nbr_children"])

        # Print the results.
        print(obj.result)

        # Example 3: Perform a 2-way F-Test with Unequal Cell Counts.
        obj = valib.ParametricTest(data=cust,
                                   style="f2way",
                                   dependent_column="income",
                                   first_column="years_with_bank",
                                   first_column_values=[0, 1, 2, 3, 4, 5, 6, 7],
                                   second_column="marital_status",
                                   second_column_values=[1, 2, 3, 4],
                                   probability_threshold=0.05)

        # Print the results.
        print(obj.result)

        # Example 4: Generate only the SQL for the function, without executing it.
        obj = valib.ParametricTest(data=cust,
                                   first_column="years_with_bank",
                                   second_column="marital_status",
                                   paired=True,
                                   equal_variance=True,
                                   group_columns=["years_with_bank", "nbr_children"],
                                   gen_sql_only=True)

        # Print the generated SQL.
        print(obj.show_query("sql"))

        # Print both generated SQL and stored procedure call.
        print(obj.show_query("both"))

        # Print the stored procedure call.
        print(obj.show_query())
        print(obj.show_query("sp"))
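
The argument rules in the PARAMETERS section depend on the chosen "style", so it can help to check a combination locally before sending it to Vantage. The following is a minimal sketch of such a check, not part of teradataml: the helper name parametric_test_kwargs is hypothetical, and the rule that a T-Test needs both "first_column" and "second_column" is an assumption that the reference implies but does not state outright. The remaining checks follow the notes on "paired", "equal_variance", "with_indicator", "group_columns" and the 'fnway' / 'f2way' styles.

```python
def parametric_test_kwargs(style="t", **kwargs):
    """Hypothetical pre-flight check for ParametricTest keyword arguments.

    Returns the keyword arguments (including "style") unchanged if they are
    consistent with the rules in the reference text, else raises ValueError.
    """
    if style not in ("t", "fnway", "f2way"):
        raise ValueError(f"unsupported style: {style!r}")

    def require(*names):
        # Every listed argument must be present and not None.
        missing = [name for name in names if kwargs.get(name) is None]
        if missing:
            raise ValueError(f"style {style!r} requires: {', '.join(missing)}")

    if style == "t":
        # Assumption: a T-Test compares two columns, so both are needed.
        require("first_column", "second_column")
        # with_indicator is documented for the unpaired T-Test only.
        if kwargs.get("paired") and kwargs.get("with_indicator"):
            raise ValueError("with_indicator applies only to an unpaired T-Test")
    else:
        # These options are documented as T-Test options.
        for name in ("paired", "equal_variance", "with_indicator"):
            if kwargs.get(name):
                raise ValueError(f"{name} applies only to style 't'")
        if style == "fnway":
            require("dependent_column", "columns")
            columns = kwargs["columns"]
            if isinstance(columns, str):
                columns = [columns]
            if not 1 <= len(columns) <= 3:
                raise ValueError("fnway accepts 1, 2 or 3 independent columns")
        else:  # style == "f2way"
            require("dependent_column", "first_column", "second_column",
                    "first_column_values", "second_column_values")
            if kwargs.get("group_columns"):
                raise ValueError("group_columns is not available for f2way")

    return {"style": style, **kwargs}


# A 1-way F-Test: one independent column plus the dependent column.
one_way = parametric_test_kwargs(style="fnway",
                                 dependent_column="income",
                                 columns="gender")
```

With a live Vantage connection and the "cust" DataFrame from the examples, the returned dictionary could then be passed on as valib.ParametricTest(data=cust, **one_way). The check only covers argument combinations; it does not verify column names, data types or cell counts, which are validated on the database side.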