Teradata Package for Python Function Reference | 17.10 - KSTest

Teradata® Package for Python Function Reference

Product: Teradata Package for Python

Release Number: 17.10

Published: April 2022

Language: English (United States)

Last Update: 2022-08-19

Product Category: Teradata Vantage

KSTest

Functions
KSTest(data, dependent_column=None, columns=None, fallback=False, group_columns=None, allow_duplicates=False, stats_database=None, style='ks', probability_threshold=0.05, gen_sql_only=False)

    DESCRIPTION:
        Statistical tests of this type attempt to determine the likelihood that
        two distribution functions represent the same distribution.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the input data to run statistical tests.
            Types: teradataml DataFrame

        dependent_column:
            Required Argument.
            Specifies the name of the numeric column that is tested to have a
            normal distribution.
            Types: str

        columns:
            Optional Argument.
            Specifies a categorical variable with two values that indicate the
            distribution to which the "dependent_column" belongs.
            Note:
                Used only by the Smirnov test.
            Types: str OR list of Strings (str)

        fallback:
            Optional Argument.
            Specifies whether FALLBACK is requested for the output result.
            Default Value: False (Not requested)
            Types: bool

        group_columns:
            Optional Argument.
            Specifies the name(s) of the column(s) for grouping so that a
            separate result is produced for each value or combination of values
            in the specified column or columns.
            Types: str OR list of Strings (str)

        allow_duplicates:
            Optional Argument.
            Specifies whether duplicates are allowed in the output.
            Default Value: False
            Types: bool

        stats_database:
            Optional Argument.
            Specifies the database where the statistical test metadata tables
            are installed. If not specified, the source database is searched
            for these metadata tables.
            Types: str

        style:
            Optional Argument.
            Specifies the test style.
            Permitted Values:
                * 'ks' - Kolmogorov-Smirnov test.
                * 'l'  - Lilliefors test.
                * 'sw' - Shapiro-Wilk test.
                * 'p'  - D'Agostino and Pearson test.
                * 's'  - Smirnov test.
            Default Value: 'ks'
            Types: str

        probability_threshold:
            Optional Argument.
            Specifies the threshold probability, i.e., "alpha" probability,
            below which the null hypothesis is rejected.
            Default Value: 0.05
            Types: float

        gen_sql_only:
            Optional Argument.
            Specifies whether to generate only SQL for the function.
            When set to True, the function SQL is generated but not executed,
            and can be retrieved with the show_query() method. Otherwise, the
            SQL is executed but not returned.
            Default Value: False
            Types: bool

    RETURNS:
        An instance of KSTest.
        Output teradataml DataFrames can be accessed using attribute references,
        such as KSTestObj.<attribute_name>.
        Output teradataml DataFrame attribute name is: result.

    RAISES:
        TeradataMlException, TypeError, ValueError

    EXAMPLES:
        # Notes:
        #   1. To execute Vantage Analytic Library functions,
        #       a. import "valib" object from teradataml.
        #       b. set 'configure.val_install_location' to the database name
        #          where Vantage Analytic Library functions are installed.
        #   2. Datasets used in these examples can be loaded using Vantage
        #      Analytic Library installer.
        #   3. The Statistical Test metadata tables must be loaded into the
        #      database where Vantage Analytic Library is installed.

        # Import valib object from teradataml to execute this function,
        # and DataFrame to reference the input table.
        from teradataml import DataFrame, valib

        # Set the 'configure.val_install_location' variable.
        from teradataml import configure
        configure.val_install_location = "SYSLIB"

        # Create required teradataml DataFrame.
        custanly = DataFrame("customer_analysis")
        print(custanly)

        # Example 1: A Kolmogorov-Smirnov test with group-by option.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           group_columns="years_with_bank",
                           style="ks")

        # Print the results.
        print(obj.result)

        # Example 2: A Lilliefors test with group-by option.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           group_columns="years_with_bank",
                           style="l")

        # Print the results.
        print(obj.result)

        # Example 3: A Shapiro-Wilk test with group-by option.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           group_columns="years_with_bank",
                           style="sw")

        # Print the results.
        print(obj.result)

        # Example 4: A D'Agostino and Pearson test with group-by option.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           group_columns="years_with_bank",
                           style="p")

        # Print the results.
        print(obj.result)

        # Example 5: A Smirnov test with group-by option.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           columns="gender",
                           group_columns="years_with_bank",
                           style="s")

        # Print the results.
        print(obj.result)

        # Example 6: Generate only SQL for the function, but do not execute it.
        obj = valib.KSTest(data=custanly,
                           dependent_column="income",
                           group_columns="years_with_bank",
                           style="sw",
                           gen_sql_only=True)

        # Print the generated SQL.
        print(obj.show_query("sql"))

        # Print both generated SQL and stored procedure call.
        print(obj.show_query("both"))

        # Print the stored procedure call.
        print(obj.show_query())
        print(obj.show_query("sp"))
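
For readers who want intuition for what the 'ks' and 's' styles measure, the following is a minimal local sketch of the textbook statistics. It does not reproduce how Vantage computes its results. All helper names here (normal_cdf, ks_one_sample_statistic, smirnov_two_sample_statistic, rejects_null) are hypothetical and are not part of teradataml. The p-value calculation is omitted; rejects_null only illustrates the "probability_threshold" rule described above, assuming a p-value is already available.

```python
import math
from bisect import bisect_right


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_one_sample_statistic(values, cdf):
    """Largest gap between the empirical CDF of `values` and a reference `cdf`."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def smirnov_two_sample_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    sa, sb = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # share of sample_a <= x
        fb = bisect_right(sb, x) / len(sb)  # share of sample_b <= x
        d = max(d, abs(fa - fb))
    return d


def rejects_null(p_value, probability_threshold=0.05):
    """Reject the null hypothesis when the p-value is below the threshold."""
    return p_value < probability_threshold


if __name__ == "__main__":
    incomes = [21.0, 35.5, 18.2, 40.1, 27.3]  # made-up illustrative values
    mean = sum(incomes) / len(incomes)
    std = math.sqrt(sum((v - mean) ** 2 for v in incomes) / (len(incomes) - 1))
    print(ks_one_sample_statistic(incomes, lambda v: normal_cdf(v, mean, std)))
    print(smirnov_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A statistic near 0 means the compared distributions track each other closely, while a value near 1 means they barely overlap. In the two-sample case, the two samples play the role of the two groups selected by the "columns" argument.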