- KSTest(data, dependent_column=None, columns=None, fallback=False, group_columns=None, allow_duplicates=False, stats_database=None, style='ks', probability_threshold=0.05, gen_sql_only=False)
- DESCRIPTION:
Statistical tests of this type attempt to determine the likelihood that two
distribution functions represent the same distribution.
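As an illustration only (not the Vantage implementation), the two-sample
Smirnov statistic can be sketched in plain Python as the largest gap between
two empirical distribution functions. The helper names ecdf and
smirnov_statistic are hypothetical and are not part of teradataml.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function that evaluates the empirical CDF of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    # Fraction of observations less than or equal to x.
    return lambda x: bisect_right(ordered, x) / n


def smirnov_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The gap can only change at observed values, so checking them suffices.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


identical = smirnov_statistic([1, 2, 3, 4], [1, 2, 3, 4])
overlapping = smirnov_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = smirnov_statistic([1, 2, 3, 4], [10, 11, 12, 13])
```

A statistic near 0 suggests the samples could share a distribution; a
statistic near 1 suggests they do not.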
PARAMETERS:
data:
Required Argument.
Specifies the input data to run statistical tests.
Types: teradataml DataFrame
dependent_column:
Required Argument.
Specifies the name of the numeric column whose distribution is tested for normality.
Types: str
columns:
Optional Argument.
Specifies a categorical variable with two values that indicate the distribution
to which the "dependent_column" belongs.
Note:
Used only by the Smirnov test.
Types: str OR list of Strings (str)
fallback:
Optional Argument.
Specifies whether FALLBACK is requested for the output result.
Default Value: False (Not requested)
Types: bool
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) for grouping so that a separate result
is produced for each value or combination of values in the specified column or
columns.
Types: str OR list of Strings (str)
allow_duplicates:
Optional Argument.
Specifies whether duplicates are allowed in the output or not.
Default Value: False
Types: bool
stats_database:
Optional Argument.
Specifies the database where the statistical test metadata tables are installed.
If not specified, the source database is searched for these metadata tables.
Types: str
style:
Optional Argument.
Specifies the test style.
Permitted Values:
* 'ks' - Kolmogorov-Smirnov test.
* 'l' - Lilliefors test.
* 'sw' - Shapiro-Wilk test.
* 'p' - D'Agostino and Pearson test.
* 's' - Smirnov test.
Default Value: 'ks'
Types: str
probability_threshold:
Optional Argument.
Specifies the threshold probability, i.e., "alpha" probability, below which
the null hypothesis is rejected.
Default Value: 0.05
Types: float
gen_sql_only:
Optional Argument.
Specifies whether to generate only SQL for the function.
When set to True, the function SQL is generated but not executed, and can be
retrieved with the show_query() method. Otherwise, the SQL is executed but not returned.
Default Value: False
Types: bool
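The decision rule described under "probability_threshold" can be sketched as
follows. The helper reject_null is hypothetical and not part of teradataml;
treating the comparison as strictly "below" is an assumption taken from the
wording of the argument description.

```python
def reject_null(p_value, probability_threshold=0.05):
    """Return True when the p-value falls below the threshold (alpha)."""
    # Assumption: the boundary case p_value == threshold does not reject.
    return p_value < probability_threshold


decisions = [reject_null(0.01), reject_null(0.20), reject_null(0.05)]
```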
RETURNS:
An instance of KSTest.
Output teradataml DataFrames can be accessed using attribute references, such as
KSTestObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# 3. The Statistical Test metadata tables must be loaded into the database where
# Vantage Analytic Library is installed.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrame.
from teradataml import DataFrame
custanly = DataFrame("customer_analysis")
print(custanly)
# Example 1: A Kolmogorov-Smirnov test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="ks")
# Print the results.
print(obj.result)
# Example 2: A Lilliefors test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="l")
# Print the results.
print(obj.result)
# Example 3: A Shapiro-Wilk test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="sw")
# Print the results.
print(obj.result)
# Example 4: A D'Agostino and Pearson test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="p")
# Print the results.
print(obj.result)
# Example 5: A Smirnov test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
columns="gender",
group_columns="years_with_bank",
style="s")
# Print the results.
print(obj.result)
# Example 6: Generate only the SQL for the function, without executing it.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="sw",
gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))
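# Illustration only: a plain-Python sketch of the one-sample statistic that
# compares a sample against a normal distribution fitted to it (the quantity
# a Lilliefors-style test is based on). This is not the Vantage implementation;
# normal_cdf and ks_normal_statistic are hypothetical helpers, not teradataml APIs.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std. deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal_statistic(sample):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    # Assumption: mean and sample standard deviation are estimated from the data.
    mu, sigma = mean(ordered), stdev(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


d_stat = ks_normal_statistic([-1.0, 0.0, 1.0])
```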