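KSTest(data, dependent_column=None, columns=None, fallback=False, group_columns=None, allow_duplicates=False, stats_database=None, style='ks', probability_threshold=0.05)
DESCRIPTION:
Statistical tests of this type attempt to determine the likelihood that two
distribution functions represent the same distribution. The Kolmogorov-Smirnov,
Lilliefors, Shapiro-Wilk, and D'Agostino and Pearson styles test whether
"dependent_column" is normally distributed; the Smirnov style compares the two
distributions of "dependent_column" defined by the two-valued column in "columns".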
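As a rough, local illustration of what comparing two distribution functions means for the normality styles, the sketch below computes a Kolmogorov-Smirnov-type statistic in plain Python: the largest vertical gap between the sample's empirical CDF and a normal CDF. It assumes the normal's mean and standard deviation are estimated from the sample itself. It is not the Vantage Analytic Library implementation, and it returns only the statistic, not a probability; the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution N(mu, sigma**2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal_statistic(sample: Sequence[float]) -> float:
    """Hypothetical helper: largest distance between the empirical CDF of
    `sample` and a normal CDF whose mean and standard deviation are
    estimated from the same sample."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # At x the empirical CDF steps from i/n up to (i + 1)/n,
        # so check the gap on both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

Estimating the parameters from the data is what distinguishes the Lilliefors setting from the classical Kolmogorov-Smirnov test with a fully specified normal; the statistic has the same form, but the critical values differ.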
PARAMETERS:
data:
Required Argument.
Specifies the input data to run statistical tests.
Types: teradataml DataFrame
dependent_column:
Required Argument.
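Specifies the name of the numeric column to test. For the normality styles
('ks', 'l', 'sw', 'p'), this column is tested for a normal distribution; for the
Smirnov style ('s'), its values are split into two samples by "columns".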
Types: str
columns:
Optional Argument.
Specifies the name of a categorical column with two distinct values that indicate
the distribution to which each value of "dependent_column" belongs.
Note:
Used only by the Smirnov test.
Types: str OR list of Strings (str)
fallback:
Optional Argument.
Specifies whether FALLBACK protection is requested for the output result table.
Default Value: False (Not requested)
Types: bool
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) for grouping so that a separate result
is produced for each value or combination of values in the specified column or
columns.
Types: str OR list of Strings (str)
allow_duplicates:
Optional Argument.
Specifies whether duplicates are allowed in the output or not.
Default Value: False
Types: bool
stats_database:
Optional Argument.
Specifies the database where the statistical test metadata tables are installed.
If not specified, the source database is searched for these metadata tables.
Types: str
style:
Optional Argument.
Specifies the test style.
Permitted Values:
* 'ks' - Kolmogorov-Smirnov test.
* 'l' - Lilliefors test.
* 'sw' - Shapiro-Wilk test.
* 'p' - D'Agostino and Pearson test.
* 's' - Smirnov test.
Default Value: 'ks'
Types: str
probability_threshold:
Optional Argument.
Specifies the threshold probability, i.e., the "alpha" probability, below which
the null hypothesis is rejected.
Default Value: 0.05
Types: float
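For style 's', "columns" names a two-valued categorical column that splits "dependent_column" into two samples. The sketch below computes the two-sample (Smirnov) statistic locally under that reading: the largest gap between the two empirical CDFs. The function name and the use of plain Python lists are illustrative assumptions; KSTest itself runs the test inside Vantage and reports its own output columns.

```python
from bisect import bisect_right
from typing import Hashable, Sequence


def smirnov_statistic(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Hypothetical helper: split `values` into two samples using the
    two-valued `labels`, then return the largest absolute difference
    between the two empirical CDFs."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    distinct = list(dict.fromkeys(labels))  # distinct labels, first-seen order
    if len(distinct) != 2:
        raise ValueError("labels must contain exactly two distinct values")
    a = sorted(v for v, g in zip(values, labels) if g == distinct[0])
    b = sorted(v for v, g in zip(values, labels) if g == distinct[1])
    d = 0.0
    # Both empirical CDFs only change at observed values, so the
    # supremum is reached at one of them.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

A value of 0 means the two empirical CDFs coincide; a value of 1 means the two samples do not overlap at all.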
RETURNS:
An instance of KSTest.
The output teradataml DataFrame can be accessed using an attribute reference
of the form KSTestObj.<attribute_name>.
The output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
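The documented argument types and permitted values can be checked on the client before calling the function. The helper below, `check_kstest_args`, is hypothetical and not part of teradataml; it only mirrors the types and style codes listed under PARAMETERS and raises TypeError or ValueError in the same spirit as RAISES. The requirement that "probability_threshold" lie strictly between 0 and 1 is an added assumption, and whether teradataml performs exactly these checks is not stated here.

```python
# Style codes and test names, as listed under the "style" parameter.
PERMITTED_STYLES = {
    "ks": "Kolmogorov-Smirnov test",
    "l": "Lilliefors test",
    "sw": "Shapiro-Wilk test",
    "p": "D'Agostino and Pearson test",
    "s": "Smirnov test",
}


def _is_str_or_str_list(value) -> bool:
    """True for a str or a list whose items are all str."""
    return isinstance(value, str) or (
        isinstance(value, list) and all(isinstance(v, str) for v in value)
    )


def check_kstest_args(dependent_column, columns=None, group_columns=None,
                      style="ks", probability_threshold=0.05):
    """Hypothetical client-side check; returns the name of the selected test."""
    if not isinstance(dependent_column, str):
        raise TypeError("dependent_column must be a str")
    for name, value in (("columns", columns), ("group_columns", group_columns)):
        if value is not None and not _is_str_or_str_list(value):
            raise TypeError(f"{name} must be a str or a list of str")
    if style not in PERMITTED_STYLES:
        raise ValueError(f"style must be one of {sorted(PERMITTED_STYLES)}")
    if not isinstance(probability_threshold, float):
        raise TypeError("probability_threshold must be a float")
    # Assumption: alpha is a probability strictly between 0 and 1.
    if not 0.0 < probability_threshold < 1.0:
        raise ValueError("probability_threshold must be between 0 and 1")
    return PERMITTED_STYLES[style]
```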
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# 3. The Statistical Test metadata tables must be loaded into the database where
# Vantage Analytic Library is installed.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
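# Create required teradataml DataFrame.
# This assumes a connection to Vantage already exists, e.g. via create_context().
from teradataml import DataFrame
custanly = DataFrame("customer_analysis")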
print(custanly)
# Example 1: A Kolmogorov-Smirnov test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="ks")
# Print the results.
print(obj.result)
# Example 2: A Lilliefors test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="l")
# Print the results.
print(obj.result)
# Example 3: A Shapiro-Wilk test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="sw")
# Print the results.
print(obj.result)
# Example 4: A D'Agostino and Pearson test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
group_columns="years_with_bank",
style="p")
# Print the results.
print(obj.result)
# Example 5: A Smirnov test with group-by option.
obj = valib.KSTest(data=custanly,
dependent_column="income",
columns="gender",
group_columns="years_with_bank",
style="s")
# Print the results.
print(obj.result)