Description
Parametric tests make assumptions about the data, such as the observations
being normally distributed. This can be verified with a test of normality
prior to executing a parametric test. Both T-Tests and F-Tests are provided.
T-Tests can be either paired or unpaired, while the unpaired T-Tests can be
with or without an indicator variable.
F-Tests can be 1-way, 2-way or 3-way. 2-way tests can have equal or unequal
cell counts (count of rows having a combination of distinct column values),
while the 3-way test must have equal cell counts. A 1-way test has 1
independent input column, a 2-way test has 2 independent columns and a 3-way
test has 3 independent columns in addition to a dependent "column of
interest".
Detailed information about each test can be found in the
'Statistical Tests offered' section.
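The cell counts described above can be inspected locally before choosing between the equal and unequal cell count F-Tests. The following is a minimal base R sketch on a small made-up data frame; the column names factor_a, factor_b and response are hypothetical, and the check runs on local data rather than on a "tbl_teradata".

```r
# Hypothetical local data: two independent columns and one dependent column.
df <- data.frame(
  factor_a = c("x", "x", "y", "y", "x", "x", "y", "y"),
  factor_b = c("p", "q", "p", "q", "p", "q", "p", "q"),
  response = c(1.2, 2.3, 3.1, 4.8, 1.9, 2.2, 3.5, 4.1)
)

# A cell count is the number of rows for each combination of distinct values.
cell_counts <- table(df$factor_a, df$factor_b)
print(cell_counts)

# Equal cell counts: every combination occurs the same number of times.
equal_cells <- length(unique(as.vector(cell_counts))) == 1
print(equal_cells)
```

If the check reports unequal counts for two columns, the 2-way test with unequal cell counts is the variant that applies.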
Usage
td_parametric_test_valib(data, ...)
Arguments
data
Required Argument.
Specifies the input data on which to run the statistical tests.
Types: tbl_teradata
...
Specifies other arguments supported by the function as described in the 'Other Arguments' section.
Value
Function returns an object of class "td_parametric_test_valib",
which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Other Arguments
columns
Optional Argument.
Specifies the name(s) of the column(s) representing
independent variables to be analyzed in an F-Test N-Way
with Equal Cell Counts analysis. There can be 1, 2 or 3
columns listed in this parameter.
If 2 or 3 columns are listed, the cell counts (the count of rows
having a combination of distinct column values) should be the same.
Types: character OR vector of Strings (character)
dependent.column
Optional Argument.
Specifies the name of the column representing
the dependent variable in an F-Test.
Types: character
equal.variance
Optional Argument.
Specifies whether the variances of the two samples are
assumed to be equal in an unpaired T-Test.
Default Value: FALSE
Types: logical
fallback
Optional Argument.
Specifies whether FALLBACK is requested for the
output result or not.
Default Value: FALSE
Types: logical
first.column
Optional Argument.
Specifies the name of the column representing the
first variable to analyze for a T-test. For an F-Test,
specifies the name of the column representing the
first independent variable in the analysis.
Types: character
first.column.values
Optional Argument.
Specifies a list of the "first.column" values
to be included in the analysis.
Types: Integer, Numeric, character OR vector
of Integers, Numerics, Strings (character)
group.columns
Optional Argument.
Specifies the name(s) of the column(s) for grouping
so that a separate result is produced for each value
or combination of values in the specified column or columns.
Note:
This option is not available for an F-Test 2-way with Unequal Cell Counts.
Types: character OR vector of Strings (character)
allow.duplicates
Optional Argument.
Specifies whether duplicates are allowed
in the output or not.
Default Value: FALSE
Types: logical
paired
Optional Argument.
Specifies whether the first and second column values are
matched with each other. When set to TRUE, the mean
difference is also analyzed.
Default Value: FALSE
Note:
This is an option for T-Test.
Types: logical
second.column
Optional Argument.
Specifies the name of the column representing the
second variable to analyze. If the "with.indicator"
argument is set to TRUE, the second column is used
to define two analysis categories, one where the
second column is negative or zero, and another where
the second column is positive.
For an F-Test, specifies the name of the column
representing the second independent variable in the
analysis.
Note:
Date Type is not allowed to be used for the paired T-Test.
Types: character
second.column.values
Optional Argument. Required for a 2-way
F-Test with Unequal Cell Counts.
Specifies a list of the "second.column"
values to be included
in the analysis.
Types: Integer, Numeric, character OR vector
of Integers, Numerics, Strings (character)
stats.database
Optional Argument.
Specifies the database where the statistical test
metadata tables are installed. If not specified,
the source database is searched for these
metadata tables.
Types: character
style
Optional Argument.
Specifies the test style.
Permitted Values:
't' - T-Test paired, unpaired or unpaired with indicator variable (second column).
'fnway' - F-Test N-Way with Equal Cell Counts (1, 2, or 3 columns with same number of cell counts). A cell count is the count of rows having a combination of distinct column values.
'f2way' - F-Test 2-Way with Unequal Cell Counts (2 columns with possibly different numbers of cell counts). A cell count is the count of rows having a combination of distinct column values.
Default Value: 't'
Types: character
probability.threshold
Optional Argument.
Specifies the threshold probability, i.e.,
'alpha' probability, below which the null
hypothesis is rejected.
Default Value: 0.05
Types: numeric
with.indicator
Optional Argument.
Specifies whether the second column is used as an
indicator that defines two analysis categories: one
where the second column is negative or zero, and
another where the second column is positive.
Note:
This argument can be used with an unpaired T-Test, i.e., when "style" is set to 't' and "paired" is set to FALSE.
Default Value: FALSE
Types: logical
Statistical Tests offered
Two Sample T-Test for Equal Means
For the paired t test, a one-to-one correspondence must exist between values
in both samples. The test is whether paired values have mean differences
which are not significantly different from zero. It assumes differences are
identically distributed normal random variables, and that they are
independent.
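The paired test described above reduces to a one-sample test on the per-pair differences. The following base R sketch illustrates this on made-up local data; the vectors before and after are hypothetical, and stats::t.test is used only as a local cross-check, not as the computation performed by the Vantage Analytic Library.

```r
# Made-up paired measurements (one-to-one correspondence by position).
before <- c(12.1, 11.4, 13.2, 10.8, 12.9, 11.7)
after  <- c(12.8, 11.9, 13.0, 11.6, 13.5, 12.1)

# The paired test works on the per-pair differences.
d <- after - before
n <- length(d)

# Informal check of the normality assumption on the differences.
print(shapiro.test(d)$p.value)

# t statistic for H0: the mean difference is zero.
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Cross-check against stats::t.test with paired = TRUE.
res <- t.test(after, before, paired = TRUE)
print(c(manual = t_manual, t_test = unname(res$statistic)))
print(res$p.value)
```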
The unpaired t test is similar, but there is no correspondence between values
of the samples. It assumes that, within each sample, values are identically
distributed normal random variables, and that the two samples are independent
of each other. The two sample sizes may be equal or unequal. Variances of
both samples may be assumed to be equal (homoscedastic) or unequal
(heteroscedastic). In both cases, the null hypothesis is that the population
means are equal. The test output is a p-value which, when compared to the
threshold, determines whether the null hypothesis should be rejected.
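The effect of the variance assumption and of the threshold can be seen in a small local experiment. This is a sketch in base R on simulated data; alpha plays the role of "probability.threshold" and var.equal the role of "equal.variance", and that mapping is an assumption made for illustration only.

```r
# Simulated samples of unequal size and unequal spread (made-up parameters).
set.seed(42)
sample1 <- rnorm(20, mean = 10, sd = 2)
sample2 <- rnorm(25, mean = 11, sd = 3)

# Threshold below which the null hypothesis of equal means is rejected.
alpha <- 0.05

# Homoscedastic (pooled variance) and heteroscedastic (Welch) variants.
pooled <- t.test(sample1, sample2, var.equal = TRUE)
welch  <- t.test(sample1, sample2, var.equal = FALSE)

# Decision rule: reject H0 when the p-value falls below the threshold.
reject_pooled <- pooled$p.value < alpha
reject_welch  <- welch$p.value < alpha

print(c(pooled_p = pooled$p.value, welch_p = welch$p.value))
print(c(reject_pooled = reject_pooled, reject_welch = reject_welch))
```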
The unpaired t test uses the following methods of data selection:
T Unpaired selects the columns with the two unpaired datasets, some of which may be NULL.
T Unpaired with Indicator selects the column of interest and a second indicator column which determines to which group each value of the first variable belongs.
If the indicator variable is negative or zero, the value is assigned to the
first group; if it is positive, the value is assigned to the second group.
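The indicator rule above can be sketched locally as follows. The vectors values and indicator are made-up, and the grouping is done in base R purely for illustration; it is not the function's internal implementation.

```r
# Made-up column of interest and indicator column.
values    <- c(5.1, 6.3, 4.8, 7.2, 5.9, 6.6)
indicator <- c(-1, 0, 2, 1, 0, 3)

# Negative or zero indicator -> first group; positive -> second group.
group <- ifelse(indicator <= 0, "first", "second")
split_values <- split(values, group)
print(split_values)

# The two groups are then compared with an unpaired t test.
res <- t.test(split_values$first, split_values$second)
print(res$p.value)
```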
The two sample t tests for unpaired data are defined as shown below:
H0: mu1 = mu2
H1: mu1 != mu2
Test Statistic: T = (Y1 - Y2) / sqrt(s1/N1 + s2/N2)
where N1 and N2 are the sample sizes, Y1 and Y2 are the sample means, and s1 and s2 are sample variances.
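As a worked illustration of the statistic above, the following base R sketch computes T by hand on two made-up samples and compares it with stats::t.test. The formula as written does not pool the variances, so the comparison uses var.equal = FALSE; the sample values are hypothetical.

```r
# Two made-up unpaired samples of different sizes.
y1 <- c(20.1, 22.4, 19.8, 21.7, 23.0, 20.9)
y2 <- c(18.2, 19.9, 17.5, 20.3, 18.8)

n1 <- length(y1)
n2 <- length(y2)

# T = (Y1 - Y2) / sqrt(s1/N1 + s2/N2), with s1 and s2 the sample variances.
t_manual <- (mean(y1) - mean(y2)) / sqrt(var(y1) / n1 + var(y2) / n2)

# Cross-check against the unequal-variance t test in base R.
welch <- t.test(y1, y2, var.equal = FALSE)
print(c(manual = t_manual, t_test = unname(welch$statistic)))
```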
F-Test - N-Way
- F-Test/Analysis of Variance - One Way, Equal or Unequal Sample Size.
- F-Test/Analysis of Variance - Two Way, Equal Sample Size.
- F-Test/Analysis of Variance - Three Way, Equal Sample Size.
Use the ANOVA or F-test to determine if significant differences exist among treatment means or interactions. This preliminary test indicates if further analysis of the relationship among treatment means is warranted. If the null hypothesis of no difference among treatments is accepted, the test result implies factor levels and response are unrelated, so the analysis is terminated. When the null hypothesis is rejected, the analysis is usually continued to examine the nature of the factor-level effects. Examples are:
Tukey's Method - Tests all possible pairwise differences of means.
Scheffe's Method - Tests all possible contrasts at the same time.
Bonferroni's Method - Tests, or puts simultaneous confidence intervals around, a preselected group of contrasts.
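Follow-up comparisons of this kind can be run locally in base R once the data is available on the client. The sketch below uses made-up data with a hypothetical treatment factor; TukeyHSD and pairwise.t.test are functions from the stats package, not part of td_parametric_test_valib.

```r
# Made-up data: three treatments with five observations each.
df <- data.frame(
  treatment = factor(rep(c("a", "b", "c"), each = 5)),
  response  = c(4.1, 5.0, 4.6, 4.9, 4.4,
                6.2, 5.8, 6.5, 6.0, 6.3,
                5.1, 5.4, 4.9, 5.6, 5.2)
)

# One-way ANOVA fit, used as the starting point for the follow-up.
fit <- aov(response ~ treatment, data = df)

# Tukey's method: all pairwise differences of treatment means.
tukey <- TukeyHSD(fit)
print(tukey)

# Bonferroni-adjusted pairwise t tests.
bonf <- pairwise.t.test(df$response, df$treatment,
                        p.adjust.method = "bonferroni")
print(bonf)
```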
Use the N-way F-Test to execute within groups defined by the distinct values
of the group-by variables (GBVs), as with most of the nonparametric
tests. Two or more treatments must exist in the data within the groups
defined by the distinct GBV values.
Given a column of interest (dependent variable), one or more input columns
(independent variables) and optionally one or more group-by columns (all from
the same input), an F-Test is produced. The N-Way ANOVA tests whether a
set of sample means are all equal (the null hypothesis). Output is a p-value
which, when compared to the user's threshold, determines whether the null
hypothesis should be rejected.
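For the one-way case, the F statistic and its p-value can be reproduced by hand. The following base R sketch uses made-up data and hypothetical column names, and cross-checks the result against stats::aov; it illustrates the standard one-way ANOVA computation rather than the library's internal SQL.

```r
# Made-up data: one independent column (treatment) and one dependent column.
df <- data.frame(
  treatment = factor(rep(c("a", "b", "c"), each = 4)),
  response  = c(4.1, 5.0, 4.6, 4.9,
                6.2, 5.8, 6.5, 6.0,
                5.1, 5.4, 4.9, 5.6)
)

grand_mean  <- mean(df$response)
group_means <- tapply(df$response, df$treatment, mean)
group_sizes <- tapply(df$response, df$treatment, length)
k <- length(group_means)  # number of treatments
n <- nrow(df)             # total number of observations

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((df$response - group_means[as.character(df$treatment)])^2)

# F statistic and p-value under H0: all treatment means are equal.
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

# Cross-check against stats::aov.
aov_table <- summary(aov(response ~ treatment, data = df))[[1]]
f_aov <- aov_table[["F value"]][1]
print(c(manual = f_manual, aov = f_aov, p_value = p_manual))

# Decision against a user-chosen threshold.
print(p_manual < 0.05)
```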
F-Test/Analysis of Variance - 2-Way Unequal Sample Size
Use the 2-way Unequal Sample Size F-Test to execute on the entire dataset.
No group-by parameter is provided for this test, but if such a test is
desired, multiple tests must be run on pre-prepared datasets with group-by
variables in each as different constants. Two or more treatments must exist
in the data within the dataset.
Note:
This test creates a temporary work table in the Result Database and drops it at the end of processing, even if the Output option to Store the tabular output of this analysis in the database is not selected.
Given the input of tabulated values, an F-Test is produced. The N-Way
ANOVA tests whether a set of sample means are all equal (the null
hypothesis). Output is a p-value which, when compared to the user's threshold,
determines whether the null hypothesis should be rejected.
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set option
# 'val.install.location' to the database name where Vantage analytic
# library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic
# Library installer.
# 3. The Statistical Test metadata tables must be loaded into the database
# where Vantage Analytic Library is installed.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create required objects of class "tbl_teradata".
customer <- tbl(con, "customer")
print(customer)
cust <- tbl(con, "customer_analysis")
print(cust)
# Example 1: Perform a paired T-Test, grouped by age and gender.
obj <- td_parametric_test_valib(data=cust,
                                first.column="avg_cc_bal",
                                second.column="avg_sv_bal",
                                paired=TRUE,
                                equal.variance=TRUE,
                                group.columns=c("age", "gender"))
# Print the results.
print(obj$result)
# Example 2: Perform One way F-Test.
obj <- td_parametric_test_valib(data=customer,
                                style="fnway",
                                dependent.column="income",
                                columns="gender",
                                probability.threshold=0.01,
                                group.columns=c("years_with_bank",
                                                "nbr_children"))
# Print the results.
print(obj$result)
# Example 3: Perform a 2-way F-Test with Unequal Cell Counts.
obj <- td_parametric_test_valib(data=customer,
                                style="f2way",
                                dependent.column="income",
                                first.column="years_with_bank",
                                first.column.values=c(0, 1, 2, 3, 4, 5, 6, 7),
                                second.column="marital_status",
                                second.column.values=c(1, 2, 3, 4),
                                probability.threshold=0.01)
# Print the results.
print(obj$result)