Kolmogorov-Smirnov Tests

Vantage Analytics Library User Guide

Release 2.2.0 | Published March 2023 | Last Update 2024-01-02

Kolmogorov-Smirnov tests use the maximum vertical distance between distribution functions as a measure of their similarity. They compare either two empirical distribution functions against each other, or a single empirical distribution function against a hypothetical distribution (for example, a normal distribution), and determine the likelihood that the two distributions are the same.
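
As an illustration of this distance measure only (a minimal NumPy sketch, not the Vantage Analytics Library implementation), the following code computes the maximum vertical distance between two empirical distribution functions:

```python
import numpy as np

def ks_distance(x, y):
    """Maximum vertical distance between the empirical CDFs of x and y."""
    x, y = np.sort(x), np.sort(y)
    # Evaluate both empirical CDFs at every observed point; the largest
    # gap between the two step functions occurs at one of these points.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

rng = np.random.default_rng(0)
d = ks_distance(rng.normal(size=200), rng.normal(0.5, 1.0, size=200))
print(f"KS distance D = {d:.3f}")
```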

Kolmogorov-Smirnov Test (One Sample)

The Kolmogorov-Smirnov test determines whether a dataset follows the normal distribution.

This test assumes nothing about the data distribution (that is, the test is nonparametric and distribution-free). Less general tests (for example, Student's t-test) may be more sensitive if the data meet the test requirements.

This test is usually less powerful than tests specifically designed to test for normality, especially when the mean and variance are not specified in advance.

This test does not indicate the type of nonnormality—for example, whether the distribution is skewed, heavy-tailed, or both. Examining the skewness and kurtosis, and the histogram, boxplot, and normal probability plot for the data may show why the data failed the Kolmogorov-Smirnov test.

Each unique set of values in the groupby columns is called a group-by value set, or GBV set. The function does a separate test for each GBV set.
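
In the Vantage Analytics Library, this test is run through the kstest function. As an illustrative stand-in (not the library's API), the following SciPy sketch tests a sample against a normal distribution whose mean and standard deviation are estimated from that same sample, which is exactly the situation the Lilliefors test below is designed to correct:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=500)

# One-sample KS test against a normal distribution. The parameters are
# estimated from x rather than specified in advance, so the reported
# p-value is only approximate here.
stat, pvalue = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
print(f"D = {stat:.4f}, p = {pvalue:.4f}")
```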

Lilliefors Test

The Lilliefors test determines whether a dataset matches a particular distribution. It is a modification of the Kolmogorov-Smirnov test that first converts the data to Z-scores.

This test computes the Lilliefors statistic and checks its significance by comparing the computed value with quantiles of the test statistic, which are taken from exact tables built from random numbers in computer simulations.

When this test is for the normal distribution, the null hypothesis is that the distribution function is normal with unspecified mean and variance. The alternative hypothesis is that the distribution function is nonnormal. The test compares the empirical distribution of X with a normal distribution with the same mean and variance as X. It is similar to the Kolmogorov-Smirnov test, but it adjusts for the fact that the parameters of the normal distribution are estimated from X rather than specified in advance.

The function does a separate test for each GBV set.
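
Outside the database, the statsmodels package provides a comparable Lilliefors test; the sketch below is illustrative only and is not the Vantage Analytics Library interface:

```python
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(2)
x = rng.normal(size=300)

# Lilliefors test for normality: the mean and variance are estimated
# from x, and the p-value comes from simulated tables of the test
# statistic rather than the standard KS distribution.
stat, pvalue = lilliefors(x, dist="norm")
print(f"Lilliefors statistic = {stat:.4f}, p = {pvalue:.4f}")
```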

Shapiro-Wilk Test

The Shapiro-Wilk test detects departures from the normal distribution without requiring advance specification of the mean or variance of the hypothesized normal distribution. It is considered one of the best omnibus tests of normality, and is usually more powerful than the Kolmogorov-Smirnov test.

The standard algorithm for the Shapiro-Wilk test applies only to sample sizes from 3 to 2000. The test statistic W measures how closely the ordered sample values agree with the values expected from a normal distribution.

The Shapiro-Wilk test performed by the kstest function in the Vantage Analytics Library is based on the approximations and code given by Royston (1982a, b). It too applies only to sample sizes from 3 to 2000. Royston (1982b) gives approximations and tabled values that you can use to compute the coefficients and the significance level of the W statistic. Small values of W are evidence of departure from normality. This test has done very well in comparison studies with other goodness-of-fit tests.

This test does not indicate the type of nonnormality—for example, whether the distribution is skewed, heavy-tailed, or both. Examining the skewness and kurtosis, and the histogram, boxplot, and normal probability plot for the data may show why the data failed the Shapiro-Wilk test.
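
For comparison outside the database, SciPy's shapiro function, which is likewise based on Royston's approximation, computes the W statistic and its significance; this sketch is illustrative, not the Vantage Analytics Library call:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(size=100)  # a clearly nonnormal (skewed) sample

# Shapiro-Wilk test: a small W, and hence a small p-value, is
# evidence of departure from normality.
stat, pvalue = stats.shapiro(x)
print(f"W = {stat:.4f}, p = {pvalue:.4f}")
```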

D'Agostino and Pearson Test

The D'Agostino and Pearson test detects departures from the normal distribution without requiring advance specification of the mean or variance of the hypothesized normal distribution. It is an omnibus test of normality, and is usually more powerful than the Kolmogorov-Smirnov test.

The D'Agostino-Pearson K² statistic combines the sample skewness and kurtosis. It has approximately a chi-squared distribution with two degrees of freedom when the population is normally distributed.

This test does not indicate the type of nonnormality—for example, whether the distribution is skewed, heavy-tailed, or both. Examining the skewness and kurtosis, and the histogram, boxplot, and normal probability plot for the data may show why the data failed the D'Agostino and Pearson test.
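
An illustrative equivalent outside the database is SciPy's normaltest, which implements the D'Agostino-Pearson test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.standard_t(df=3, size=500)  # heavy-tailed, nonnormal sample

# normaltest combines the sample skewness and kurtosis into the K^2
# statistic, approximately chi-squared with 2 degrees of freedom
# under normality.
stat, pvalue = stats.normaltest(x)
print(f"K^2 = {stat:.3f}, p = {pvalue:.4f}")
```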

Smirnov Test

The Smirnov test (also called the two-sample Kolmogorov-Smirnov test) checks whether two datasets have significantly different distributions.

This test assumes nothing about the data distribution (that is, the test is nonparametric and distribution-free). Less general tests (for example, Student's t-test) may be more sensitive if the data meet the test requirements.

If the product of the number of observations in the first sample and the number of observations in the second sample is greater than 10,000, an approximate p-value is computed. Otherwise, an exact p-value is computed.
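
Recent SciPy versions apply a similar size-based rule in ks_2samp, although the exact cutoff differs from the one above; the sketch below is illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=80)
y = rng.normal(0.4, 1.0, size=90)

# Two-sample (Smirnov) test. With method="auto", SciPy chooses an
# exact p-value for small samples and an asymptotic approximation
# for large ones.
result = stats.ks_2samp(x, y, method="auto")
print(f"D = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```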