TD_ZTest Function | Hypothesis Testing | Teradata Vantage

Database Analytic Functions

Product: Teradata® Database
Release Number: 17.10
Published: July 2021
Language: English (United States)
Last Update: 2021-07-28
dita:id: B035-1206
Product Category: Teradata Vantage™

TD_ZTest performs a Z-test, a test for which the distribution of the test statistic under the Null hypothesis can be approximated by a normal distribution.

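TD_ZTest tests whether a population mean equals a specified value (one-sample test) or whether two population means are equal (two-sample test), under the assumption that the population variances are known (which is rarely true in practice). For large samples, the sample variances approximate the population variances, so TD_ZTest uses the sample variances instead of the population variances in the test statistic.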

Assumptions

  • Sample distribution is normal.
  • Data is numeric, not categorical.

Test Type

  • One-tailed or two-tailed (your choice)
  • One-sample or two-sample (your choice)

    Use a one-sample test to check whether the mean of a population is greater than, less than, or not equal to a specific value. TD_ZTest reaches its conclusion by comparing the Z-test statistic with the critical values of the standard normal distribution at the chosen level of significance (alpha = 0.01, 0.05, or 0.10).

  • Unpaired
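The standard normal critical values that a Z-test compares against can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch for checking the numbers by hand, not part of TD_ZTest; the helper name `critical_values` is hypothetical. A one-tailed test uses z_α, the point with (1 − α) × 100 percent of the area to its left; a two-tailed test uses z_{α/2}.

```python
from statistics import NormalDist


def critical_values(alpha: float) -> tuple[float, float]:
    """Return (z_alpha, z_alpha_over_2) for the standard normal distribution.

    z_alpha is the upper-alpha point: (1 - alpha) of the area lies to its left.
    Hypothetical helper for illustration only; not a TD_ZTest API.
    """
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    z_alpha = std_normal.inv_cdf(1 - alpha)           # one-tailed critical value
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return z_alpha, z_alpha_half


# Print the critical values for the three supported significance levels.
for alpha in (0.10, 0.05, 0.01):
    one_tailed, two_tailed = critical_values(alpha)
    print(f"alpha={alpha:.2f}  z_alpha={one_tailed:.4f}  z_alpha/2={two_tailed:.4f}")
```

For α = 0.05 this gives z_α ≈ 1.6449 and z_{α/2} ≈ 1.9600; by symmetry, −z_α has α × 100 percent of the area to its left.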

Computational Method

A test of hypothesis (ToH) involves the following framework:
  • A Null hypothesis H0 and an alternative hypothesis H1
  • A random sample x1, x2, ..., xn in the case of a one-sample test
  • Two random samples x1, x2, ..., xn1 and y1, y2, ..., yn2 in the case of a two-sample test
  • A test statistic Zstat
  • A level of significance α ∈ {0.10, 0.05, 0.01}
  • A comparison of the sample-based Zstat with the percentage point of the standard normal distribution, z_α for a one-tailed test or z_{α/2} for a two-tailed test
  • Computation of the p-value
  • A conclusion (reject or do not reject H0)
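The last steps of this framework (comparison, p-value, conclusion) can be sketched in Python with the standard library. This is a minimal, hypothetical helper, not TD_ZTest's implementation; it assumes Zstat follows N(0, 1) under H0, and the tail labels "upper", "lower", and "two" are names invented for this sketch. Rejecting H0 when the p-value is below α gives the same decision as comparing Zstat with z_α or z_{α/2}.

```python
from statistics import NormalDist


def p_value(z_stat: float, tail: str) -> float:
    """p-value of an observed Zstat, assuming Zstat ~ N(0, 1) under H0.

    tail: "upper" (H1: mu > mu0), "lower" (H1: mu < mu0), or "two" (H1: mu != mu0).
    Hypothetical helper for illustration only.
    """
    phi = NormalDist().cdf  # standard normal CDF
    if tail == "upper":
        return 1.0 - phi(z_stat)
    if tail == "lower":
        return phi(z_stat)
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z_stat)))
    raise ValueError(f"unknown tail: {tail!r}")


def reject_h0(z_stat: float, alpha: float, tail: str) -> bool:
    """Conclusion step: reject H0 when the p-value is below alpha."""
    if alpha not in (0.10, 0.05, 0.01):
        raise ValueError("alpha must be 0.10, 0.05, or 0.01")
    return p_value(z_stat, tail) < alpha
```

For example, `reject_h0(2.0, 0.05, "upper")` is true because the upper-tail p-value of 2.0 is about 0.023, while the same statistic does not lead to rejection in a lower-tailed test.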

One Sample Z-Tests

Let x1, x2, ..., xn be a random sample drawn from a population with mean µ and variance σ². Also assume that the data follow a normal distribution N(µ, σ²). With µ0 denoting the hypothesized value of the mean, the hypotheses take one of three forms:

Case I (upper-tailed test):

H0: µ ≤ µ0

versus

H1: µ > µ0

Case II (lower-tailed test):

H0: µ ≥ µ0

versus

H1: µ < µ0

Case III (two-tailed test):

H0: µ = µ0

versus

H1: µ ≠ µ0

The test statistic for testing the above hypotheses is Zstat. The validity of Zstat is predicated on the assumption that the population variance σ² is known.

The assumption of known variance is rarely practical: σ² is defined in terms of the population mean µ, so a setting in which σ² is known typically also has µ known, and if µ is known, the test is not required.

However, for large sample sizes (which are common in big data applications), the sample variance s² is approximately equal to the unknown variance σ². Therefore, a large sample size justifies applying the Z-statistic.

The z-statistic is calculated as:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Because the sample size n is very large (n → ∞), the unknown standard deviation σ is replaced by the sample standard deviation s. Therefore, the z-statistic is rewritten as:

$$Z_{\text{stat}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

is the sample mean.
In case I (upper-tailed test), the Null hypothesis is rejected if Zstat > z_α, where α ∈ {0.10, 0.05, 0.01} and z_α is the percentile of the standard normal distribution with (1 − α) × 100 percent of the area to its left. In case II (lower-tailed test), the Null hypothesis is rejected if Zstat < −z_α. In case III (two-tailed test), the Null hypothesis is rejected if Zstat > z_{α/2} or Zstat < −z_{α/2}, that is, if |Zstat| > z_{α/2}.
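The one-sample statistic and the three rejection rules can be sketched in Python as follows. This is an illustrative sketch, not TD_ZTest's implementation: the function name `one_sample_z_test` and the tail labels are hypothetical, and it assumes the sample standard deviation uses the n − 1 denominator (as `statistics.stdev` does), which TD_ZTest may or may not match.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0, alpha=0.05, tail="two"):
    """One-sample Z-test that uses the sample standard deviation s in place of sigma.

    Assumes a large sample. stdev() uses the n - 1 denominator (an assumption
    of this sketch). Returns (z_stat, reject_h0).
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)
    z_stat = (x_bar - mu0) / (s / sqrt(n))

    std_normal = NormalDist()
    if tail == "upper":    # case I: reject if Zstat > z_alpha
        reject = z_stat > std_normal.inv_cdf(1 - alpha)
    elif tail == "lower":  # case II: reject if Zstat < -z_alpha
        reject = z_stat < -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":    # case III: reject if |Zstat| > z_{alpha/2}
        reject = abs(z_stat) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return z_stat, reject
```

With the sample `[1, 2, 3, 4, 5] * 20` (n = 100, x̄ = 3) and µ0 = 2.7, Zstat is about 2.11, so H0 is rejected in the upper-tailed and two-tailed tests at α = 0.05 but not in the two-tailed test at α = 0.01.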

Two Sample Z-Tests

The two-sample z-test is used for testing the equality of the means of two populations. Let x1, x2, ..., xn1 ~ N(µ1, σ1²) and y1, y2, ..., yn2 ~ N(µ2, σ2²) be random samples from two independent populations. The Null hypothesis H0 and the alternative hypothesis H1 for a one-sided lower-tailed test are:

H0: µ1 ≥ µ2

versus

H1: µ1 < µ2


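$$Z_{\text{stat}} = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$

where x̄ and ȳ are the sample means and s1² and s2² are the sample variances of the two samples.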

The Null hypothesis is rejected if Zstat < −z_α, where α ∈ {0.10, 0.05, 0.01}. Note that −z_α is the percentile of the standard normal distribution with α × 100 percent of the area to its left.
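A worked calculation for the lower-tailed case, using hypothetical summary statistics chosen only to illustrate the arithmetic (they are not TD_ZTest output): suppose x̄ = 10.0, s1² = 4, n1 = 100 for the first sample and ȳ = 10.5, s2² = 9, n2 = 100 for the second.

```latex
Z_{\text{stat}}
  = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}
  = \frac{10.0 - 10.5}{\sqrt{4/100 + 9/100}}
  = \frac{-0.5}{\sqrt{0.13}}
  \approx -1.387
```

Because −1.387 > −z_{0.05} ≈ −1.645, H0 is not rejected at α = 0.05; it is rejected at α = 0.10, where −z_{0.10} ≈ −1.282.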

For a one-sided upper-tailed test, the hypotheses are:

H0: µ1 ≤ µ2

versus

H1: µ1 > µ2

$$Z_{\text{stat}} = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$

The Null hypothesis is rejected if Zstat > z_α, with α ∈ {0.10, 0.05, 0.01}. Note that z_α is the percentile of the standard normal distribution with (1 − α) × 100 percent of the area to its left, so −z_α has α × 100 percent of the area to its left.

For a two-tailed test, the hypotheses are:

H0: µ1 = µ2

versus

H1: µ1 ≠ µ2

$$Z_{\text{stat}} = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$

The Null hypothesis is rejected if Zstat > z_{α/2} or Zstat < −z_{α/2} (equivalently, |Zstat| > z_{α/2}), with α ∈ {0.10, 0.05, 0.01}. Note that z_{α/2} is the percentile of the standard normal distribution with (1 − α/2) × 100 percent of the area to its left, so −z_{α/2} has (α/2) × 100 percent of the area to its left. Under H0, Zstat approximately follows N(0, 1).
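The unpaired two-sample test for all three alternatives can be sketched in Python as shown here. This sketch decides by p-value rather than by critical value (the two give the same conclusion). The function name `two_sample_z_test` and the tail labels are hypothetical, and the n − 1 denominator used by `statistics.variance` is an assumption that may differ from TD_ZTest.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_test(x, y, alpha=0.05, tail="two"):
    """Unpaired two-sample Z-test of mu1 versus mu2, using sample variances.

    tail: "lower" (H1: mu1 < mu2), "upper" (H1: mu1 > mu2), or "two" (H1: mu1 != mu2).
    variance() uses the n - 1 denominator (an assumption of this sketch).
    Returns (z_stat, p_value, reject_h0).
    """
    n1, n2 = len(x), len(y)
    # Zstat = (x_bar - y_bar) / sqrt(s1^2 / n1 + s2^2 / n2)
    z_stat = (mean(x) - mean(y)) / sqrt(variance(x) / n1 + variance(y) / n2)

    phi = NormalDist().cdf  # standard normal CDF
    if tail == "lower":
        p = phi(z_stat)
    elif tail == "upper":
        p = 1.0 - phi(z_stat)
    elif tail == "two":
        p = 2.0 * (1.0 - phi(abs(z_stat)))
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return z_stat, p, p < alpha
```

For instance, with `x = [1, 2, 3, 4, 5] * 20` and `y = [2, 3, 4, 5, 6] * 20`, the means differ by −1 and Zstat is about −4.97, so H0 is rejected in the lower-tailed and two-tailed tests but not in the upper-tailed test.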