One Sample Z-Tests
Let x1, x2,....xn be a random sample drawn from a population with mean µ and variance σ2. Also, assume that the data follows a normal distribution Ɲ (µ, σ2).
H0; µ ≤ µ0
versus
H1; µ > µ0
or
H0: µ ≥ µ0
versus
H1: µ < µ0
H0: µ = µ0
versus
H1: µ ≠ µ0
The test statistic for testing the previous hypotheses is the Z-stat. The validity of the Z-stat is predicated on the assumption that the population variance σ2 is known.
The assumption of known variance is not practical because if the variance is known, then the mean µ is known. So, if the mean µ is known, the test is not required.
However, for large sample sizes (which is common in Big data applications), the sample variance s 2 is approximately equal to the unknown variance σ2. Therefore, a scenario that involves a large sample size validates the application of the Z-statistic.
The z-statistic is calculated as:
where the unknown standard deviation σ is replaced by the sample standard deviation as n → ∞ (sample size is very large). Therefore, the z-statistic is rewritten as:
Two Sample Z tests
The two sample z-test is used for testing equality of means of two populations. Let x1, x2,....xn1 ~ Ɲ (µ1, ) and y1, y2,....yn2 ~ Ɲ (µ2,
) be random samples from two independent populations. The Null hypothesis H0 and the alternative hypothesis H1 respectively for a one-sided lower-tailed test is given as:
H0; µ 1 ≥ µ2
versus
H1; µ1 < µ2
The Null hypothesis is rejected if Zstat < - ᴢ α where α ϵ {0.10, 0.05, 0.01}. Also, note that - ᴢ α is a percentile of the normal distribution with area to its left.
A one-sided upper-tailed test is calculated as:
H0; µ 1 ≤ µ2
versus
H1; µ1 > µ2
The Null hypothesis is rejected if Zstat > ᴢ α with α ϵ {0.10, 0.05, 0.01}. Also, note that ᴢα is a percentile of the normal distribution with (1- α) x 100 area to its left. So, - ᴢ α puts 100xα area to its left.
H0: µ 1 = µ2
versus
H1: µ1 ≠ µ2
The Null hypothesis is rejected if Zstat > ᴢ 1-α/2 or Zstat < -ᴢ α/2 with α ϵ {0.10, 0.05, 0.01}. Also, note that ᴢ 1-α/2 is a percentile of the normal distribution with (1- α/2) x 100 area to its left. So, - ᴢ α puts 100xα area to its left. Note Zstat ~ Ɲ (0,1).