Input
The example creates input tables t1 and t2 from a table of raw input, raw_normal_50_2, which contains data drawn from a normal distribution with a mean of 50 and a standard deviation of 2. Here are the first 10 rows:
price |
---|
48.0701 |
52.6426 |
48.6372 |
50.9832 |
50.523 |
52.1773 |
50.3103 |
48.4424 |
50.1352 |
50.1382 |
... |
The following statements create tables t1 and t2:
CREATE MULTISET TABLE t1 AS (
  SELECT COUNT(*) AS group_size
  FROM raw_normal_50_2
  WHERE price IS NOT NULL
) WITH data;

CREATE MULTISET TABLE t2 AS (
  SELECT RANK() OVER (ORDER BY price) AS "rank", price
  FROM raw_normal_50_2
  WHERE price IS NOT NULL
) WITH data;
SQL Call
SELECT * FROM DistributionMatchReduce (
  ON DistributionMatchMultiInput (
    ON t2 AS "input" PARTITION BY ANY
    ON t1 AS groupstats DIMENSION
    USING
    ValueColumn ('price')
    Tests ('KS', 'CvM', 'AD', 'CHISQ')
    Distributions ('NORMAL:49.97225, 2.009698')
    MinGroupSize (50)
    NumCell (10)
  ) PARTITION BY 1
) AS dt;
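To make the 'KS' test in the call above more concrete, the following is a minimal Python sketch of the textbook one-sample Kolmogorov-Smirnov statistic against a normal distribution. It is not the function's implementation, which this section does not describe. The helper names normal_cdf and ks_statistic are hypothetical, and the sample is only the 10 rows shown under Input, so the result is not expected to match the statistic reported for the full table.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(values, mean, sd):
    """Largest gap between the empirical CDF and the normal CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Only the first 10 rows shown in the Input section, not the full table.
first_rows = [48.0701, 52.6426, 48.6372, 50.9832, 50.523,
              52.1773, 50.3103, 48.4424, 50.1352, 50.1382]

# Mean and standard deviation taken from the Distributions argument.
d10 = ks_statistic(first_rows, 49.97225, 2.009698)
print(d10)
```

A smaller statistic means the empirical distribution tracks the hypothesized one more closely.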
Output
The reported p-values for the four tests range from about 0.35 to 0.43. None is small enough to reject the null hypothesis at a conventional significance level such as 0.05, so the data are consistent with a normal distribution with the specified mean and standard deviation.
In the output table column names, an 'a' or a 'b' that appears between digits stands for a comma (,) or a period (.), respectively. For example, the prefix normal$49b97225a2b009698 encodes the parameters 49.97225 and 2.009698.
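The naming rule can be applied mechanically when post-processing the output. The following Python sketch assumes only the rule stated here (an 'a' or 'b' between two digits); the helper name decode_column_name is hypothetical and not part of the function's API.

```python
import re

def decode_column_name(name):
    """Replace 'a' and 'b' that sit between digits with ',' and '.'."""
    name = re.sub(r"(?<=\d)a(?=\d)", ",", name)
    name = re.sub(r"(?<=\d)b(?=\d)", ".", name)
    return name

decoded = decode_column_name("normal$49b97225a2b009698_ks_statistic")
print(decoded)  # normal$49.97225,2.009698_ks_statistic
```

Letters that are not between digits, such as those in "normal" or "ks", are left unchanged.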
group_size | normal$49b97225a2b009698_ks_statistic | normal$49b97225a2b009698_ks_p_value | normal$49b97225a2b009698_cvm_statistic | normal$49b97225a2b009698_cvm_p_value | normal$49b97225a2b009698_ad_statistic | normal$49b97225a2b009698_ad_p_value | normal$49b97225a2b009698_chisq_statistic | normal$49b97225a2b009698_chisq_p_value |
---|---|---|---|---|---|---|---|---|
400 | 0.03195 | 0.411803 | 0.055654 | 0.430791 | 0.376152 | 0.410291 | 7.8 | 0.35056 |
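
The output row can be read as a simple decision rule. The sketch below copies the four p-values from the row above and compares them with a significance level of 0.05. That level is an assumption chosen for illustration; the example does not specify one.

```python
# Assumed significance level; the source example does not state one.
ALPHA = 0.05

# p-values copied from the output row above.
p_values = {
    "KS": 0.411803,
    "CvM": 0.430791,
    "AD": 0.410291,
    "CHISQ": 0.35056,
}

# A test rejects the null hypothesis only if its p-value is below ALPHA.
rejected = [test for test, p in p_values.items() if p < ALPHA]

if rejected:
    print("Rejected by:", ", ".join(rejected))
else:
    print("No test rejects the hypothesized normal distribution.")
```

With these values no test falls below the threshold, which matches the interpretation given for the output.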