This example shows the use of grouping columns, and also illustrates the syntax for testing against multiple distributions in a single SQL command.
Input
The input table, factory_7, represents hypothetical mean-time-to-failure data for two products. This is a subset of the rows:
product | mttf |
---|---|
A | 10039.5 |
A | 9926.6 |
A | 9971.34 |
A | 9868.7 |
A | 9940.17 |
A | 10266.7 |
A | 9768.64 |
A | 10043.2 |
A | 10133.7 |
A | 9731.33 |
... | ... |
D | 9721.21 |
D | 10068.6 |
D | 9952 |
D | 9851.94 |
D | 10378.3 |
D | 9908.9 |
D | 9749.43 |
D | 10448 |
D | 9681.25 |
D | 10147.5 |
... | ... |
SQL Call
The function call evaluates two possible distributions (normal and uniform) and applies the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) fit tests.
SELECT * FROM DistributionMatchReduce ( ON DistributionMatchMultiInput ( ON ( SELECT RANK() OVER (PARTITION BY product ORDER BY mttf) AS "rank", product, mttf FROM factory_7 WHERE mttf IS NOT NULL ) AS "input" PARTITION BY ANY ON ( SELECT product, COUNT(*) AS group_size FROM factory_7 WHERE mttf IS NOT NULL GROUP BY product ) AS groupstats DIMENSION USING ValueColumn ('mttf') Tests ('KS', 'AD') Distributions ('NORMAL:10000,150','UNIFORMCONTINUOUS:9500,10500') GroupByColumns ('product') MinGroupSize (50) ) PARTITION BY product ) AS dt;
Output
The reported p-values support these conclusions:
- For product A:
- Both tests fail to reject the null hypothesis that the data fit a normal distribution with the specified parameters.
- Both tests reject the null hypothesis that the data fit the specified uniform distribution.
- For product D:
- Both tests fail to reject the null hypothesis that the data fit a uniform distribution with the specified parameters.
- Both tests reject the null hypothesis that the data fit the specified normal distribution.
In the output table column names, when 'a' and 'b' appear between digits, interpret them as comma (,) and period (.), respectively.
product | group_size | normal$10000a150_ks_statistic | normal$10000a150_ks_p_value | normal$10000a150_ad_statistic | normal$10000a150_ad_p_value | normal$10000a150_chisq_statistic | normal$10000a150_chisq_p_value | uniformcontinuous$9500a10500_ks_statistic | uniformcontinuous$9500a10500_ks_p_value | uniformcontinuous$9500a10500_ad_statistic | uniformcontinuous$9500a10500_ad_p_value | uniformcontinuous$9500a10500_chisq_statistic | uniformcontinuous$9500a10500_chisq_p_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D | 3000 | 0.207718 | 0 | 886.383 | 0 | 0.00774023 | 0.993823 | 0.442443 | 0.805828 | 0.442443 | 0.805828 | 4.72667 | 0.857455 |
A | 3000 | 0.0113786 | 0.454754 | 0.530142 | 0.175628 | 0.214864 | 2.22045e-16 | 365.583 | 2e-07 | 365.583 | 2e-07 | 2613.86 | 0 |