Example: Using fraction_description
- Train, 60%
- Test, 25%
- Validate, 15%
Note that the sum does not exceed 100%.
A SELECT statement to generate the desired result looks like the following.
SELECT customer_id, age, income, marital_status, SAMPLEID FROM customer_table SAMPLE 0.6, 0.25, 0.15;
The result might look something like the following table.
customer_id ----------- |
age --- |
income ------ |
marital_status -------------- |
SAMPLEID -------- |
1362549 |
17 |
0 |
1 |
1 |
1362650 |
21 |
17,804 |
2 |
1 |
1362605 |
34 |
16,957 |
2 |
1 |
1362672 |
50 |
16,319 |
2 |
3 |
1362486 |
76 |
10,701 |
3 |
1 |
1362500 |
40 |
56,708 |
1 |
3 |
1362489 |
35 |
55,888 |
3 |
2 |
1362498 |
60 |
9,849 |
2 |
1 |
1362551 |
27 |
23,085 |
1 |
1 |
1362503 |
18 |
5,787 |
1 |
2 |
Sample 1 is the training group, Sample 2 is the test group, and Sample 3 is the validation group.
Example: Using count_description
Check if your customers are in at least 100 cities:
SELECT COUNT (DISTINCT city) FROM (SELECT city FROM customer_table SAMPLE 1000) TEMP;
If customer_table is large, the SAMPLE 1000 clause would not require a full scan of the table and the sort for DISTINCT would only handle 1,000 rows.
A 1,000 row sample is more than 95 percent accurate for estimating if the number of distinct values is greater than 100.
If you were to make a similar query without including the SAMPLE clause, the query would first have to sort the large customer_table before performing the DISTINCT. For example:
SELECT COUNT (DISTINCT city) FROM customer_table;
Examples Using Stratified Sampling
This table provides the data used in the examples that follow.
SELECT * FROM stores; storeid city state ----------- --------------- ----- 2 Green Bay WI 7 San Diego CA 5 San Jose CA 8 Los Angeles CA 3 Madison WI 1 Racine WI 6 San Francisco CA 4 Milwaukee WI
Example: Stratified Sampling and Proportional Allocation Without Replacement
The following query uses proportional allocation by default to sample, without replacement, 25 percent of the rows for WI and 50 percent of the rows for CA:
SELECT city, state, SAMPLEID FROM stores SAMPLE WHEN state = 'WI' THEN 0.25 WHEN state = 'CA' THEN 0.5 END ORDER BY 3; city state SAMPLEID --------------- ----- ----------- Milwaukee WI 1 San Diego CA 2 San Jose CA 2
Example: Stratified Sampling and Proportional Allocation With Replacement
The following query uses proportional allocation by default with replacement to sample two samples of 3 rows and 1 row, respectively, which are not mutually exclusive, for WI and two samples of 2 rows each for CA ,which are not mutually exclusive:
SELECT city, state, SAMPLEID FROM stores SAMPLE WITH REPLACEMENT WHEN state = 'WI' THEN 3, 1 WHEN state = 'CA' THEN 2, 2 END ORDER BY 3; city state SAMPLEID --------------- ----- ----------- Green Bay WI 1 Madison WI 1 Madison WI 1 Racine WI 2 San Diego CA 3 San Jose CA 3 San Diego CA 4 San Jose CA 4
Example: Stratified Sampling With Randomized Allocation Without Replacement
The following query uses randomized allocation without replacement to sample two mutually exclusive samples of 25 percent and 50 percent, respectively, of the rows from WI and two mutually exclusive samples of 25 percent each for CA:
SELECT city, state, SAMPLEID FROM stores SAMPLE RANDOMIZED ALLOCATION WHEN state = 'WI' THEN 0.25, 0.5 WHEN state = 'CA' THEN 0.25, 0.25 END ORDER BY 3; city state SAMPLEID --------------- ----- ----------- Green Bay WI 1 Milwaukee WI 2 Madison WI 2 San Diego CA 3 San Francisco CA 4
Example: Stratified Sampling With Randomized Allocation With Replacement
The following query uses randomized allocation with replacement to sample three rows for WI, and two nonspecific, non-WI rows:
SELECT city, state, SAMPLEID FROM stores SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION WHEN state = 'WI' THEN 3 ELSE 2 END ORDER BY 3; city state SAMPLEID --------------- ----- ----------- Racine WI 1 Racine WI 1 Madison WI 1 San Diego CA 2 San Diego CA 2
Example: Stratified Sampling With Randomized Allocation With Replacement
The following query samples three rows with replacement using randomized allocation:
SELECT city, state FROM stores SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 3; city state --------------- ----- San Diego CA Madison WI San Jose CA
Example: SAMPLE and the PERIOD Value Expression
The following example shows how you can specify a SAMPLE clause on a PERIOD value expression, where period_of_stay is the PERIOD value expression.
SELECT emp_no, period_of_stay, SAMPLEID FROM employee SAMPLE 0.5,0.5;