SAMPLE Clause Examples | SQL SELECT Statements | Teradata Vantage - Examples: Sample Clause - Advanced SQL Engine - Teradata Database

SQL Data Manipulation Language

Product
Advanced SQL Engine
Teradata Database
Release Number
17.00
Published
September 2020
Language
English (United States)
Last Update
2021-01-23
dita:mapPath
qtb1554762060450.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata Vantage™

Example: Using fraction_description

Suppose you want to generate three mutually exclusive sample sets of a customer table for a neural net analysis. The desired percentages are as follows:
  • Train, 60%
  • Test, 25%
  • Validate, 15%

Note that the sum does not exceed 100%.

A SELECT statement to generate the desired result looks like the following.

     SELECT customer_id, age, income, marital_status, SAMPLEID
     FROM customer_table
     SAMPLE 0.6, 0.25, 0.15;

The result might look something like the following table.

customer_id
-----------
age
---
income
------
marital_status
--------------
SAMPLEID
--------
1362549
17
      0
1
1
1362650
21
 17,804
2
1
1362605
34
 16,957
2
1
1362672
50
 16,319
2
3
1362486
76
 10,701
3
1
1362500
40
 56,708
1
3
1362489
35
 55,888
3
2
1362498
60
  9,849
2
1
1362551
27
 23,085
1
1
1362503
18
  5,787
1
2

Sample 1 is the training group, Sample 2 is the test group, and Sample 3 is the validation group.

Example: Using count_description

Check if your customers are in at least 100 cities:

     SELECT COUNT (DISTINCT city)
     FROM (SELECT city
           FROM customer_table
           SAMPLE 1000) TEMP;

If customer_table is large, the SAMPLE 1000 clause would not require a full scan of the table and the sort for DISTINCT would only handle 1,000 rows.

A 1,000 row sample is more than 95 percent accurate for estimating if the number of distinct values is greater than 100.

If you were to make a similar query without including the SAMPLE clause, the query would first have to sort the large customer_table before performing the DISTINCT. For example:

     SELECT COUNT (DISTINCT city)
     FROM customer_table;

Examples Using Stratified Sampling

This table provides the data used in the examples that follow.

     SELECT *
     FROM stores;
       storeid   city             state
   -----------   ---------------   -----
             2   Green Bay         WI
             7   San Diego         CA
             5   San Jose          CA
             8   Los Angeles       CA
             3   Madison           WI
             1   Racine            WI
             6   San Francisco     CA
             4   Milwaukee         WI

Example: Stratified Sampling and Proportional Allocation Without Replacement

The following query uses proportional allocation by default to sample, without replacement, 25 percent of the rows for WI and 50 percent of the rows for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WHEN state = 'WI' THEN 0.25
            WHEN state = 'CA' THEN 0.5
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Milwaukee        WI              1
   San Diego        CA              2
   San Jose         CA              2

Example: Stratified Sampling and Proportional Allocation With Replacement

The following query uses proportional allocation by default with replacement to sample two samples of 3 rows and 1 row, respectively, which are not mutually exclusive, for WI and two samples of 2 rows each for CA ,which are not mutually exclusive:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT
         WHEN state = 'WI' THEN 3, 1
         WHEN state = 'CA' THEN 2, 2
         END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Madison          WI              1
   Madison          WI              1
   Racine           WI              2
   San Diego        CA              3
   San Jose         CA              3
   San Diego        CA              4
   San Jose         CA              4

Example: Stratified Sampling With Randomized Allocation Without Replacement

The following query uses randomized allocation without replacement to sample two mutually exclusive samples of 25 percent and 50 percent, respectively, of the rows from WI and two mutually exclusive samples of 25 percent each for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE RANDOMIZED ALLOCATION
            WHEN state = 'WI' THEN 0.25, 0.5
            WHEN state = 'CA' THEN 0.25, 0.25
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Milwaukee        WI              2
   Madison          WI              2
   San Diego        CA              3
   San Francisco    CA              4

Example: Stratified Sampling With Randomized Allocation With Replacement

The following query uses randomized allocation with replacement to sample three rows for WI, and two nonspecific, non-WI rows:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION
        WHEN state = 'WI' THEN 3
        ELSE 2
        END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Racine           WI              1
   Racine           WI              1
   Madison          WI              1
   San Diego        CA              2
   San Diego        CA              2

Example: Stratified Sampling With Randomized Allocation With Replacement

The following query samples three rows with replacement using randomized allocation:

     SELECT city, state
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 3;
   city             state
   ---------------  -----
   San Diego        CA
   Madison          WI
   San Jose         CA

Example: SAMPLE and the PERIOD Value Expression

The following example shows how you can specify a SAMPLE clause on a PERIOD value expression, where period_of_stay is the PERIOD value expression.

     SELECT emp_no, period_of_stay, SAMPLEID
     FROM employee
     SAMPLE 0.5,0.5;