SAMPLE Clause Examples | SQL SELECT Statements | Teradata Vantage - Examples: Sample Clause - Analytics Database - Teradata Vantage

SQL Data Manipulation Language

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-05
dita:mapPath
pon1628111750298.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
esx1472246586715
lifecycle
latest
Product Category
Teradata Vantage™

Example: Using fraction_description

Suppose you want to generate three mutually exclusive sample sets of a customer table for a neural net analysis. The desired percentages are as follows:
  • Train, 60%
  • Test, 25%
  • Validate, 15%

Note that the sum does not exceed 100%.

A SELECT statement to generate the desired result looks like the following.

     SELECT customer_id, age, income, marital_status, SAMPLEID
     FROM customer_table
     SAMPLE 0.6, 0.25, 0.15;

The result might look something like the following table.

customer_id ----------- age --- income ------ marital_status -------------- SAMPLEID --------
1362549 17      0 1 1
1362650 21 17,804 2 1
1362605 34 16,957 2 1
1362672 50 16,319 2 3
1362486 76 10,701 3 1
1362500 40 56,708 1 3
1362489 35 55,888 3 2
1362498 60 9,849 2 1
1362551 27 23,085 1 1
1362503 18   5,787 1 2

Sample 1 is the training group, Sample 2 is the test group, and Sample 3 is the validation group.

Example: Using count_description

Check if your customers are in at least 100 cities:

     SELECT COUNT (DISTINCT city)
     FROM (SELECT city
           FROM customer_table
           SAMPLE 1000) TEMP;

If customer_table is large, the SAMPLE 1000 clause would not require a full scan of the table and the sort for DISTINCT would only handle 1,000 rows.

A 1,000 row sample is more than 95 percent accurate for estimating if the number of distinct values is greater than 100.

If you were to make a similar query without including the SAMPLE clause, the query would first have to sort the large customer_table before performing the DISTINCT. For example:

     SELECT COUNT (DISTINCT city)
     FROM customer_table;

Examples Using Stratified Sampling

This table provides the data used in the examples that follow.

     SELECT *
     FROM stores;
       storeid   city             state
   -----------   ---------------   -----
             2   Green Bay         WI
             7   San Diego         CA
             5   San Jose          CA
             8   Los Angeles       CA
             3   Madison           WI
             1   Racine            WI
             6   San Francisco     CA
             4   Milwaukee         WI

Example: Stratified Sampling and Proportional Allocation Without Replacement

The following query uses proportional allocation by default to sample, without replacement, 25 percent of the rows for WI and 50 percent of the rows for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WHEN state = 'WI' THEN 0.25
            WHEN state = 'CA' THEN 0.5
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Milwaukee        WI              1
   San Diego        CA              2
   San Jose         CA              2

Example: Stratified Sampling and Proportional Allocation With Replacement

The following query uses proportional allocation by default with replacement to sample two samples of 3 rows and 1 row, respectively, which are not mutually exclusive, for WI and two samples of 2 rows each for CA ,which are not mutually exclusive:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT
         WHEN state = 'WI' THEN 3, 1
         WHEN state = 'CA' THEN 2, 2
         END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Madison          WI              1
   Madison          WI              1
   Racine           WI              2
   San Diego        CA              3
   San Jose         CA              3
   San Diego        CA              4
   San Jose         CA              4

Example: Stratified Sampling With Randomized Allocation Without Replacement

The following query uses randomized allocation without replacement to sample two mutually exclusive samples of 25 percent and 50 percent, respectively, of the rows from WI and two mutually exclusive samples of 25 percent each for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE RANDOMIZED ALLOCATION
            WHEN state = 'WI' THEN 0.25, 0.5
            WHEN state = 'CA' THEN 0.25, 0.25
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Milwaukee        WI              2
   Madison          WI              2
   San Diego        CA              3
   San Francisco    CA              4

Example: Stratified Sampling With Randomized Allocation With Replacement

The following query uses randomized allocation with replacement to sample three rows for WI, and two nonspecific, non-WI rows:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION
        WHEN state = 'WI' THEN 3
        ELSE 2
        END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Racine           WI              1
   Racine           WI              1
   Madison          WI              1
   San Diego        CA              2
   San Diego        CA              2

Example: Stratified Sampling With Randomized Allocation With Replacement

The following query samples three rows with replacement using randomized allocation:

     SELECT city, state
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 3;
   city             state
   ---------------  -----
   San Diego        CA
   Madison          WI
   San Jose         CA

Example: SAMPLE and the PERIOD Value Expression

The following example shows how you can specify a SAMPLE clause on a PERIOD value expression, where period_of_stay is the PERIOD value expression.

     SELECT emp_no, period_of_stay, SAMPLEID
     FROM employee
     SAMPLE 0.5,0.5;