SAMPLE Clause Examples | SQL SELECT Statements | VantageCloud Lake - Examples: SAMPLE Clause - Teradata VantageCloud Lake

Lake - Working with SQL

Deployment
VantageCloud
Edition
Lake
Product
Teradata VantageCloud Lake
Release Number
Published
February 2025
ft:locale
en-US
ft:lastEdition
2025-11-21
dita:mapPath
jbe1714339405530.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
jbe1714339405530

Example: Using fraction_description

Suppose you want to generate three mutually exclusive sample sets of a customer table for a neural net analysis. The desired percentages are as follows:
  • Train, 60%
  • Test, 25%
  • Validate, 15%

The sum does not exceed 100%.

A SELECT statement to generate the desired result looks like the following.

     SELECT customer_id, age, income, marital_status, SAMPLEID
     FROM customer_table
     SAMPLE 0.6, 0.25, 0.15;

The result looks similar to the following table.

customer_id age income marital_status SAMPLEID
1362549 17       0 1 1
1362650 21  17,804 2 1
1362605 34  16,957 2 1
1362672 50  16,319 2 3
1362486 76  10,701 3 1
1362500 40  56,708 1 3
1362489 35  55,888 3 2
1362498 60   9,849 2 1
1362551 27  23,085 1 1
1362503 18   5,787 1 2

Sample 1 is the training group, Sample 2 is the test group, and Sample 3 is the validation group.

Example: Using count_description

Check if your customers are in at least 100 cities:

     SELECT COUNT (DISTINCT city)
     FROM (SELECT city
           FROM customer_table
           SAMPLE 1000) TEMP;

If customer_table is large, the SAMPLE 1000 clause does not require a full scan of the table and the sort for DISTINCT handles only 1,000 rows.

A 1,000 row sample is more than 95 percent accurate for estimating if the number of distinct values is greater than 100.

A similar query without the SAMPLE clause must sort the large customer_table before performing the DISTINCT. For example:

     SELECT COUNT (DISTINCT city)
     FROM customer_table;

Examples Using Stratified Sampling

This table provides the data used in the examples that follow.

     SELECT *
     FROM stores;
       storeid   city             state
   -----------   ---------------   -----
             2   Green Bay         WI
             7   San Diego         CA
             5   San Jose          CA
             8   Los Angeles       CA
             3   Madison           WI
             1   Racine            WI
             6   San Francisco     CA
             4   Milwaukee         WI

Example: Stratified Sampling and Proportional Allocation without Replacement

The following query uses proportional allocation by default to sample, without replacement, 25 percent of the rows for WI and 50 percent of the rows for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WHEN state = 'WI' THEN 0.25
            WHEN state = 'CA' THEN 0.5
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Milwaukee        WI              1
   San Diego        CA              2
   San Jose         CA              2

Example: Stratified Sampling and Proportional Allocation with Replacement

The following query uses proportional allocation by default with replacement to sample two samples of 3 rows and 1 row, respectively, which are not mutually exclusive, for WI and two samples of 2 rows each for CA,which are not mutually exclusive:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT
         WHEN state = 'WI' THEN 3, 1
         WHEN state = 'CA' THEN 2, 2
         END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Madison          WI              1
   Madison          WI              1
   Racine           WI              2
   San Diego        CA              3
   San Jose         CA              3
   San Diego        CA              4
   San Jose         CA              4

Example: Stratified Sampling with Randomized Allocation without Replacement

The following query uses randomized allocation without replacement to sample two mutually exclusive samples of 25 percent and 50 percent, respectively, of the rows from WI and two mutually exclusive samples of 25 percent each for CA:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE RANDOMIZED ALLOCATION
            WHEN state = 'WI' THEN 0.25, 0.5
            WHEN state = 'CA' THEN 0.25, 0.25
            END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Green Bay        WI              1
   Milwaukee        WI              2
   Madison          WI              2
   San Diego        CA              3
   San Francisco    CA              4

Example: Stratified Sampling with Randomized Allocation with Replacement

The following query uses randomized allocation with replacement to sample three rows for WI, and two nonspecific, non-WI rows:

     SELECT city, state, SAMPLEID
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION
        WHEN state = 'WI' THEN 3
        ELSE 2
        END
     ORDER BY 3;
   city             state    SAMPLEID
   ---------------  ----- -----------
   Racine           WI              1
   Racine           WI              1
   Madison          WI              1
   San Diego        CA              2
   San Diego        CA              2

Example: Stratified Sampling with Randomized Allocation with Replacement

The following query samples three rows with replacement using randomized allocation:

     SELECT city, state
     FROM stores
     SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 3;
   city             state
   ---------------  -----
   San Diego        CA
   Madison          WI
   San Jose         CA

Example: SAMPLE and the PERIOD Value Expression

The following example shows how you can specify a SAMPLE clause on a PERIOD value expression, where period_of_stay is the PERIOD value expression.

     SELECT emp_no, period_of_stay, SAMPLEID
     FROM employee
     SAMPLE 0.5,0.5;