Examples - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Example 1: Using fraction_description

Suppose you want to generate three mutually exclusive sample sets of a customer table for a neural net analysis. The desired percentages are as follows:

  • Train, 60%
  • Test, 25%
  • Validate, 15%
  • Note that the sum does not exceed 100%.

    A SELECT statement to generate the desired result looks like the following.

         SELECT customer_id, age, income, marital_status, SAMPLEID
         FROM customer_table
         SAMPLE 0.6, 0.25, 0.15;

    The result might look something like the following table.

     

    Sample 1 is the training group, Sample 2 is the test group, and Sample 3 is the validation group.

    Example 2: Using count_description

    Check if your customers are in at least 100 cities:

         SELECT COUNT (DISTINCT city)
         FROM (SELECT city 
               FROM customer_table
               SAMPLE 1000) TEMP;

    If customer_table is large, the SAMPLE 1000 clause would not require a full scan of the table and the sort for DISTINCT would only handle 1,000 rows.

    A 1,000 row sample is more than 95 percent accurate for estimating if the number of distinct values is greater than 100.

    If you were to make a similar query without including the SAMPLE clause, the query would first have to sort the large customer_table before performing the DISTINCT. For example:

         SELECT COUNT (DISTINCT city)
         FROM customer_table;

    This table provides the data used in the examples that follow.

         SELECT *
         FROM stores;
     
           storeid   city             state
       -----------   ---------------   -----
                 2   Green Bay         WI
                 7   San Diego         CA
                 5   San Jose          CA
                 8   Los Angeles       CA
                 3   Madison           WI
                 1   Racine            WI
                 6   San Francisco     CA
                 4   Milwaukee         WI

    Example 3: Stratified Sampling and Proportional Allocation Without Replacement

    The following query uses proportional allocation by default to sample, without replacement, 25 percent of the rows for WI and 50 percent of the rows for CA:

         SELECT city, state, SAMPLEID
         FROM stores
         SAMPLE WHEN state = 'WI' THEN 0.25
                WHEN state = 'CA' THEN 0.5
                END
         ORDER BY 3;
     
       city             state    SAMPLEID
       ---------------  ----- -----------
       Milwaukee        WI              1
       San Diego        CA              2
       San Jose         CA              2

    Example 4: Stratified Sampling and Proportional Allocation With Replacement

    The following query uses proportional allocation by default with replacement to sample two samples of 3 rows and 1 row, respectively, which are not mutually exclusive, for WI and two samples of 2 rows each for CA, which are not mutually exclusive:

         SELECT city, state, SAMPLEID
         FROM stores
         SAMPLE WITH REPLACEMENT
             WHEN state = 'WI' THEN 3, 1
             WHEN state = 'CA' THEN 2, 2
             END
         ORDER BY 3;
     
       city             state    SAMPLEID
       ---------------  ----- -----------
       Green Bay        WI              1
       Madison          WI              1
       Madison          WI              1
       Racine           WI              2
       San Diego        CA              3
       San Jose         CA              3
       San Diego        CA              4
       San Jose         CA              4

    Example 5: Stratified Sampling With Randomized Allocation Without Replacement

    The following query uses randomized allocation without replacement to sample two mutually exclusive samples of 25 percent and 50 percent, respectively, of the rows from WI and two mutually exclusive samples of 25 percent each for CA:

         SELECT city, state, SAMPLEID
         FROM stores
         SAMPLE RANDOMIZED ALLOCATION
                WHEN state = 'WI' THEN 0.25, 0.5
                WHEN state = 'CA' THEN 0.25, 0.25
                END
         ORDER BY 3;
     
       city             state    SAMPLEID
       ---------------  ----- -----------
       Green Bay        WI              1
       Milwaukee        WI              2
       Madison          WI              2
       San Diego        CA              3
       San Francisco    CA              4

    Example 6: Stratified Sampling With Randomized Allocation With Replacement

    The following query uses randomized allocation with replacement to sample three rows for WI, and two nonspecific, non‑WI rows:

         SELECT city, state, SAMPLEID
         FROM stores
         SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION
            WHEN state = 'WI' THEN 3
            ELSE 2
            END
         ORDER BY 3;
     
       city             state    SAMPLEID
       ---------------  ----- -----------
       Racine           WI              1
       Racine           WI              1
       Madison          WI              1
       San Diego        CA              2
       San Diego        CA              2

    Example 7: Stratified Sampling With Randomized Allocation With Replacement

    The following query samples three rows with replacement using randomized allocation:

         SELECT city, state
         FROM stores
         SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 3;
     
       city             state
       ---------------  -----
       San Diego        CA
       Madison          WI 
       San Jose         CA

    Example 8: SAMPLE and the PERIOD Value Expression

    The following example shows how you can specify a SAMPLE clause on a PERIOD value expression, where period_of_stay is the PERIOD value expression.

         SELECT emp_no, period_of_stay, SAMPLEID 
         FROM employee 
         SAMPLE 0.5,0.5;