SAMPLEID Expression - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
16.10
Published
June 2017
Language
English (United States)
Last Update
2018-04-25
dita:mapPath
psg1480972718197.ditamap
dita:ditavalPath
changebar_rev_16_10_exclude_audience_ie.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Purpose

Identifies the sample to which a row belongs, distinguishing rows belonging to different samples specified in the SAMPLE clause of a SELECT statement.

Syntax



Syntax Elements

SAMPLEID
Select list or ORDER BY clause expression that indicates that sample identifiers are to be used to link result set rows with their originating samples.

ANSI Compliance

SAMPLEID is a Teradata extension to the ANSI SQL:2011 standard.

Definition of a Sample ID

The sample ID identifies the sample to which a row belongs in the left-to-right order of the SAMPLE clause specification, from 1 through n (where n is the number of samples requested in the SAMPLE clause).

Rules and Restrictions for SAMPLEID

The rules and restrictions are:

  • You can only specify SAMPLEID with a SAMPLE clause, which can appear either as part of a select list or as an ORDER BY clause expression.
  • SAMPLEID cannot be the only term specified in the select list. If you specify SAMPLEID, you must also specify at least one other non-SAMPLEID column expression.

Using SAMPLEID With Stratified Sampling

The SAMPLEID value for stratified sampling is simply 1, 2, 3, … n across n specified samples regardless of stratification. That is, for the following SAMPLE clause,

 
SAMPLE WHEN state = 'CA' THEN
0.3,
0.2
ELSE
0.5,
0.2
the SAMPLEID correspondence would be: 1 2   3 4

Example: Three Sample Sets Using SAMPLEID as a Select List Expression

The following SELECT statement provides three mutually exclusive sample sets (60 percent for sample category x, 25 percent for sample category y, and 15 percent for sample category z) from a customer table.

The results are returned in a table with three columns: cust_name, cust_addr, and SAMPLEID.

The integer in the SAMPLEID column identifies whether the row belongs to the 0.6 sample, the 0.25 sample, or the 0.15 sample.

The samples are identified as 1 through 3, in left-to-right order from the SAMPLE clause, so 0.6 is identified by 1, 0.25 by 2, and 0.15 by 3.

     SELECT cust_name, cust_addr, SAMPLEID
     FROM customer_table
     SAMPLE 0.6, 0.25, 0.15;

A partial results table might look something like the following:

cust_name cust_addr SAMPLEID
Jones Pharmaceuticals 4235 Lawler Road
Memphis, TN
USA 1
Fong Furniture 310 East Highway 5
Hong Kong 2
Subramaniam Spice Exports 455 1/2 Gandhi Lane
Hyderabad
India 3
Forrester Property Management 1 West Broadway
Syracuse, New York
USA 1
Otomo Consulting 33 Korakuen Hall
Tokyo
Japan 1
Adler Music Publishing, Ltd. 5 East 245th Street
Nashville, TN
USA  
O’Brien Metals 83 Heatherington
The Whithers
Cobblestone-on-Treads
United Kingdom 1
Irama Rice Importers 8562 Rhoma Lane
Jakarta
Indonesia 2
Abdelwahab Fine Egyptian Rugs 1723 Kulthum Avenue
Cairo
Egypt 1
Bachar Ouds 18 Rashied Diagonal
Baghdad
Iraq 1

Example: Using SAMPLEID With PERIOD Value Expressions

The following example shows how you can specify a PERIOD value expression with a SAMPLEID expression, where period_of_stay is the PERIOD value expression.

     SELECT emp_no, period_of_stay, SAMPLEID
     FROM employee
     SAMPLE 0.5,0.5;