Sampling Example: Unconditional, Single Sample Rate - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example selects a sample of approximately 20% of the rows in the input table.

Input

The input table, score_category, categorizes the students (in the table students) by score in a given subject.

students
id score
1 5
2 83
3 95
4 95
5 90
6 55
7 40
8 57
9 65
10 27
... ...

This statement creates score_category from students:

CREATE MULTISET TABLE score_category AS (
  SELECT *, CASE
    WHEN score <= 80 THEN 'fair'
    WHEN score > 80 AND score < 90 THEN 'very good'
    WHEN score >= 90 THEN 'excellent'
  END AS stratum FROM students
) WITH DATA;
score_category
id score stratum
1 5 fair
2 83 very good
3 95 excellent
4 95 excellent
5 90 excellent
6 55 fair
7 40 fair
8 57 fair
9 65 fair
10 27 fair

SQL Call

SELECT * FROM Sampling(
  ON students PARTITION BY any
  USING
  SampleFraction(0.2)
  Seed(2)
) AS dt ORDER BY id;

Output

 id score 
 -- ----- 
  4  95.0
 10  27.0
 16  19.0
 23  14.0
 26  54.0
 27  44.0
 28  90.0
 33  85.0
 35  94.0
 42  39.0
 43  49.0
 57  75.0
 60  97.0
 71  59.0
 86   6.0
 87  18.0
 93  67.0

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.