Sampling Example 1: Unconditional Sampling, Single Sample Rate - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example selects a sample of approximately 20% of the rows in the input table.

Input

The input table, score_category, categorizes the students (in the table students) by score in a given subject.

students
id score
1 5
2 83
3 95
4 95
5 90
6 55
7 40
8 57
9 65
10 27
... ...

This statement creates score_category from students:

CREATE MULTISET TABLE score_category AS (
  SELECT *, CASE
    WHEN score <= 80 THEN 'fair'
    WHEN score > 80 AND score < 90 THEN 'very good'
    WHEN score >= 90 THEN 'excellent'
  END AS stratum FROM students
) WITH DATA;
score_category
id score stratum
1 5 fair
2 83 very good
3 95 excellent
4 95 excellent
5 90 excellent
6 55 fair
7 40 fair
8 57 fair
9 65 fair
10 27 fair

SQL Call

SELECT * FROM Sampling (
  ON students PARTITION BY ANY
  USING
  SampleFraction (0.2)
  Seed (2)
) AS dt ORDER BY id;

Output

id score
4 95
6 55
15 22
16 19
21 5
22 25
27 44
32 3
36 15
39 50
44 70
47 79
65 53
67 29
68 30
69 18
74 7
79 71
81 13
83 81
85 79
92 32