Sampling Example: Conditional, Variable Approximate Sample Sizes - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example applies approximate sample sizes 5, 10, and 15 to categories excellent, fair, and very good, respectively.

SQL Call

SELECT * FROM Sampling (
  ON score_category AS data PARTITION BY ANY
  ON (
    SELECT stratum, COUNT(*) AS stratum_count FROM score_category GROUP BY stratum
  ) AS SummaryTable DIMENSION
  USING
  StratumColumn ('stratum')
  Strata ('excellent', 'fair', 'very good')
  ApproxSampleSize (5, 10, 5)
  Seed (2)
) AS dt ORDER BY stratum, id;

Output

 id score stratum   
 -- ----- --------- 
 60  97.0 excellent
 70  95.0 excellent
 88  98.0 excellent
 90 100.0 excellent
  7  40.0 fair     
  8  57.0 fair     
 18  67.0 fair     
 22  25.0 fair     
 26  54.0 fair     
 34  34.0 fair     
 74   7.0 fair     
 93  67.0 fair     
 94  76.0 fair     
 96  71.0 fair     
  2  83.0 very good
 20  85.0 very good
 33  85.0 very good
 83  81.0 very good

In SummaryTable, the sum over stratum_count must equal the total population size. Otherwise, the final sample output might not approximate the target sample fractions well.

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.