Sample (Teradata Database) - Teradata Warehouse Miner

Teradata® Warehouse Miner™ User Guide - Volume 2ADS Generation

Product
Teradata Warehouse Miner
Release Number
5.4.6
Published
November 2018
Language
English (United States)
Last Update
2018-12-07
dita:mapPath
gxn1538171534877.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2301
Product Category
Software

A Sample element may be dragged onto the empty node under the Sample Clause, giving the following result.

Columns and/or other expressions can be moved into the folder titled Expressions, in order to achieve one of the many variations of Sample Clause available. Some of the possibilities are listed below.
  • Integer Values — One or more numeric literals or parameter with integer values may be dragged onto the Expressions folder to request a sample of the given size for each integer value.
  • Decimal Values — One or more numeric literals or parameters with decimal values between 0 and 1 and adding up to 1 may be dragged onto the Expressions folder to request a sample of the given fraction of the total rows for each decimal value entered.
  • Text literal — A single Text literal can be used to enter a comma separated list of integer or decimal values as described above.
  • Stratified conditions — Stratified sampling conditions can be requested by dragging one or more Case Conditions from the Case category of SQL elements onto the Expressions folder under the Sample element. The Then condition of each Case Conditions may consist of a SQL Element List (from the Other category of SQL element) containing the requested sample sizes or fractions for each stratum.
  • Case Conditions — Each stratum in the sampling must be defined by a conditional expression, such as gender = ‘M’ or channel IN (‘A’, ‘B’, ‘C’), with one exception. The last (but not only) condition can be a Numeric Literal or a SQL Element List (from the Other category) to define a default stratum containing all observations that do not meet any explicitly defined conditions.
  • SQL Element List of Sizes or Fractions — The sizes or fractions for one or more samples are entered into the Expressions folder of a SQL Element List. If sample sizes are entered (e.g. 10, 20, 30), they indicate the number of rows to be returned in each sample for the stratum. If fractions are entered (e.g. .01, .02, .03), they indicate the approximate size of each sample as a fraction of the available rows in the stratum, and as such must not add up to more than 1.

    As a first example, the following generates a sample clause of SAMPLE 10, 20, 30.

    As another example, the following generates a sample clause something like the following.

    SAMPLE
    	WHEN marital_status = '1' THEN 1, 10
    	WHEN marital_status = '2' THEN 2, 20
    	WHEN marital_status = '3' THEN 3, 30
    	ELSE 4
    In order to set the special properties available for the Sample element, click on Properties.
    • Sample with replacement — When this option is checked, each sampled row is immediately returned to the sampling pool and may therefore be selected multiple times. If multiple samples are requested with replacement, the samples are not necessarily mutually exclusive.

      When this option is not checked, each row sampled in a request is unique, and once sampled, is not replaced in the sampling pool for that request. Therefore, it is not possible to sample more rows than exist in the sampled table, and if multiple samples are requested they are mutually exclusive.

    • Sample with randomized allocation — When this option is checked, the requested rows are allocated across the AMPs by simulating simple random sampling, a process that can be comparatively slow.

      When this option is not checked, requested rows are allocated across the Teradata AMPs as a function of the number of rows on each AMP. This is technically not a simple random sample because it does not include all possible sample sets. It is, however, much faster than randomized allocation, especially for large sample sizes, and should have sufficient randomness for most applications.