Sample - INPUT - Analysis Parameters (Teradata Database) - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 2: ADS Generation

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-03
dita:id: B035-2301
Product Category: Software
Sample > Input > Analysis Parameters tab (Teradata Database)

When connected to the Teradata database, the following options are available to select:
  • Sample Style
    • Basic — When this option is checked, simple random sampling without stratifying conditions is performed.
    • Stratified — When this option is checked, the available rows are first divided into groups, or strata, based on the stated conditions, and samples of the requested size or sizes are then taken from each stratum.
  • Sample Options
    • Sample with Replacement — When this option is checked, each sampled row is immediately returned to the sampling pool and may therefore be selected multiple times. If multiple samples are requested with replacement, the samples are not necessarily mutually exclusive.

      When this option is not checked, each row sampled in a request is unique, and once sampled, is not replaced in the sampling pool for that request. Therefore, it is not possible to sample more rows than exist in the sampled table, and if multiple samples are requested they are mutually exclusive.

    • Sample with Randomized Allocation — When this option is checked, the requested rows are allocated across the AMPs (Access Module Processors) by simulating simple random sampling, a process that can be comparatively slow.

      When this option is not checked, requested rows are allocated across the Teradata AMPs in proportion to the number of rows on each AMP. This is technically not a simple random sample, because not every possible sample set can be produced. It is, however, much faster than randomized allocation, especially for large sample sizes, and should be sufficiently random for most applications.

    • Sizes/Fractions separated by ‘,’ (only when Sample Style is set to Basic)

      When the Sample Style is Basic, this option is used to enter a list of one or more sample sizes or fractions, separated by the list separator for the current locale. If sample sizes are entered (e.g., 10, 20, 30), they indicate the number of rows to be returned in each sample. If fractions are entered (e.g., .01, .02, .03), they indicate the approximate size of each sample as a fraction of the available rows in the table, so they must not add up to more than 1.

    • Stratified Sample Options — Create a separate sample for each fraction/size (only when Sample Style is set to Stratified).

      When the Sample Style is Stratified, this option is used to create a separate sample for each sample size or fraction in the Stratified Conditions grid, as opposed to combining the first size for each condition into sample 1, the second size for each condition into sample 2, etc.

      For example, suppose there are two conditions, A and B; the sizes associated with condition A are 1, 2 and 3, and the sizes associated with condition B are 4, 5 and 6. When this option is not checked (the default), three samples are created: the first with 1 row satisfying condition A and 4 rows satisfying condition B, the second with 2 rows satisfying condition A and 5 rows satisfying condition B, and the third with 3 rows satisfying condition A and 6 rows satisfying condition B.

      If this option is checked, there will be 6 separate samples created, with sizes 1, 2, 3, 4, 5 and 6, the first three satisfying condition A and the second three satisfying condition B. This example can be seen in the picture below.

      Sample > Input > Analysis Parameters > Stratified Sample Options (Teradata)

    • Stratified Conditions (only when Sample Style is set to Stratified)

      When the Sample Style is Stratified, this option is used to enter one or more conditions along with corresponding sample sizes or fractions. (For an example of stratified sampling, refer to Sample - Example #5 (Teradata Database)).

      • Condition — Each stratum must be defined by a conditional expression, such as gender = 'M' or channel IN ('A', 'B', 'C'), with one exception: the last condition, provided it is not the only one, can be blank or ELSE to define a default stratum containing all rows that do not meet any explicitly defined condition.
      • Sizes/Fractions — This field is used to enter sizes or fractions for one or more samples, separated by the list separator for the current locale. If sample sizes are entered (e.g., 10, 20, 30), they indicate the number of rows to be returned in each sample for the stratum. If fractions are entered (e.g., .01, .02, .03), they indicate the approximate size of each sample as a fraction of the available rows in the stratum, so they must not add up to more than 1.
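As a rough illustration of the Sample with Replacement setting described above, the following Python sketch draws several samples from an in-memory list of row IDs. It only mimics the documented behavior; it is not the SQL that Teradata Warehouse Miner generates, and the helper name `draw_samples` is hypothetical.

```python
import random


def draw_samples(rows, sizes, with_replacement, seed=None):
    """Draw one sample per requested size (hypothetical helper).

    Without replacement, a row leaves the pool once drawn, so the samples are
    mutually exclusive and their combined size cannot exceed len(rows).
    With replacement, every draw sees the full pool, so a row may be selected
    more than once, within one sample or across samples.
    """
    rng = random.Random(seed)
    if with_replacement:
        return [[rng.choice(rows) for _ in range(n)] for n in sizes]
    if sum(sizes) > len(rows):
        raise ValueError("cannot sample more rows than exist without replacement")
    pool = list(rows)
    rng.shuffle(pool)  # random ordering; consecutive slices are disjoint samples
    samples, start = [], 0
    for n in sizes:
        samples.append(pool[start:start + n])
        start += n
    return samples


rows = list(range(1, 11))  # ten row IDs standing in for a table
first, second = draw_samples(rows, [3, 4], with_replacement=False, seed=1)
assert set(first).isdisjoint(second)  # samples without replacement never overlap

big = draw_samples(rows, [25], with_replacement=True, seed=1)[0]
assert len(big) == 25  # larger than the table: only possible with replacement
```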
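The two allocation strategies described under Sample with Randomized Allocation can be compared with a small Python sketch that only decides how many sample rows each AMP contributes. The per-AMP row counts are made-up inputs, the helper names are hypothetical, and the largest-remainder rounding in the proportional version is an assumption; the exact rounding rule Teradata applies is not described on this page.

```python
import bisect
import itertools
import random


def proportional_allocation(amp_rows, n):
    """Split n sample rows across AMPs in proportion to each AMP's row count.

    Integer largest-remainder rounding (an assumption) keeps the total at n.
    """
    total = sum(amp_rows)
    quotas = [divmod(n * r, total) for r in amp_rows]  # (whole share, remainder)
    alloc = [whole for whole, _ in quotas]
    # Give leftover rows to the AMPs with the largest remainders.
    order = sorted(range(len(amp_rows)), key=lambda i: quotas[i][1], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc


def randomized_allocation(amp_rows, n, seed=None):
    """Split n sample rows across AMPs by simulating a simple random sample.

    Draws n distinct row positions uniformly from the whole table and counts
    how many land on each AMP, so any split a simple random sample could
    produce is possible.
    """
    rng = random.Random(seed)
    bounds = list(itertools.accumulate(amp_rows))  # cumulative row counts
    alloc = [0] * len(amp_rows)
    for pos in rng.sample(range(bounds[-1]), n):
        alloc[bisect.bisect_right(bounds, pos)] += 1
    return alloc


amp_rows = [500, 300, 200]  # made-up row counts on three AMPs
print(proportional_allocation(amp_rows, 7))         # [4, 2, 1] on every run
print(randomized_allocation(amp_rows, 7, seed=42))  # depends on the seed
```

The proportional version returns the same split every time for a given table, so a split such as [7, 0, 0] can never occur even though a true simple random sample could produce it. The randomized version can produce any such split, at the cost (in this sketch) of generating and classifying n random positions.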
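The rules for a Sizes/Fractions entry (row counts or fractions, with fractions not adding up to more than 1) can be expressed as a small validation routine. This Python sketch is hypothetical: the function name and the decision to reject a mix of counts and fractions are assumptions, and `sep` merely stands in for the locale's list separator.

```python
from decimal import Decimal


def parse_sample_spec(text, sep=","):
    """Parse a Sizes/Fractions entry such as "10, 20, 30" or ".01, .02, .03".

    Hypothetical helper. Whole numbers of at least 1 are read as row counts,
    values strictly between 0 and 1 as fractions. Rejecting a mix of the two
    is an assumption, not documented behavior. Decimal keeps the sum exact.
    """
    values = [Decimal(part.strip()) for part in text.split(sep) if part.strip()]
    if not values:
        raise ValueError("enter at least one size or fraction")
    if all(0 < v < 1 for v in values):
        if sum(values) > 1:
            raise ValueError("fractions must not add up to more than 1")
        return "fractions", values
    if all(v >= 1 and v == v.to_integral_value() for v in values):
        return "sizes", [int(v) for v in values]
    raise ValueError("enter either all row counts or all fractions")


print(parse_sample_spec("10, 20, 30"))                # ('sizes', [10, 20, 30])
print(parse_sample_spec(".25; .25; .5", sep=";")[0])  # fractions
```

Because fractions describe approximate sizes, an entry of .01, .02, .03 against a 1,000-row table asks for roughly 10, 20 and 30 rows.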
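The effect of the "Create a separate sample for each fraction/size" option can be reproduced with the A/B example given for it. The Python sketch below only plans which (condition, size) requests go into each sample and draws no rows; the helper name is hypothetical, and requiring every condition to list the same number of sizes when samples are combined is an assumption.

```python
def plan_stratified_samples(conditions, separate):
    """Group (condition, size) requests into samples (hypothetical helper).

    conditions maps a condition label to its list of sizes or fractions.
    separate=False: sample i combines the i-th size of every condition.
    separate=True: every (condition, size) pair becomes its own sample.
    """
    if separate:
        return [[(cond, size)] for cond, sizes in conditions.items() for size in sizes]
    if len({len(sizes) for sizes in conditions.values()}) != 1:
        # Assumption: combining only makes sense when the size lists line up.
        raise ValueError("each condition must list the same number of sizes")
    return [list(zip(conditions, column)) for column in zip(*conditions.values())]


example = {"A": [1, 2, 3], "B": [4, 5, 6]}
for sample in plan_stratified_samples(example, separate=False):
    print(sample)
# [('A', 1), ('B', 4)]
# [('A', 2), ('B', 5)]
# [('A', 3), ('B', 6)]
print(len(plan_stratified_samples(example, separate=True)))  # 6
```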
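How the Condition and Sizes/Fractions entries of the Stratified Conditions grid fit together can be sketched in Python, with each condition written as a predicate over a row and `None` standing for a blank or ELSE condition. Everything here is an assumption for illustration: the helper name, the first-match rule for rows that satisfy more than one condition, the use of row counts only (no fractions), and returning a whole stratum when it holds fewer rows than requested.

```python
import random


def stratified_sample(rows, strata, seed=None):
    """Draw one sample per stratum, without replacement (hypothetical helper).

    strata is a list of (predicate, size) pairs; predicate is a function of a
    row, or None for a default (ELSE) stratum, which may only appear last and
    may not be the only stratum. Each row joins the first stratum it matches.
    """
    preds = [pred for pred, _ in strata]
    if None in preds[:-1]:
        raise ValueError("only the last condition may be blank or ELSE")
    if preds == [None]:
        raise ValueError("a blank or ELSE condition cannot be the only one")
    groups = [[] for _ in strata]
    for row in rows:
        for i, pred in enumerate(preds):
            if pred is None or pred(row):
                groups[i].append(row)
                break  # first match wins; unmatched rows with no ELSE are skipped
    rng = random.Random(seed)
    return [rng.sample(group, min(size, len(group)))
            for group, (_, size) in zip(groups, strata)]


# Hypothetical rows; the conditions mirror gender = 'M' and channel IN ('A', 'B', 'C').
rows = [{"id": i, "gender": "M" if i % 2 == 0 else "F", "channel": "ABCDE"[i % 5]}
        for i in range(20)]
strata = [
    (lambda r: r["gender"] == "M", 3),
    (lambda r: r["channel"] in ("A", "B", "C"), 2),
    (None, 1),  # ELSE: every row not matched by an earlier condition
]
for sample in stratified_sample(rows, strata, seed=3):
    print([r["id"] for r in sample])
```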