Sample - INPUT - Analysis Parameters (Aster Database) - Teradata Warehouse Miner

Teradata® Warehouse Miner™ User Guide - Volume 2: ADS Generation

Product: Teradata Warehouse Miner
Release Number: 5.4.6
Published: November 2018 (last updated 2018-12-07)
Language: English (United States)
Document ID: B035-2301
Product Category: Software

Sample > Input > Analysis Parameters tab (Aster Database)

When connected to an Aster database, the following options are available:
  • Sample Style
    • Basic — When this option is selected, simple random sampling without stratifying conditions is performed.
    • Stratified — When this option is selected, the available rows are divided into groups, or strata, based on stated conditions before a sample of the requested size (or sizes) is taken. See the Stratified Sample Options screen later in this section.
  • Seed — This optional value plus the task ID is used as the seed for the pseudo-random number generator. If no value is specified, the task ID alone is used.
  • Basic Options
    • Single Fraction — Enter a single fraction between 0.0 and 1.0 in the text box below.
    • Approximate Rows — Enter an approximate integer number of rows to sample in the text box below.

      When the Stratified option is selected, the Analysis Parameters tab looks like the following.

      Sample > Input > Analysis Parameters > Stratified Sample Options (Aster Database)

  • Stratified Sample Options
    • Single Fraction — Enter a single fraction between 0.0 and 1.0 in the text box below; the same fraction is applied to each stratum. For example, if 0.1 is entered and there are two strata, male and female, approximately one-tenth of the males and one-tenth of the females are included in the sample.
    • Approximate Rows — Enter an approximate integer number of rows to sample in the text box below. This is the approximate size of the overall sample, with rows drawn from each stratum in proportion to that stratum's share of all rows. For example, if 44% of all rows represent males, approximately 44% of the resulting sample will also represent males.
    • Fraction Per Condition — For each of the Stratified Conditions in the grid below, specify a fraction between 0.0 and 1.0 in the Size/Fraction column of the data grid. For example, if 0.1 is entered for the stratum corresponding to males and there are 329 males, the sample will include approximately 33 males (0.1 × 329 = 32.9).
    • Approximate Rows Per Condition — For each of the Stratified Conditions in the grid below, specify an approximate integer number of rows in the Size/Fraction column of the data grid. For example, if 10 is entered for the stratum corresponding to males, the sample will include approximately 10 males.
    • Stratified Conditions

      Condition — A condition that evaluates to true or false, such as “gender = ‘M’”. The last condition (provided it is not the only one) can be left blank or set to “ELSE” to define a default stratum containing all observations that do not meet any of the explicitly defined conditions.

      Size/Fraction — A fraction between 0.0 and 1.0 (Fraction Per Condition) or an integer number of rows (Approximate Rows Per Condition).

      Stratum Name — An alias for the stratum.
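The four stratified options differ mainly in how the sampling fraction for each stratum is derived. The Python sketch below makes that arithmetic explicit. It is an illustration only: the function name `stratum_fractions`, the mode strings, the per-stratum row counts, and the cap at 1.0 are hypothetical assumptions, not part of Warehouse Miner. A Basic sample can be viewed as the special case of a single stratum holding every row.

```python
from typing import Dict, Optional


def stratum_fractions(counts: Dict[str, int], mode: str,
                      value: Optional[float] = None,
                      per_stratum: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the sampling fraction to apply in each stratum (hypothetical helper).

    counts: number of rows in each stratum.
    mode: 'single_fraction', 'approximate_rows',
          'fraction_per_condition', or 'approximate_rows_per_condition'.
    """
    total = sum(counts.values())
    if mode == "single_fraction":
        # The same fraction is applied to every stratum.
        return {name: value for name in counts}
    if mode == "approximate_rows":
        # One overall target size; each stratum contributes in proportion to its share of all rows.
        f = min(1.0, value / total) if total else 0.0
        return {name: f for name in counts}
    if mode == "fraction_per_condition":
        # Each stratum uses its own fraction from the Size/Fraction column.
        return {name: per_stratum[name] for name in counts}
    if mode == "approximate_rows_per_condition":
        # Target row count per stratum, converted to a fraction of that stratum (assumed cap at 1.0).
        return {name: (min(1.0, per_stratum[name] / n) if n else 0.0)
                for name, n in counts.items()}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical stratum sizes: 329 males and 419 females (748 rows in total).
counts = {"male": 329, "female": 419}
per_cond = stratum_fractions(counts, "fraction_per_condition",
                             per_stratum={"male": 0.1, "female": 0.5})
overall = stratum_fractions(counts, "approximate_rows", value=100)
# Expected males: 0.1 * 329 = 32.9 with per_cond; 100 * 329 / 748 ≈ 44 with overall.
```

With Approximate Rows, every stratum receives the same fraction, which is what keeps the sample's composition close to that of the full table (here about 44% males).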

The following screen shows an example of setting parameters for stratified sampling with a fraction provided for each condition: a sample containing one-tenth of the rows for males and half of the rows for females.

Sample > Input > Analysis Parameters > Stratified Sample Options example (Aster Database)
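The example of one-tenth of males and half of females can be simulated outside the database to see how per-condition fractions behave. The Python sketch below is an illustration under stated assumptions, not the SQL that Warehouse Miner generates. It assumes that each row is assigned to the first stratum whose condition it meets, that a condition of `None` stands in for a blank or ELSE default stratum, that each row is kept with its stratum's probability (Bernoulli sampling, so sizes are approximate), and that the optional Seed is added to the task ID. The names `stratified_sample` and `task_id` and the row layout are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
# A stratum is (name, condition, fraction); a condition of None means a blank/ELSE default.
Stratum = Tuple[str, Optional[Callable[[Row], bool]], float]


def stratified_sample(rows: Sequence[Row], strata: Sequence[Stratum],
                      task_id: int, seed: Optional[int] = None) -> List[Tuple[str, Row]]:
    # Assumption: the optional seed is added to the task ID; without a seed, the task ID alone is used.
    rng = random.Random(task_id if seed is None else seed + task_id)
    sample: List[Tuple[str, Row]] = []
    for row in rows:
        for name, condition, fraction in strata:
            # Assumption: a row belongs to the first stratum whose condition it meets.
            if condition is None or condition(row):
                # Bernoulli trial: keep the row with probability `fraction`.
                if rng.random() < fraction:
                    sample.append((name, row))
                break
        # Assumption: a row matching no condition (and no default stratum) is never sampled.
    return sample


# Hypothetical data: 329 male rows and 419 female rows.
rows = [{"gender": "M"} for _ in range(329)] + [{"gender": "F"} for _ in range(419)]
strata = [
    ("males", lambda r: r["gender"] == "M", 0.1),
    ("females", lambda r: r["gender"] == "F", 0.5),
]
result = stratified_sample(rows, strata, task_id=7, seed=42)
sizes = {name: sum(1 for n, _ in result if n == name) for name, _, _ in strata}
# Expected sizes are about 33 males and about 210 females; exact counts depend on the seed.
```

Because the generator is seeded deterministically, rerunning with the same seed and task ID reproduces the same sample, while changing either value draws a different one.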

The Stratified Conditions in the above example could also be specified as follows.

Sample > Input > Analysis Parameters > Stratified Sample Options > Stratified Conditions example #1 (Aster Database)

Or,

Sample > Input > Analysis Parameters > Stratified Sample Options > Stratified Conditions example #2 (Aster Database)