The Sample analysis function randomly selects rows from a table or view, producing one or more samples based on a specified number of rows or a fraction of the total number of rows. The sampled rows can be stored in the following ways:
- Single table
- Separate table for each sample
- Single table with a view created for each sample
Options are provided for sampling with or without replacement of rows, randomized or proportional allocation by AMP, and stratified or simple random sampling. They are described in the following table:
| Option | Description |
| --- | --- |
| With or without replacement of rows | Without replacement (default): Each sampled row in a request is unique. Once a row is sampled, it is not returned to the sampling pool for that request, so it is not possible to sample more rows than exist in the sampled table. If multiple samples are requested, they are mutually exclusive.<br>With replacement: Each sampled row is immediately returned to the sampling pool and can be selected multiple times. If multiple samples are requested with replacement, the samples are not necessarily mutually exclusive. |
| Randomized or proportional allocation by AMP | Proportional allocation (default): Requested rows are allocated across the AMPs as a function of the number of rows on each AMP. This is not considered a simple random sample because it does not include all of the possible sample sets. This option is much faster than randomized allocation, especially for large sample sizes, and still results in an allocation that is random enough for most applications.<br>Randomized allocation: Requested rows are allocated across the AMPs by simulating simple random sampling, a process that can be slower than proportional allocation. |
| Stratified or simple random sampling | Simple (default): Each possible set of the requested sample size has an equal probability of being selected (subject to the limitations of proportional allocation discussed previously).<br>Stratified: Available rows are divided into groups, or strata. The strata are defined by conditions that are specified before the samples of the requested sizes are taken. |
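
The options in the table can be illustrated with a small Python sketch. It is not how the Sample function is implemented, and it does not use any Teradata API. It only models the ideas under stated assumptions: AMPs are represented as a plain list of per-AMP row counts, all function names are hypothetical, and the largest-remainder rounding used for proportional allocation is an assumption chosen for the example.

```python
import random
from collections import defaultdict


def sample_rows(rows, size, with_replacement=False, rng=random):
    """Draw `size` rows from a list, with or without replacement."""
    if with_replacement:
        # Each pick returns to the pool, so a row can be selected many times.
        return [rng.choice(rows) for _ in range(size)]
    if size > len(rows):
        raise ValueError("cannot sample more rows than exist without replacement")
    # random.sample never returns the same position twice.
    return rng.sample(rows, size)


def proportional_allocation(amp_row_counts, size):
    """Split `size` across AMPs in proportion to the rows each AMP holds."""
    total = sum(amp_row_counts)
    exact = [size * n / total for n in amp_row_counts]
    alloc = [int(x) for x in exact]
    # Assumption: rows lost to rounding go to the largest fractional remainders.
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True
    )
    for i in by_remainder[: size - sum(alloc)]:
        alloc[i] += 1
    return alloc


def randomized_allocation(amp_row_counts, size, rng=random):
    """Simulate simple random sampling and count how many picks land on each AMP."""
    # One entry per row, labelled with the AMP that owns it.
    owners = [amp for amp, n in enumerate(amp_row_counts) for _ in range(n)]
    alloc = [0] * len(amp_row_counts)
    for amp in rng.sample(owners, size):
        alloc[amp] += 1
    return alloc


def stratified_sample(rows, stratum_of, size_per_stratum, rng=random):
    """Group rows into strata first, then sample each stratum separately."""
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    return {
        key: sample_rows(group, size_per_stratum, rng=rng)
        for key, group in strata.items()
    }
```

For example, `proportional_allocation([50, 30, 20], 7)` returns `[4, 2, 1]` every time, whereas `randomized_allocation([50, 30, 20], 7)` can return any split that sums to 7. The randomized version in this sketch enumerates every row before drawing, which hints at why simulating simple random sampling can cost more than a proportional split.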
The Sample analysis function is defined by specifying parameters that identify the table and columns to analyze. Each Sample example contains the td_analyze call statement, the generated SQL, and the expected results.