Call td_analyze( 'sample', 'database = <>; tablename = <>; columns = <>; Optional Parameters;' );
Required Parameters
- columns
- The list of input columns, one list for each input table. (A column list is denoted by inclusion in braces { }, separated by commas.) The columns in the first list are the columns that appear in the selected list of the Sample output table.
- database
- The database parameter contains a list of the databases that contain the input tables to sample.
- sample
- The sample parameter must meet the following conditions:
- Is required
- Is the first parameter
- Is always enclosed in single quotes
- tablename
- The input table containing the data to sample. It must reside in the input database as indicated by the database parameter.
Optional Parameters
- columnstoexclude
- The columns, if any, to exclude during processing when the parameter columns=all is specified. (Note that columnstoexclude can only be used when columns=all.)
- conditionalsizes
- Use to perform random sampling with stratifying conditions. When specified, the available rows are divided into groups or strata. This division is based on the sample size defined before the samples are taken.
- conditions
- Refer to the description of the conditionalsizes parameter.
- fallback
- When true, a mirror copy of the output table is requested in the Analytics Database when outputstyle=table. See the note on fallback in the Welcome to Vantage Analytics Library section for more information.
- gensqlonly
- When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.
- index
- When a list of columns is specified, the index parameter requests that the output table contain the index columns when outputstyle=table or volatile. Either index or the noindex parameters are required with the table or volatile output styles.
- indexunique
- When true, requests that the output table contain a unique primary index when outputstyle=table or volatile.
- lockingclause
- Requests that the generated SQL contain the specified locking clause. The following is an example of a locking clause when the output style defaults to select:
LOCKING mydb.mytable FOR ACCESS;
- multiset
- Whether the output table can contain duplicate rows when outputstyle=table or volatile.
- noindex
- Requests that the output table contains no index columns (NOPI) when outputstyle=table or volatile.
- outputdatabase
- The database selected by the user to contain the output table or view.
- outputnames
- Used to name the tables or views created when using the sampleoutputstyle parameter multipletables or multipleviews style.
- outputstyle
- The following output styles are allowed:
- Select
- Table
- View
- Volatile
- outputtablename
- The output table containing the denormalized data, used only if the outputstyle is table, view, or volatile. If used, the table must reside in the database indicated by the outputdatabase parameter.
- overwrite
- When overwrite is set to true or not set, the output table is dropped before creating a new one.
- randomizedallocation
- When set to true, the requested rows are allocated across the AMPs by simulating simple random sampling, a process that can be comparatively slow.
- sampleoutputstyle
- The following sampleoutput styles are allowed:
- select
- table
- multipletables
- multipleviews
- samplesizes
- Used to perform simple random samples without stratifying conditions. One or more samples can be requested by supplying a column separated list of integer or decimal values, each representing a sample of the given size as in the following example:
samplesizes = 1, 2, 3 samplesizes = 0.1, 0.2, 0.3
- separatesamplepersize
- Used when performing a stratified sampling. Create a separate sample for each sample size or fraction in the conditionalsizes parameter, as opposed to combining the first size for each condition into sample 1, the second size for each condition into sample 2, and so on.
- where
- Requests that the generated SQL contained the specified WHERE clause. (Do not include the leading keyword WHERE in the parameter text.)
- withreplacement
- When set to true, each sampled row is immediately returned to the sampling pool and can be selected multiple times. If multiple samples are requested with replacement, the samples are not necessarily mutually exclusive.