Sample Syntax | Vantage Analytics Library - Syntax - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Vantage Analytics Library
Release Number
2.2.0
Published
March 2023
Language
English (United States)
Last Update
2024-01-02
dita:mapPath
ibw1595473364329.ditamap
dita:ditavalPath
iup1603985291876.ditaval
dita:id
zyl1473786378775
Product Category
Teradata Vantage
Call td_analyze(
	'sample',
	'database = <>;
	tablename = <>; 
	columns = <>; 
	Optional Parameters;'
);

Required Parameters

columns
The list of input columns, one list for each input table. (A column list is denoted by inclusion in braces { }, separated by commas.) The columns in the first list are the columns that appear in the selected list of the Sample output table.
An alias can be associated with any of these input columns by following the name with a forward slash followed by the alias name. For example, cust_id/cid associates the alias cid with the column cust_id. Aliases for columns other than the first table are ignored.
For convenience, the keyword all is used to select all of the columns in the input table. When columns=all is entered, all columns from all of the input tables are processed. Exceptions are specified by using the optional columnstoexclude parameter.
database
The database parameter contains a list of the databases that contain the input tables to sample.
sample
The sample parameter must meet the following conditions:
  • Is required
  • Is the first parameter
  • Is always enclosed in single quotes
tablename
The input table containing the data to sample. It must reside in the input database as indicated by the database parameter.

Optional Parameters

columnstoexclude
The columns, if any, to exclude during processing when the parameter columns=all is specified. (Note that columnstoexclude can only be used when columns=all.)
conditionalsizes
Use to perform random sampling with stratifying conditions. When specified, the available rows are divided into groups or strata. This division is based on the sample size defined before the samples are taken.
Each stratum contains a comma-separated list of integer or decimal values contained in braces { }.
Each stratum is matched to a condition also enclosed in braces as in the following example:
conditions = {gender = "F"}, {gender = "M"};
conditionalsizes = {1,2,3}, {4,5,6};
The following rules apply to the conditionalsizes parameter and samplesizes parameter:
  • Literal values in a condition, such as {gender = 'F'}, must be enclosed in single quotes.
  • Either the samplesizes or conditionalsizes parameter must be specified, but not both.
  • The conditionalsizes parameter must specify either all decimal or all integer values.
  • Conditional sizes that are decimal must be between 0.0 and 1.0 exclusively.
  • A value of zero is not allowed in the conditionalsizes parameter.
  • Conditional sizes that are integer must be between 1 and 2147483646.
Refer to the description of the parameter separatesamplepersize for more information.
conditions
Refer to the description of the conditionalsizes parameter.
fallback
When true, a mirror copy of the output table is requested in the Analytics Database when outputstyle=table. See the note on fallback in the Welcome to Vantage Analytics Library section for more information.
gensqlonly
When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.
index
When a list of columns is specified, the index parameter requests that the output table contain the index columns when outputstyle=table or volatile. Either index or the noindex parameters are required with the table or volatile output styles.
indexunique
When true, requests that the output table contain a unique primary index when outputstyle=table or volatile.
lockingclause
Requests that the generated SQL contain the specified locking clause. The following is an example of a locking clause when the output style defaults to select:
LOCKING mydb.mytable FOR ACCESS;
multiset
Whether the output table can contain duplicate rows when outputstyle=table or volatile.
noindex
Requests that the output table contains no index columns (NOPI) when outputstyle=table or volatile.
outputdatabase
The database selected by the user to contain the output table or view.
For a volatile output table, do not enter a value for the output database, the value is automatically built in the USER database.
For the multipletables output option, use this parameter to indicate where the permanent output tables are built, but not the volatile table. If outputdatabase is not used with the multipletables output option, the permanent tables are built in the USER database.
outputnames
Used to name the tables or views created when using the sampleoutputstyle parameter multipletables or multipleviews style.
Required when creating multiple tables or views.
An error is returned if outputnames is missing or if the number of sample sizes do not match the number of output names. It is important to not reuse a table name as a view name or a view name as a table name. This can lead to an error that the table or view already exists.
outputstyle
The following output styles are allowed:
  • Select
  • Table
  • View
  • Volatile
If outputstyle is not specified, the function generates a SELECT statement and does not create a table or view.
outputstyle={select|table|view|volatile}
outputtablename
The output table containing the denormalized data, used only if the outputstyle is table, view, or volatile. If used, the table must reside in the database indicated by the outputdatabase parameter.
overwrite
When overwrite is set to true or not set, the output table is dropped before creating a new one.
randomizedallocation
When set to true, the requested rows are allocated across the AMPs by simulating simple random sampling, a process that can be comparatively slow.
When set to false or allowed to assume the default value, requested rows are allocated proportionally across the AMPs as a function of the number of rows on each AMP. This is not considered a simple random sample since it does not include all of the possible sample sets. This option is much faster than the randomized allocation option, especially for large sample sizes, and still result with enough of a random allocation for most applications.
The default value is to not perform randomized allocation.
sampleoutputstyle
The following sampleoutput styles are allowed:
  • select
  • table
  • multipletables
  • multipleviews
If sampleoutputstyle is not specified, you must specify the outputstyle parameter instead, otherwise a SELECT statement is performed.
sampleoutputstyle = {select|table|multipletables|multipleviews}
Using the listed sample output styles, the sampled rows may be one of the following:
  • Selected into a data set
  • Stored in a single table
  • Stored in a separate table for each sample
  • Stored in a single table with a view created for each sample
samplesizes
Used to perform simple random samples without stratifying conditions. One or more samples can be requested by supplying a column separated list of integer or decimal values, each representing a sample of the given size as in the following example:
samplesizes = 1, 2, 3
samplesizes = 0.1, 0.2, 0.3
The following rules apply when using the samplesizes parameter:
  • You can specify either the samplesizes or conditionalsizes parameter, but not both.
  • The samplesizes parameter must specify either all decimal or all integer values.
  • Sample sizes that are a decimal must be between 0.0 and 1.0 exclusively.
  • A value of zero is not allowed in the samplesizes parameter.
  • The sum of decimal sample sizes must not exceed 1.0.
  • Sample sizes that are an integer must be between 1 and 2147483646.
separatesamplepersize
Used when performing a stratified sampling. Create a separate sample for each sample size or fraction in the conditionalsizes parameter, as opposed to combining the first size for each condition into sample 1, the second size for each condition into sample 2, and so on.
For example: Two conditions, A and B, where the sizes for condition A are 1, 2, and 3 and the sizes for condition B are 4, 5, and 6. If separatesamplepersize is set to true, six separate samples are created with sizes 1, 2, 3, 4, 5, and 6. The first three samples satisfy condition A and the second three samples satisfying condition B.
When separatesamplepersize is set to false or not selected (default), three samples are created with the following sample sizes:
  • The first sample contains 1 instance of condition A and 4 instances of condition B.
  • The second sample contains 2 instances of condition A and 5 instances of condition B.
  • The third sample contains 3 instances of condition A and 6 instances of condition B.
where
Requests that the generated SQL contained the specified WHERE clause. (Do not include the leading keyword WHERE in the parameter text.)
withreplacement
When set to true, each sampled row is immediately returned to the sampling pool and can be selected multiple times. If multiple samples are requested with replacement, the samples are not necessarily mutually exclusive.
When set to false or allowed to use the default value, each row sampled in a request is unique. Once a row is sampled, it is not replaced in the sampling pool for that request. Therefore, it is not possible to sample more rows than what exists in the sampled table and if multiple samples are requested, they are mutually exclusive.
The default value is to not use replacement.