Sample Examples | Vantage Analytics Library - Sample Examples - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Vantage Analytics Library
Release Number
2.2.0
Published
March 2023
Language
English (United States)
Last Update
2024-01-02
dita:mapPath
ibw1595473364329.ditamap
dita:ditavalPath
iup1603985291876.ditaval
dita:id
zyl1473786378775
Product Category
Teradata Vantage
In the following examples, the call statement requires preceding the td_analyze stored procedure name with the database where it is installed. For example:
call val_user.td_analyze(

Additional generated SQL is received by adding the "gensqlonly = true;" parameter.

The values vary each time the command is run.

Required Parameters Only

The following example only uses the required parameters. There are some parameters that assume default values such as outputstyle = select or sampleoutputstyle = select.

Call:
Call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	samplesizes = 10;
');
SQL:
SELECT
	 "cust_id"
	,"income"
	,"age"
FROM "val_source"." customer"
SAMPLE 10
;

Results:

The results consist of a data set with three columns: cust_id, income, and age with 10 rows randomly selected from the input table.

cust_id income age
1363288 41064 55
1362722 44196 30
1362952 7633 53
1362614 18040 27
1363236 23206 64
1362596 0 15
1363346 35237 55
1363326 75890 40
1363345 54713 55
1362665 0 14

3 Samples in 1 Output Table

In this example, multiple samples are written to one output table by introducing a new column in the output table called xSampleId. This column indicates the sample where each row belongs. In this example the samplesizes parameter has the value of 1,5,10, which results in three samples. Sample one contains 1 row, sample two contains 5 rows, and sample three 10 rows which are all indicated by the xSampleId column.

The following example shows some of the many output paramaters available.

Call:
call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputstyle = table;
	outputdatabase=val_results;
	outputtablename=_val_sample_table;
	index=cust_id;
	where = cust_id > 0;
	samplesizes = 1,5,10;
');
sel * from val_results._val_sample_table;
SQL:
DROP TABLE "val_results"."_val_sample_table";
 CREATE SET TABLE "val_results"."_val_sample_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 5, 10
WHERE cust_id > 0
) WITH DATA PRIMARY INDEX ("cust_id");

Results:

This example yields an output table with columns cust_id, income, age, and xSampleId.

cust_id income age xSampleId
1362797 17912 32 3
1363316 18144 28 3
1362706 0 15 3
1362668 0 13 3
1362930 0 17 3
1363073 11821 29 3
1363072 7495 88 3
1363339 0 14 3
1362530 39250 57 1
1362638 7008 40 2
1362663 29030 44 2
1362589 30674 64 2
1363354 12621 66 2
1363037 8257 29 3
1363185 7353 19 3
1362764 25973 57 2

Stratified Sampling

This example demonstrates random sampling with stratifying conditions where rows are divided into groups or strata. This division is based on conditions defined prior to samples of a requested size being taken.

Note that in this example the separatesamplepersize option is not requested.

Call:
call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age,gender;
	conditions = {gender = ''F''},{gender = ''M''};
	conditionalsizes = {1,2,3},{4,5,6};
');
SQL:
SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,CASE 	WHEN _val_sampleid IN (1 , 4 ) THEN 1
		WHEN _val_sampleid IN (2 , 5 ) THEN 2
		WHEN _val_sampleid IN (3 , 6 ) THEN 3
	 END AS "xSampleId"
FROM
(SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,SAMPLEID AS "_val_sampleid"
FROM "val_source"."customer"
SAMPLE 
	WHEN gender = 'F' THEN 1, 2, 3
	WHEN gender = 'M' THEN 4, 5, 6
END
) AS "_valSample"
;

Results:

A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id income age gender xSampleId
1362708 15246 29 M 1
1363421 1982 28 M 1
1362801 12485 24 M 3
1363251 18597 19 M 2
1363236 23206 64 M 3
1362720 18806 19 M 3
1363346 35237 55 M 2
1362555 26598 57 M 1
1363164 24300 47 M 1
1363004 19319 24 M 3
1363426 37186 36 M 2
1363370 6700 23 M 2
1363357 19041 38 M 3
1362772 12456 31 M 2
1363275 17773 37 F 3
1363444 3171 22 F 3
1362921 19987 45 F 2
1362632 49863 55 M 3
1362843 27303 59 F 1
1362715 17750 56 F 3
1362993 20702 30 F 2

Stratified Sampling with separatesamplepersize

This example demonstrates random sampling with stratifying conditions, where rows are divided into groups or strata based on stated conditions prior to samples of a requested size being taken. Note that in this example the separatesamplepersize option is requested.

Call:
call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age,gender;
	conditions = {gender = ''F''},{gender = ''M''};
	conditionalsizes = {1,2,3},{4,5,6};
	separatesamplepersize=true;
');
SQL:
SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 
	WHEN gender = 'F' THEN 1, 2, 3
	WHEN gender = 'M' THEN 4, 5, 6
END
;

Results:

A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, 3, 4, 5 and 6.

cust_id income age gender xSampleId
1363055 34473 52 M 5
1362933 19823 55 M 4
1363422 42532 39 M 6
1363376 23883 75 M 5
1362523 0 13 F 2
1362761 13515 23 M 5
1363356 12258 39 F 3
1362967 5022 84 M 5
1362725 68933 52 M 4
1363146 8381 44 F 3
1363123 56384 35 M 6
1363492 142274 54 M 4
1363183 46870 59 M 6
1363204 23294 51 F 3
1362823 43381 48 M 6
1363083 22680 64 M 6
1362570 20562 50 F 2
1362877 9480 42 M 6
1362674 0 16 M 4
1363395 14011 29 F 1
1362821 7826 18 M 5

Multiple Tables

In this example, sampled rows are stored in output tables, one for each sample. First, a volatile table containing the input columns and an added xSampleId column is created. Each xSampleId value indicates to which sample that row belongs. Then, multiple output tables are created, again using the value of xSampleId to decide which output table that row is written to. Eventually, the volatile table is dropped automatically.

Call:
call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputdatabase = val_results;
	outputtablename = _val_sample;
	samplesizes = 1,2,3;
	sampleoutputstyle = multipletables;
	outputnames = s1,s2,s3;
	');
sel * from val_results.s1;
sel * from val_results.s2;
sel * from val_results.s3;
sel * from val_user._val_sample;
SQL:
DROP TABLE "val_user"."_val_sample";
 CREATE SET VOLATILE TABLE "val_user"."_val_sample", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 2, 3
) WITH DATA ON COMMIT PRESERVE ROWS;

DROP TABLE "val_result"."s1";
 CREATE SET TABLE "val_results"."s1", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 1
) WITH DATA;

DROP TABLE "val_results"."s2";
 CREATE SET TABLE "val_results"."s2", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 2
) WITH DATA;

DROP TABLE "val_results"."s3";
 CREATE SET TABLE "val_results"."s3", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 3
) WITH DATA;

Results:

A data set with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id income age
1362585 28093 69
cust_id income age
1363252 0 16
1363287 18449 39
cust_id income age
1362927 15795 63
1362975 36330 46
1363318 7751 41
cust_id income age xSampleId
1362927 15795 63 3
1362975 36330 46 3
1363318 7751 41 3
1362585 28093 69 1
1363287 18449 39 2
1363252 0 16 2

Multiple Views

In this example, sampled rows are first stored in a permanent output table with an added xSampleId column. Each xSampleld value indicates to which sample that row belongs. Finally, multiple views are created using the value of the xSampleId to decide which view each row belongs to.

Call:
Call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputdatabase = val_results;
	outputtablename = _val_base_table;
	samplesizes = 1,2,3;
	sampleoutputstyle = multipleviews;
	outputnames = v1,v2,v3;
	where = cust_id > 0;
	index = cust_id;
	');

sel * from val_results.v1;
sel * from val_results.v2;
sel * from val_results.v3;
sel * from val_results._val_base_table;
SQL:
DROP TABLE "val_results"."_val_base_table";
 CREATE SET TABLE "val_results"."_val_base_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 2, 3
WHERE cust_id > 0
) WITH DATA PRIMARY INDEX ("cust_id");

DROP VIEW "val_results"."v1";
 CREATE VIEW "val_results"."v1" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 1
);

DROP VIEW "val_results"."v2";
 CREATE VIEW "val_results"."v2" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 2
);

DROP VIEW "val_results"."v3";
 CREATE VIEW "val_results"."v3" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 3
);

Results:

Three views are created, one for each sample. In addition, a table with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id income age
1362562 63532 46
cust_id income age
1362955 15746 30
1362631 34394 54
cust_id income age
1363435 12307 29
1362893 0 15
1362805 4626 78
cust_id income age xSampleId
1362955 15746 30 2
1362893 0 15 3
1362805 4626 78 3
1362562 63532 46 1
1362631 34394 54 2
1363435 12307 29 3