Sample Examples | Vantage Analytics Library - Sample Examples

In the following examples, the call statement requires preceding the td_analyze stored procedure name with the database where it is installed. For example:

call val_user.td_analyze(

Additional generated SQL is received by adding the "gensqlonly = true;" parameter.

The values vary each time the command is run.

Required Parameters Only

The following example only uses the required parameters. There are some parameters that assume default values such as outputstyle = select or sampleoutputstyle = select.

Call:

Call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	samplesizes = 10;
');

SQL:

SELECT
	 "cust_id"
	,"income"
	,"age"
FROM "val_source"." customer"
SAMPLE 10
;

Results:

The results consist of a data set with three columns: cust_id, income, and age with 10 rows randomly selected from the input table.

cust_id	income	age
1363288	41064	55
1362722	44196	30
1362952	7633	53
1362614	18040	27
1363236	23206	64
1362596	0	15
1363346	35237	55
1363326	75890	40
1363345	54713	55
1362665	0	14

3 Samples in 1 Output Table

In this example, multiple samples are written to one output table by introducing a new column in the output table called xSampleId. This column indicates the sample where each row belongs. In this example the samplesizes parameter has the value of 1,5,10, which results in three samples. Sample one contains 1 row, sample two contains 5 rows, and sample three 10 rows which are all indicated by the xSampleId column.

The following example shows some of the many output paramaters available.

Call:

call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputstyle = table;
	outputdatabase=val_results;
	outputtablename=_val_sample_table;
	index=cust_id;
	where = cust_id > 0;
	samplesizes = 1,5,10;
');
sel * from val_results._val_sample_table;

SQL:

DROP TABLE "val_results"."_val_sample_table";
 CREATE SET TABLE "val_results"."_val_sample_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 5, 10
WHERE cust_id > 0
) WITH DATA PRIMARY INDEX ("cust_id");

Results:

This example yields an output table with columns cust_id, income, age, and xSampleId.

cust_id	income	age	xSampleId
1362797	17912	32	3
1363316	18144	28	3
1362706	0	15	3
1362668	0	13	3
1362930	0	17	3
1363073	11821	29	3
1363072	7495	88	3
1363339	0	14	3
1362530	39250	57	1
1362638	7008	40	2
1362663	29030	44	2
1362589	30674	64	2
1363354	12621	66	2
1363037	8257	29	3
1363185	7353	19	3
1362764	25973	57	2

Stratified Sampling

This example demonstrates random sampling with stratifying conditions where rows are divided into groups or strata. This division is based on conditions defined prior to samples of a requested size being taken.

Note that in this example the separatesamplepersize option is not requested.

Call:

call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age,gender;
	conditions = {gender = ''F''},{gender = ''M''};
	conditionalsizes = {1,2,3},{4,5,6};
');

SQL:

SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,CASE 	WHEN _val_sampleid IN (1 , 4 ) THEN 1
		WHEN _val_sampleid IN (2 , 5 ) THEN 2
		WHEN _val_sampleid IN (3 , 6 ) THEN 3
	 END AS "xSampleId"
FROM
(SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,SAMPLEID AS "_val_sampleid"
FROM "val_source"."customer"
SAMPLE 
	WHEN gender = 'F' THEN 1, 2, 3
	WHEN gender = 'M' THEN 4, 5, 6
END
) AS "_valSample"
;

Results:

A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id	income	age	gender	xSampleId
1362708	15246	29	M	1
1363421	1982	28	M	1
1362801	12485	24	M	3
1363251	18597	19	M	2
1363236	23206	64	M	3
1362720	18806	19	M	3
1363346	35237	55	M	2
1362555	26598	57	M	1
1363164	24300	47	M	1
1363004	19319	24	M	3
1363426	37186	36	M	2
1363370	6700	23	M	2
1363357	19041	38	M	3
1362772	12456	31	M	2
1363275	17773	37	F	3
1363444	3171	22	F	3
1362921	19987	45	F	2
1362632	49863	55	M	3
1362843	27303	59	F	1
1362715	17750	56	F	3
1362993	20702	30	F	2

Stratified Sampling with separatesamplepersize

This example demonstrates random sampling with stratifying conditions, where rows are divided into groups or strata based on stated conditions prior to samples of a requested size being taken. Note that in this example the separatesamplepersize option is requested.

Call:

call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age,gender;
	conditions = {gender = ''F''},{gender = ''M''};
	conditionalsizes = {1,2,3},{4,5,6};
	separatesamplepersize=true;
');

SQL:

SELECT
	 "cust_id"
	,"income"
	,"age"
	,"gender"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 
	WHEN gender = 'F' THEN 1, 2, 3
	WHEN gender = 'M' THEN 4, 5, 6
END
;

Results:

A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, 3, 4, 5 and 6.

cust_id	income	age	gender	xSampleId
1363055	34473	52	M	5
1362933	19823	55	M	4
1363422	42532	39	M	6
1363376	23883	75	M	5
1362523	0	13	F	2
1362761	13515	23	M	5
1363356	12258	39	F	3
1362967	5022	84	M	5
1362725	68933	52	M	4
1363146	8381	44	F	3
1363123	56384	35	M	6
1363492	142274	54	M	4
1363183	46870	59	M	6
1363204	23294	51	F	3
1362823	43381	48	M	6
1363083	22680	64	M	6
1362570	20562	50	F	2
1362877	9480	42	M	6
1362674	0	16	M	4
1363395	14011	29	F	1
1362821	7826	18	M	5

Multiple Tables

In this example, sampled rows are stored in output tables, one for each sample. First, a volatile table containing the input columns and an added xSampleId column is created. Each xSampleId value indicates to which sample that row belongs. Then, multiple output tables are created, again using the value of xSampleId to decide which output table that row is written to. Eventually, the volatile table is dropped automatically.

Call:

call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputdatabase = val_results;
	outputtablename = _val_sample;
	samplesizes = 1,2,3;
	sampleoutputstyle = multipletables;
	outputnames = s1,s2,s3;
	');
sel * from val_results.s1;
sel * from val_results.s2;
sel * from val_results.s3;
sel * from val_user._val_sample;

SQL:

DROP TABLE "val_user"."_val_sample";
 CREATE SET VOLATILE TABLE "val_user"."_val_sample", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 2, 3
) WITH DATA ON COMMIT PRESERVE ROWS;

DROP TABLE "val_result"."s1";
 CREATE SET TABLE "val_results"."s1", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 1
) WITH DATA;

DROP TABLE "val_results"."s2";
 CREATE SET TABLE "val_results"."s2", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 2
) WITH DATA;

DROP TABLE "val_results"."s3";
 CREATE SET TABLE "val_results"."s3", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_user"."_val_sample"
WHERE "xSampleId" = 3
) WITH DATA;

Results:

A data set with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id	income	age
1362585	28093	69

cust_id	income	age
1363252	0	16
1363287	18449	39

cust_id	income	age
1362927	15795	63
1362975	36330	46
1363318	7751	41

cust_id	income	age	xSampleId
1362927	15795	63	3
1362975	36330	46	3
1363318	7751	41	3
1362585	28093	69	1
1363287	18449	39	2
1363252	0	16	2

Multiple Views

In this example, sampled rows are first stored in a permanent output table with an added xSampleId column. Each xSampleld value indicates to which sample that row belongs. Finally, multiple views are created using the value of the xSampleId to decide which view each row belongs to.

Call:

Call td_analyze('sample','
	database = val_source;
	tablename = customer;
	columns = cust_id,income,age;
	outputdatabase = val_results;
	outputtablename = _val_base_table;
	samplesizes = 1,2,3;
	sampleoutputstyle = multipleviews;
	outputnames = v1,v2,v3;
	where = cust_id > 0;
	index = cust_id;
	');

sel * from val_results.v1;
sel * from val_results.v2;
sel * from val_results.v3;
sel * from val_results._val_base_table;

SQL:

DROP TABLE "val_results"."_val_base_table";
 CREATE SET TABLE "val_results"."_val_base_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS (
SELECT
	 "cust_id"
	,"income"
	,"age"
	,SAMPLEID AS "xSampleId"
FROM "val_source"."customer"
SAMPLE 1, 2, 3
WHERE cust_id > 0
) WITH DATA PRIMARY INDEX ("cust_id");

DROP VIEW "val_results"."v1";
 CREATE VIEW "val_results"."v1" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 1
);

DROP VIEW "val_results"."v2";
 CREATE VIEW "val_results"."v2" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 2
);

DROP VIEW "val_results"."v3";
 CREATE VIEW "val_results"."v3" AS (
SELECT
	 "cust_id"
	,"income"
	,"age"

FROM "val_results"."_val_base_table"
WHERE "xSampleId" = 3
);

Results:

Three views are created, one for each sample. In addition, a table with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.

cust_id	income	age
1362562	63532	46

cust_id	income	age
1362955	15746	30
1362631	34394	54

cust_id	income	age
1363435	12307	29
1362893	0	15
1362805	4626	78

cust_id	income	age	xSampleId
1362955	15746	30	2
1362893	0	15	3
1362805	4626	78	3
1362562	63532	46	1
1362631	34394	54	2
1363435	12307	29	3

Sample Examples | Vantage Analytics Library - Sample Examples - Vantage Analytics Library

Vantage Analytics Library User Guide

Required Parameters Only

3 Samples in 1 Output Table

Stratified Sampling

Stratified Sampling with separatesamplepersize

Multiple Tables

Multiple Views