call val_user.td_analyze(
Additional generated SQL is received by adding the "gensqlonly = true;" parameter.
The values vary each time the command is run.
Required Parameters Only
The following example only uses the required parameters. There are some parameters that assume default values such as outputstyle = select or sampleoutputstyle = select.
Call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age; samplesizes = 10; ');
SELECT "cust_id" ,"income" ,"age" FROM "val_source"." customer" SAMPLE 10 ;
Results:
The results consist of a data set with three columns: cust_id, income, and age with 10 rows randomly selected from the input table.
cust_id | income | age |
---|---|---|
1363288 | 41064 | 55 |
1362722 | 44196 | 30 |
1362952 | 7633 | 53 |
1362614 | 18040 | 27 |
1363236 | 23206 | 64 |
1362596 | 0 | 15 |
1363346 | 35237 | 55 |
1363326 | 75890 | 40 |
1363345 | 54713 | 55 |
1362665 | 0 | 14 |
3 Samples in 1 Output Table
In this example, multiple samples are written to one output table by introducing a new column in the output table called xSampleId. This column indicates the sample where each row belongs. In this example the samplesizes parameter has the value of 1,5,10, which results in three samples. Sample one contains 1 row, sample two contains 5 rows, and sample three 10 rows which are all indicated by the xSampleId column.
The following example shows some of the many output paramaters available.
call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age; outputstyle = table; outputdatabase=val_results; outputtablename=_val_sample_table; index=cust_id; where = cust_id > 0; samplesizes = 1,5,10; '); sel * from val_results._val_sample_table;
DROP TABLE "val_results"."_val_sample_table"; CREATE SET TABLE "val_results"."_val_sample_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" ,SAMPLEID AS "xSampleId" FROM "val_source"."customer" SAMPLE 1, 5, 10 WHERE cust_id > 0 ) WITH DATA PRIMARY INDEX ("cust_id");
Results:
This example yields an output table with columns cust_id, income, age, and xSampleId.
cust_id | income | age | xSampleId |
---|---|---|---|
1362797 | 17912 | 32 | 3 |
1363316 | 18144 | 28 | 3 |
1362706 | 0 | 15 | 3 |
1362668 | 0 | 13 | 3 |
1362930 | 0 | 17 | 3 |
1363073 | 11821 | 29 | 3 |
1363072 | 7495 | 88 | 3 |
1363339 | 0 | 14 | 3 |
1362530 | 39250 | 57 | 1 |
1362638 | 7008 | 40 | 2 |
1362663 | 29030 | 44 | 2 |
1362589 | 30674 | 64 | 2 |
1363354 | 12621 | 66 | 2 |
1363037 | 8257 | 29 | 3 |
1363185 | 7353 | 19 | 3 |
1362764 | 25973 | 57 | 2 |
Stratified Sampling
This example demonstrates random sampling with stratifying conditions where rows are divided into groups or strata. This division is based on conditions defined prior to samples of a requested size being taken.
Note that in this example the separatesamplepersize option is not requested.
call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age,gender; conditions = {gender = ''F''},{gender = ''M''}; conditionalsizes = {1,2,3},{4,5,6}; ');
SELECT "cust_id" ,"income" ,"age" ,"gender" ,CASE WHEN _val_sampleid IN (1 , 4 ) THEN 1 WHEN _val_sampleid IN (2 , 5 ) THEN 2 WHEN _val_sampleid IN (3 , 6 ) THEN 3 END AS "xSampleId" FROM (SELECT "cust_id" ,"income" ,"age" ,"gender" ,SAMPLEID AS "_val_sampleid" FROM "val_source"."customer" SAMPLE WHEN gender = 'F' THEN 1, 2, 3 WHEN gender = 'M' THEN 4, 5, 6 END ) AS "_valSample" ;
Results:
A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.
cust_id | income | age | gender | xSampleId |
---|---|---|---|---|
1362708 | 15246 | 29 | M | 1 |
1363421 | 1982 | 28 | M | 1 |
1362801 | 12485 | 24 | M | 3 |
1363251 | 18597 | 19 | M | 2 |
1363236 | 23206 | 64 | M | 3 |
1362720 | 18806 | 19 | M | 3 |
1363346 | 35237 | 55 | M | 2 |
1362555 | 26598 | 57 | M | 1 |
1363164 | 24300 | 47 | M | 1 |
1363004 | 19319 | 24 | M | 3 |
1363426 | 37186 | 36 | M | 2 |
1363370 | 6700 | 23 | M | 2 |
1363357 | 19041 | 38 | M | 3 |
1362772 | 12456 | 31 | M | 2 |
1363275 | 17773 | 37 | F | 3 |
1363444 | 3171 | 22 | F | 3 |
1362921 | 19987 | 45 | F | 2 |
1362632 | 49863 | 55 | M | 3 |
1362843 | 27303 | 59 | F | 1 |
1362715 | 17750 | 56 | F | 3 |
1362993 | 20702 | 30 | F | 2 |
Stratified Sampling with separatesamplepersize
This example demonstrates random sampling with stratifying conditions, where rows are divided into groups or strata based on stated conditions prior to samples of a requested size being taken. Note that in this example the separatesamplepersize option is requested.
call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age,gender; conditions = {gender = ''F''},{gender = ''M''}; conditionalsizes = {1,2,3},{4,5,6}; separatesamplepersize=true; ');
SELECT "cust_id" ,"income" ,"age" ,"gender" ,SAMPLEID AS "xSampleId" FROM "val_source"."customer" SAMPLE WHEN gender = 'F' THEN 1, 2, 3 WHEN gender = 'M' THEN 4, 5, 6 END ;
Results:
A data set with 21 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, 3, 4, 5 and 6.
cust_id | income | age | gender | xSampleId |
---|---|---|---|---|
1363055 | 34473 | 52 | M | 5 |
1362933 | 19823 | 55 | M | 4 |
1363422 | 42532 | 39 | M | 6 |
1363376 | 23883 | 75 | M | 5 |
1362523 | 0 | 13 | F | 2 |
1362761 | 13515 | 23 | M | 5 |
1363356 | 12258 | 39 | F | 3 |
1362967 | 5022 | 84 | M | 5 |
1362725 | 68933 | 52 | M | 4 |
1363146 | 8381 | 44 | F | 3 |
1363123 | 56384 | 35 | M | 6 |
1363492 | 142274 | 54 | M | 4 |
1363183 | 46870 | 59 | M | 6 |
1363204 | 23294 | 51 | F | 3 |
1362823 | 43381 | 48 | M | 6 |
1363083 | 22680 | 64 | M | 6 |
1362570 | 20562 | 50 | F | 2 |
1362877 | 9480 | 42 | M | 6 |
1362674 | 0 | 16 | M | 4 |
1363395 | 14011 | 29 | F | 1 |
1362821 | 7826 | 18 | M | 5 |
Multiple Tables
In this example, sampled rows are stored in output tables, one for each sample. First, a volatile table containing the input columns and an added xSampleId column is created. Each xSampleId value indicates to which sample that row belongs. Then, multiple output tables are created, again using the value of xSampleId to decide which output table that row is written to. Eventually, the volatile table is dropped automatically.
call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age; outputdatabase = val_results; outputtablename = _val_sample; samplesizes = 1,2,3; sampleoutputstyle = multipletables; outputnames = s1,s2,s3; '); sel * from val_results.s1; sel * from val_results.s2; sel * from val_results.s3; sel * from val_user._val_sample;
DROP TABLE "val_user"."_val_sample"; CREATE SET VOLATILE TABLE "val_user"."_val_sample", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" ,SAMPLEID AS "xSampleId" FROM "val_source"."customer" SAMPLE 1, 2, 3 ) WITH DATA ON COMMIT PRESERVE ROWS; DROP TABLE "val_result"."s1"; CREATE SET TABLE "val_results"."s1", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_user"."_val_sample" WHERE "xSampleId" = 1 ) WITH DATA; DROP TABLE "val_results"."s2"; CREATE SET TABLE "val_results"."s2", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_user"."_val_sample" WHERE "xSampleId" = 2 ) WITH DATA; DROP TABLE "val_results"."s3"; CREATE SET TABLE "val_results"."s3", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_user"."_val_sample" WHERE "xSampleId" = 3 ) WITH DATA;
Results:
A data set with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.
cust_id | income | age |
---|---|---|
1362585 | 28093 | 69 |
cust_id | income | age |
---|---|---|
1363252 | 0 | 16 |
1363287 | 18449 | 39 |
cust_id | income | age |
---|---|---|
1362927 | 15795 | 63 |
1362975 | 36330 | 46 |
1363318 | 7751 | 41 |
cust_id | income | age | xSampleId |
---|---|---|---|
1362927 | 15795 | 63 | 3 |
1362975 | 36330 | 46 | 3 |
1363318 | 7751 | 41 | 3 |
1362585 | 28093 | 69 | 1 |
1363287 | 18449 | 39 | 2 |
1363252 | 0 | 16 | 2 |
Multiple Views
In this example, sampled rows are first stored in a permanent output table with an added xSampleId column. Each xSampleld value indicates to which sample that row belongs. Finally, multiple views are created using the value of the xSampleId to decide which view each row belongs to.
Call td_analyze('sample',' database = val_source; tablename = customer; columns = cust_id,income,age; outputdatabase = val_results; outputtablename = _val_base_table; samplesizes = 1,2,3; sampleoutputstyle = multipleviews; outputnames = v1,v2,v3; where = cust_id > 0; index = cust_id; '); sel * from val_results.v1; sel * from val_results.v2; sel * from val_results.v3; sel * from val_results._val_base_table;
DROP TABLE "val_results"."_val_base_table"; CREATE SET TABLE "val_results"."_val_base_table", NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS ( SELECT "cust_id" ,"income" ,"age" ,SAMPLEID AS "xSampleId" FROM "val_source"."customer" SAMPLE 1, 2, 3 WHERE cust_id > 0 ) WITH DATA PRIMARY INDEX ("cust_id"); DROP VIEW "val_results"."v1"; CREATE VIEW "val_results"."v1" AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_results"."_val_base_table" WHERE "xSampleId" = 1 ); DROP VIEW "val_results"."v2"; CREATE VIEW "val_results"."v2" AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_results"."_val_base_table" WHERE "xSampleId" = 2 ); DROP VIEW "val_results"."v3"; CREATE VIEW "val_results"."v3" AS ( SELECT "cust_id" ,"income" ,"age" FROM "val_results"."_val_base_table" WHERE "xSampleId" = 3 );
Results:
Three views are created, one for each sample. In addition, a table with 6 rows and the added xSampleId column is created. This added column indicates the sample identifier 1, 2, or 3.
cust_id | income | age |
---|---|---|
1362562 | 63532 | 46 |
cust_id | income | age |
---|---|---|
1362955 | 15746 | 30 |
1362631 | 34394 | 54 |
cust_id | income | age |
---|---|---|
1363435 | 12307 | 29 |
1362893 | 0 | 15 |
1362805 | 4626 | 78 |
cust_id | income | age | xSampleId |
---|---|---|---|
1362955 | 15746 | 30 | 2 |
1362893 | 0 | 15 | 3 |
1362805 | 4626 | 78 | 3 |
1362562 | 63532 | 46 | 1 |
1362631 | 34394 | 54 | 2 |
1363435 | 12307 | 29 | 3 |