About DATASET Type Usage

Teradata Vantage™ DATASET Data Type

brand
Software
Teradata Vantage
prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1198-162K

User-defined Functions

Use the CREATE/REPLACE FUNCTION statement to create a user-defined function (UDF) containing one or more parameters, or a DATASET return type of any of supported storage format. The parameters and/or return types are supported on scalar, aggregate, and table UDFs, and SQL UDFs. When the return type is specified as one of these types for SQL UDF, the RETURN expression may be an SQL statement evaluating to one type.

The DATASET type as a parameter to a UDF is supported for LANGUAGE C, CPP, and JAVA, but LANGUAGE R is not supported.

The following action occurs on the Data Dictionary CREATE FUNCTION statement for the DATASET type, and is in addition to dictionary updates that normally occur on a CREATE FUNCTION statement:

  • The row inserted to DBC.TVFields to record metadata information about the DATASET field indicates it is a DATASET type. It shares some entries with the UDTs. The FieldType is 'DT,' and the TypeId corresponds to the static type ID assigned to the DATASET type.

Table Operators

The DATASET type is supported in C language and Java language user-defined table operators. The metadata is passed to the table operator contract function using an external type code such as DATASET_AVRO_DT or DATASET_CSV_DT. The type codes for dtype_en include the following. See sqltypes_td.h for the complete definition.

typedef enum dtype_en
{
        UNDEF_DT=0,
        CHAR_DT=1,
        VARCHAR_DT=2,
        BYTE_DT=3,
        VARBYTE_DT=4,
        GRAPHIC_DT=5,
        VARGRAPHIC_DT=6, 
        BYTEINT_DT=7,
        SMALLINT_DT=8,
        INTEGER_DT=9,
        BIGINT_DT = 36,
        REAL_DT=10,
        DECIMAL1_DT=11,'
        DECIMAL2_DT=12,
        DECIMAL4_DT=13,
        DECIMAL8_DT=14, 
        DECIMAL16_DT=37,  
        DATE_DT=15,
        TIME_DT=16,
        TIMESTAMP_DT=17,
        INTERVAL_YEAR_DT=18,
        INTERVAL_YTM_DT=19,
        INTERVAL_MONTH_DT=20,
        INTERVAL_DAY_DT=21,
        INTERVAL_DTH_DT=22,
        INTERVAL_DTM_DT=23,
        INTERVAL_DTS_DT=24,
        INTERVAL_HOUR_DT=25,
        INTERVAL_HTM_DT=26,
        INTERVAL_HTS_DT=27,
        INTERVAL_MINUTE_DT=28,
        INTERVAL_MTS_DT=29,
        INTERVAL_SECOND_DT=30,
        TIME_WTZ_DT=31,
        TIMESTAMP_WTZ_DT=32,
        BLOB_REFERENCE_DT=33,
        CLOB_REFERENCE_DT=34, 
        UDT_DT=35,
        /* The 8 byte integer type (BIGINT_DT) and 
         * the 16 byte decimal type (DECIMAL16_DT) 
         * are located above and have the following
         * values:
         *
         * BIGINT_DT=36    
         * DECIMAL16_DT=37 
         */                   
        NUMBER_DT=38, 
        PERIOD_DT=39, 
        JSON_DT=40, 
        DATASET_AVRO_DT=41,
        ST_GEOMETRY_DT=42,
        MBR_DT=43,
        MBB_DT=44,
        ARRAY_DT=45,
        XML_DT = 46, 
	    DATASET_CSV_DT=47,
        FNC_DATATYPESETSIZE=48
} dtype_en;

For AVRO, the complex types map to the following base type, DATASET_AVRO_DT → BLOB_REFERENCE_DT. For CSV, the complex types map to the following base type, DATASET_CSV_DT → CLOB_REFERENCE_DT

When input data values are sent to a table operator, the data is transferred in the current default transform. Each possible transform type populates the UDT_BaseInfo_t.transform_info structure, as shown in the following table:

Transform Type Datatype Column Size.length
TD_DATASET_AVRO_VARBYTE VARBYTE_DT <name of the column> The data size
TD_DATASET_AVRO_BLOB BLOB_REFERENCE_DT <name of the column> The data size
For CSV,the transform types populate the UDT_BaseInfo_t.transform_info structure, as shown in the following table:
Transform Type Datatype Column ... Charset ... Size.length
TD_CSV_CLOB CLOB_REFERENCE_DT <name of the column>       The data size

There is no "transforms off" functionality. The UDT_BaseInfo_t structure's udt_indicator member value identifies the DATASET storage formats:

10==DATASET STORAGE FORMAT AVRO

11==DATASET STORAGE FORMAT CSV

For CSV, the UDT_BaseInfo_t's charset field determines the character set as either LATIN or UNICODE.

External Stored Procedures

The CREATE/REPLACE PROCEDURE statement was extended to create an external stored procedure containing one or more parameters that are DATASET types of any of the supported storage formats. Use these types to define the IN, OUT, or INOUT parameters.

The DATASET type as an IN, OUT, or INOUT parameter to an external stored procedure is supported for LANGUAGE C, CPP, or JAVA. The LANGUAGE R option is not supported.

The following action occurs on the Data Dictionary on a CREATE/REPLACE PROCEDURE statement for the DATASET type. The change is in addition to dictionary updates that normally occur on a CREATE/REPLACE PROCEDURE statement.
  • The row normally inserted to DBC.TVFields to record metadata information about the DATASET field was enhanced to indicate that it is a DATASET type. It shares some entries with the UDTs. The FieldType is 'DT', and the TypeId corresponds to the static type ID assigned to the DATASET type.