1.1 - JSON Function Descriptor - Teradata Vantage

Teradata Vantage™ User Guide

May 2020
A JSON (JavaScript Object Notation) function descriptor is a JSON file that ML Engine uses for function metadata processing.

Each of the sections in the table below is described in more detail in the subsequent tables.

Major Sections of JSON Descriptor

| Section | Description |
|---|---|
| Header | Specifies function name, version, and type information. |
| Input_tables | Specifies function ON clause fields for non-driver functions. Input tables for non-driver functions are specified in this section. |
| Argument_clauses | Specifies function argument fields. Input tables for driver functions and all output tables are specified by argument clauses in this section. |

Header Section Fields

Required fields are marked with an asterisk (*).

| Field | Type | Description |
|---|---|---|
| * json_schema_major_version | string | Major version of JSON schema. Set to "1". |
| * json_schema_minor_version | string | Minor version of JSON schema. Set to "2". |
| * json_content_version | string | JSON content version. Set to "1". |
| * function_name | string | Name of function class file. |
| function_version | string | Version of function. |
| * function_type | string | Specifies function type ("driver" or "non-driver"). See Compatibility Considerations for UDFs for an explanation of driver and non-driver functions. |
| short_description | string | Short description of function. |
| long_description | string | Long description of function. |
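
The required header fields listed above lend themselves to a mechanical check. The following is a minimal sketch in Python, assuming the header fields sit at the top level of the parsed JSON object. The helper names `missing_header_fields` and `check_header` are hypothetical and are not part of ML Engine.

```python
import json

# Required header fields, taken from the Header Section Fields table.
REQUIRED_HEADER_FIELDS = (
    "json_schema_major_version",
    "json_schema_minor_version",
    "json_content_version",
    "function_name",
    "function_type",
)

# Function types named in the table.
FUNCTION_TYPES = ("driver", "non-driver")


def missing_header_fields(descriptor):
    """Return the required header fields absent from a parsed descriptor (hypothetical helper)."""
    return [name for name in REQUIRED_HEADER_FIELDS if name not in descriptor]


def check_header(descriptor):
    """Return a list of problems found in the header; the list is empty if none are found."""
    problems = ["missing required field: " + name
                for name in missing_header_fields(descriptor)]
    function_type = descriptor.get("function_type")
    if function_type is not None and function_type not in FUNCTION_TYPES:
        problems.append("function_type must be 'driver' or 'non-driver'")
    return problems


# Header values copied from the GMMFit example in this section.
header = json.loads("""
{
  "json_schema_major_version": "1",
  "json_schema_minor_version": "2",
  "json_content_version": "1",
  "function_name": "GMMFit",
  "function_version": "1.2",
  "function_type": "driver"
}
""")

print(check_header(header))  # prints []
```

The check reports only missing fields and an unrecognized function_type; it does not verify the version strings.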

Input_tables Section Fields

| Field | Type | Description |
|---|---|---|
| * name | string | Specifies ON clause alias. If there is no alias, use "input" as the alias. |
| * datatype | string | Set to "TABLE_ALIAS" for each ON clause. |
| requiredInputKind | list of strings | Partition information for how the ON clause is specified in the syntax. It can be a combination of PartitionByKey, PartitionByAny, or Dimension. If not specified, PartitionByAny is used. Examples: ON tablename PARTITION BY column-name; ON tablename DIMENSION |
| partitionByOne | boolean | Specifies whether the ON clause accepts PartitionByOne. For this to be true, requiredInputKind must be PartitionByKey. For example: ON tablename PARTITION BY 1 |
| partitionByOneInclusive | boolean | Specifies whether the ON clause accepts both PartitionByOne and PartitionByKey. For this to be true, partitionByOne must also be true. |
| isOrdered | boolean | Specifies whether the ON clause requires an ORDER BY clause. If false, ORDER BY is optional. |
| isRequired | boolean | Specifies whether the ON clause is required. |
| description | string | Description of ON clause. |
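
The dependencies among requiredInputKind, partitionByOne, and partitionByOneInclusive can be expressed as a small consistency check. This is a sketch under stated assumptions, not ML Engine's own validation: the helper `check_input_table` is hypothetical, omitted boolean fields are assumed to mean false, and "requiredInputKind must be PartitionByKey" is read as "the list must include PartitionByKey".

```python
# Partition kinds named in the Input_tables Section Fields table.
ALLOWED_INPUT_KINDS = ("PartitionByKey", "PartitionByAny", "Dimension")


def check_input_table(entry):
    """Return a list of problems for one Input_tables entry (hypothetical helper)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in entry:
            problems.append("missing required field: " + field)
    if "datatype" in entry and entry["datatype"] != "TABLE_ALIAS":
        problems.append('datatype must be "TABLE_ALIAS"')

    # Per the table, PartitionByAny is used when requiredInputKind is not specified.
    kinds = entry.get("requiredInputKind", ["PartitionByAny"])
    for kind in kinds:
        if kind not in ALLOWED_INPUT_KINDS:
            problems.append("unknown requiredInputKind: " + str(kind))

    # Assumption: an omitted boolean field is treated as false.
    by_one = entry.get("partitionByOne", False)
    # Assumption: "must be PartitionByKey" is read as "must include PartitionByKey".
    if by_one and "PartitionByKey" not in kinds:
        problems.append("partitionByOne requires PartitionByKey in requiredInputKind")
    if entry.get("partitionByOneInclusive", False) and not by_one:
        problems.append("partitionByOneInclusive requires partitionByOne to be true")
    return problems


# Hypothetical entries, made up for illustration.
good = {
    "name": "input",
    "datatype": "TABLE_ALIAS",
    "requiredInputKind": ["PartitionByKey"],
    "partitionByOne": True,
    "partitionByOneInclusive": True,
}
bad = {"name": "input", "datatype": "TABLE_ALIAS", "partitionByOne": True}

print(check_input_table(good))
print(check_input_table(bad))
```

The first entry passes; the second is flagged because partitionByOne is set while requiredInputKind falls back to PartitionByAny.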

Argument_clauses Section Fields

| Field | Type | Description |
|---|---|---|
| * name | string | Specifies argument name. |
| * datatype | string | Specifies data type of argument value; one of these: "DOUBLE"; "LONG"; "STRING"; "TABLE_NAME" (used for names of input or output tables for driver functions; identify output tables by setting the isOutputTable field to true); "COLUMN_NAMES" (used for names of columns in input tables for driver functions); "COLUMNS" (used for names of columns in input tables for non-driver functions). |
| isRequired | boolean | Specifies whether argument clause is required or optional. |
| * defaultValue | Boolean, numeric, or string, depending on the value of datatype | Specifies default value of argument (the value the function uses if the user omits the argument). Specify only if isRequired is set to false. |
| permittedValues | list of strings | Specifies permitted values of argument clause. For example: "permittedValues": [ |
| description | string | Description of argument clause. |
| * isOutputTable | boolean | Specifies whether argument clause accepts a database table as output. For this value to be true, datatype must be set to "TABLE_NAME". |
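
Two of the rules above relate one field to another: isOutputTable may be true only when datatype is "TABLE_NAME", and defaultValue is specified only when isRequired is false. The sketch below checks just those two rules plus the presence of name and datatype. The helper `check_argument_clause` is hypothetical, and the sketch deliberately does not validate the datatype value itself.

```python
def check_argument_clause(clause):
    """Return a list of problems for one Argument_clauses entry (hypothetical helper)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in clause:
            problems.append("missing required field: " + field)

    # An output table argument must have datatype "TABLE_NAME".
    if clause.get("isOutputTable", False) and clause.get("datatype") != "TABLE_NAME":
        problems.append('isOutputTable requires datatype "TABLE_NAME"')

    # Assumption: "only if isRequired is set to false" is read literally, so a
    # clause with defaultValue but no explicit isRequired is also flagged.
    if "defaultValue" in clause and clause.get("isRequired") is not False:
        problems.append("defaultValue is allowed only when isRequired is false")
    return problems


# Clause copied from the GMMFit example in this section.
max_clusters = {
    "defaultValue": 20,
    "name": "MaxClusternum",
    "isRequired": False,
    "description": "Specifies the maximum number of clusters in a Dirichlet process model.",
    "datatype": "INTEGER",
}

# Hypothetical clause that breaks both cross-field rules.
broken = {
    "name": "ResultTable",
    "datatype": "STRING",
    "isOutputTable": True,
    "isRequired": True,
    "defaultValue": "results",
}

print(check_argument_clause(max_clusters))
for problem in check_argument_clause(broken):
    print(problem)
```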

JSON Descriptor Example: GMMFit Function

 "json_schema_major_version": "1",
 "json_schema_minor_version": "2",
 "json_content_version": "1",
 "function_name": "GMMFit",
 "function_version": "1.2",
 "function_type": "driver",
 "short_description"; "Fits a Gaussian Mixture Model to data.",
 "long_description": "Clusters data using a Gaussian Mixture Model or a Dirichlet Process Gaussian Mixture Model.",
   "isOrdered": false,
   "partitionByOne": true,
   "name": "init_params",
   "isRequired": true,
   "description": "Contains initial values for the cluster weights, means, and covariances.",
   "datatype": "TABLE_ALIAS"
   "isOutputTable": false,
   "isRequired": true,
   "description": "Specifies the name of the table that contains the input data to be clustered.",
   "datatype": "TABLE_NAME"
   "isOutputTable": true,
   "isRequired": true,
   "description": "Specifies the name of the output table to which the function outputs cluster information.",
   "datatype": "TABLE_NAME"
   "defaultValue": 20,
   "name": "MaxClusternum",
   "isRequired": false,
   "description": "Specifies the maximum number of clusters in a Dirichlet process model.",
   "datatype": "INTEGER"
   "permittedValues": [
   "defaultValue": "DIAGONAL",
   "name": "CovarianceType",
   "isRequired": false,
   "description": "Specifies the type of the covariance matrices.",
   "datatype": "STRING"
   "defaultValue": 0.001,
   "name": "Tolerance",
   "isRequired": false,
   "description": "Specifies the minimum change in log-likelihood between iterations that causes the function to terminate.",
   "datatype": "DOUBLE"
   "defaultValue": false,
   "name": "PackOutput",
   "isRequired": false,
   "description": "Specifies whether the function packs the output. The default value is 'false'.",
   "datatype": "BOOLEAN"