JSON Function Descriptor - Teradata Vantage

A JSON (JavaScript Object Notation) function descriptor is a JSON file that ML Engine uses for function metadata processing.

Each section in the table below is described in more detail in the subsequent tables.

Major Sections of JSON Descriptor

Section	Description
Header	Specifies function name, version, and type information.
Input_tables	Specifies function ON clause fields. Input tables for non-driver functions are specified in this section.
Argument_clauses	Specifies function argument fields. Input tables for driver functions and all output tables are specified by argument clauses in this section.

Header Section Fields

Required fields are marked with an asterisk (*).

Field	Type	Description
* json_schema_major_version	string	Major version of JSON schema. Set to "1".
* json_schema_minor_version	string	Minor version of JSON schema. Set to "2".
* json_content_version	string	JSON content version. Set to "1".
* function_name	string	Name of function class file.
function_version	string	Version of function.
* function_type	string	Specifies function type ("driver" or "non-driver"). See Compatibility Considerations for UDFs for an explanation of driver and non-driver functions.
short_description	string	Short description of function.
long_description	string	Long description of function.
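
The header rules above can be checked mechanically before a descriptor is deployed. The following is a minimal Python sketch, assuming the descriptor has already been parsed into a dict; `check_header` and its constants are hypothetical names used for illustration and are not part of ML Engine.

```python
# Hypothetical helper for illustration; not part of ML Engine.
# Fields marked with an asterisk in the Header Section Fields table.
REQUIRED_HEADER_FIELDS = (
    "json_schema_major_version",
    "json_schema_minor_version",
    "json_content_version",
    "function_name",
    "function_type",
)
ALLOWED_FUNCTION_TYPES = ("driver", "non-driver")


def check_header(descriptor):
    """Return a list of problems found in the header fields of a descriptor dict."""
    problems = []
    for field in REQUIRED_HEADER_FIELDS:
        if field not in descriptor:
            problems.append(f"missing required field: {field}")
        elif not isinstance(descriptor[field], str):
            problems.append(f"field must be a string: {field}")
    function_type = descriptor.get("function_type")
    if isinstance(function_type, str) and function_type not in ALLOWED_FUNCTION_TYPES:
        problems.append(f"unknown function_type: {function_type}")
    return problems


header = {
    "json_schema_major_version": "1",
    "json_schema_minor_version": "2",
    "json_content_version": "1",
    "function_name": "GMMFit",
    "function_type": "driver",
}
print(check_header(header))
```

An empty list means that every required header field is present, is a string, and that function_type is one of the two documented values.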

Input_tables Section Fields

Field	Type	Description
* name	string	Specifies ON clause alias. If no alias, use "input" as alias.
* datatype	string	Set to "TABLE_ALIAS" for each ON clause.
requiredInputKind	list of string	Specifies how the ON clause can be partitioned in the function syntax. It can be any combination of PartitionByKey, PartitionByAny, and Dimension. If not specified, PartitionByAny is used. Examples: "ON tablename PARTITION BY column-name"; "ON tablename PARTITION BY ANY"; "ON tablename DIMENSION".
partitionByOne	boolean	Specifies whether ON clause accepts PartitionByOne. For this to be true, requiredInputKind must be PartitionByKey. For example: ON tablename PARTITION BY 1
partitionByOneInclusive	boolean	Specifies whether ON clause accepts both PartitionByOne and PartitionByKey. For this to be true, partitionByOne must also be true.
isOrdered	boolean	Specifies whether ON clause requires ORDER BY clause. If false, ORDER BY is optional.
isRequired	boolean	Specifies whether ON clause is required.
description	string	Description of ON clause.
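
The dependencies between these fields can be expressed as a small consistency check. This is a sketch in Python under two assumptions: the entry is already parsed into a dict, and "requiredInputKind must be PartitionByKey" is read as "the list includes PartitionByKey". The function name `check_input_table` is hypothetical and not part of ML Engine.

```python
# Hypothetical helper for illustration; not part of ML Engine.
ALLOWED_INPUT_KINDS = {"PartitionByKey", "PartitionByAny", "Dimension"}


def check_input_table(entry):
    """Return a list of problems found in one input_tables entry (a dict)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in entry:
            problems.append(f"missing required field: {field}")
    if "datatype" in entry and entry["datatype"] != "TABLE_ALIAS":
        problems.append('datatype must be "TABLE_ALIAS"')

    # PartitionByAny is the documented default when the field is omitted.
    kinds = entry.get("requiredInputKind", ["PartitionByAny"])
    for kind in kinds:
        if kind not in ALLOWED_INPUT_KINDS:
            problems.append(f"unknown requiredInputKind: {kind}")

    by_one = entry.get("partitionByOne", False)
    # Assumption: "must be PartitionByKey" means the list includes it.
    if by_one and "PartitionByKey" not in kinds:
        problems.append("partitionByOne requires PartitionByKey")
    if entry.get("partitionByOneInclusive", False) and not by_one:
        problems.append("partitionByOneInclusive requires partitionByOne")
    return problems


entry = {
    "name": "init_params",
    "datatype": "TABLE_ALIAS",
    "requiredInputKind": ["PartitionByKey"],
    "partitionByOne": True,
}
print(check_input_table(entry))
```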

Argument_clauses Section Fields

Field	Type	Description
* name	string	Specifies argument name.
* datatype	string	Specifies data type of argument value; one of these: "BOOLEAN"; "INTEGER"; "DOUBLE"; "LONG"; "STRING"; "TABLE_NAME" (used for names of input or output tables for driver functions; identify output tables by setting the isOutputTable field to true); "COLUMN_NAMES" (used for names of columns in input tables for driver functions); "COLUMNS" (used for names of columns in input tables for non-driver functions).
isRequired	boolean	Specifies whether argument clause is required or optional.
* defaultValue	Boolean, numeric, or string, depending on datatype.	Specifies default value of argument (the value that the function uses if the user omits the argument). Specify only if isRequired is set to false.
permittedValues	list of string	Specifies permitted values of argument clause. For example: "permittedValues": [ "SPHERICAL", "DIAGONAL", "FULL", "TIED" ]
description	string	Description of argument clause.
* isOutputTable	boolean	Specifies whether argument clause accepts database table as output. For this value to be true, data type must be set to "TABLE_NAME".
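
The constraints that tie defaultValue, isRequired, isOutputTable, and datatype together can be checked with a few lines of Python. This is a hedged sketch, not an ML Engine API: `check_argument_clause` is a hypothetical name, presence is enforced only for name and datatype, and the rule that defaultValue should appear in permittedValues is an assumption that the table does not state.

```python
# Hypothetical helper for illustration; not part of ML Engine.
ALLOWED_DATATYPES = {
    "BOOLEAN", "INTEGER", "DOUBLE", "LONG", "STRING",
    "TABLE_NAME", "COLUMN_NAMES", "COLUMNS",
}


def check_argument_clause(entry):
    """Return a list of problems found in one argument_clauses entry (a dict)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in entry:
            problems.append(f"missing required field: {field}")

    datatype = entry.get("datatype")
    if datatype is not None and (
        not isinstance(datatype, str) or datatype not in ALLOWED_DATATYPES
    ):
        problems.append(f"unknown datatype: {datatype}")

    # A default is documented only for optional arguments.
    if "defaultValue" in entry and entry.get("isRequired") is True:
        problems.append("defaultValue is only allowed when isRequired is false")

    if entry.get("isOutputTable") is True and datatype != "TABLE_NAME":
        problems.append('isOutputTable requires datatype "TABLE_NAME"')

    # Assumption (not stated in the table): a default should be a permitted value.
    permitted = entry.get("permittedValues")
    if permitted is not None and "defaultValue" in entry:
        if entry["defaultValue"] not in permitted:
            problems.append("defaultValue is not one of permittedValues")
    return problems


clause = {
    "name": "CovarianceType",
    "datatype": "STRING",
    "isRequired": False,
    "permittedValues": ["SPHERICAL", "DIAGONAL", "FULL", "TIED"],
    "defaultValue": "DIAG",  # deliberately not in permittedValues
}
print(check_argument_clause(clause))
```

The last check is the most useful one in practice, because a misspelled entry in permittedValues or defaultValue is still valid JSON and is easy to miss by eye.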

JSON Descriptor Example: GMMFit Function

{
 "json_schema_major_version": "1",
 "json_schema_minor_version": "2",
 "json_content_version": "1",
 "function_name": "GMMFit",
 "function_version": "1.2",
 "function_type": "driver",
 "short_description": "Fits a Gaussian Mixture Model to data.",
 "long_description": "Clusters data using a Gaussian Mixture Model or a Dirichlet Process Gaussian Mixture Model.",
 "input_tables":[
  {
   "requiredInputKind":[
    "PartitionByKey"
   ],
   "isOrdered": false,
   "partitionByOne": true,
   "name": "init_params",
   "isRequired": true,
   "description": "Contains initial values for the cluster weights, means, and covariances.",
   "datatype": "TABLE_ALIAS"
  }
 ],
 "argument_clauses":[
  {
   "isOutputTable": false,
   "name":"InputTable",
   "isRequired": true,
   "description": "Specifies the name of the table that contains the input data to be clustered.",
   "datatype": "TABLE_NAME"
  },
  {
   "isOutputTable": true,
   "name":"OutputTable",
   "isRequired": true,
   "description": "Specifies the name of the output table to which the function outputs cluster information.",
   "datatype": "TABLE_NAME"
  },
  {
   "defaultValue": 20,
   "name": "MaxClusternum",
   "isRequired": false,
   "description": "Specifies the maximum number of clusters in a Dirichlet process model.",
   "datatype": "INTEGER"
  },
  {
   "permittedValues": [
    "SPHERICAL",
    "DIAGONAL",
    "FULL",
    "TIED"
   ],
   "defaultValue": "DIAGONAL",
   "name": "CovarianceType",
   "isRequired": false,
   "description": "Specifies the type of the covariance matrices.",
   "datatype": "STRING"
  },
  {
   "defaultValue": 0.001,
   "name": "Tolerance",
   "isRequired": false,
   "description": "Specifies the minimum change in log-likelihood between iterations that causes the function to terminate.",
   "datatype": "DOUBLE"
  },
  {
   "defaultValue": false,
   "name": "PackOutput",
   "isRequired": false,
   "description": "Specifies whether the function packs the output. The default value is 'false'.",
   "datatype": "BOOLEAN"
  }
 ]
}
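
To illustrate how a tool might read such a descriptor, the following Python sketch parses a trimmed copy of the GMMFit argument clauses with the standard `json` module and summarizes them. The `summarize` function is a hypothetical helper, and the embedded text keeps only a few fields from the example above; nothing here reflects how ML Engine itself processes the file.

```python
import json

# Trimmed copy of the GMMFit descriptor; descriptions and some clauses omitted.
DESCRIPTOR_TEXT = """
{
 "function_name": "GMMFit",
 "function_type": "driver",
 "argument_clauses": [
  {"name": "InputTable", "datatype": "TABLE_NAME", "isRequired": true, "isOutputTable": false},
  {"name": "OutputTable", "datatype": "TABLE_NAME", "isRequired": true, "isOutputTable": true},
  {"name": "MaxClusternum", "datatype": "INTEGER", "isRequired": false, "defaultValue": 20},
  {"name": "Tolerance", "datatype": "DOUBLE", "isRequired": false, "defaultValue": 0.001}
 ]
}
"""


def summarize(descriptor):
    """Collect required arguments, output tables, and defaults from a descriptor dict."""
    clauses = descriptor.get("argument_clauses", [])
    return {
        "required": [c["name"] for c in clauses if c.get("isRequired", False)],
        "output_tables": [c["name"] for c in clauses if c.get("isOutputTable", False)],
        "defaults": {c["name"]: c["defaultValue"] for c in clauses if "defaultValue" in c},
    }


summary = summarize(json.loads(DESCRIPTOR_TEXT))
print(summary["required"])       # ['InputTable', 'OutputTable']
print(summary["output_tables"])  # ['OutputTable']
print(summary["defaults"])       # {'MaxClusternum': 20, 'Tolerance': 0.001}
```

Because `json.loads` raises `json.JSONDecodeError` on malformed input, a parse step like this also catches separator mistakes, such as a semicolon in place of a colon, before any field-level checks run.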