A JSON (JavaScript Object Notation) function descriptor is a JSON file that ML Engine uses for function metadata processing.
Each of the sections in the table below is described in more detail in the subsequent tables.
Major Sections of JSON Descriptor
Section | Description |
---|---|
Header | Specifies function name, version, and type information. |
Input_tables | Specifies function ON clause fields. Input tables for non-driver functions are specified in this section. |
Argument_clauses | Specifies function argument fields. Input tables for driver functions and all output tables are specified by argument clauses in this section. |
Header Section Fields
Required fields are marked with an asterisk (*).
Field | Type | Description |
---|---|---|
* json_schema_major_version | string | Major version of JSON schema. Set to "1". |
* json_schema_minor_version | string | Minor version of JSON schema. Set to "2". |
* json_content_version | string | JSON content version. Set to "1". |
* function_name | string | Name of function class file. |
function_version | string | Version of function. |
* function_type | string | Specifies function type ("driver" or "non-driver"). See Compatibility Considerations for UDFs for an explanation of driver and non-driver functions. |
short_description | string | Short description of function. |
long_description | string | Long description of function. |
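
The required-field rule for the header can be checked mechanically. The following is a minimal Python sketch, not part of ML Engine: the helper name `check_header` is hypothetical, and it assumes the header fields sit at the top level of the descriptor, as they do in the GMMFit example.

```python
# Hypothetical helper for illustration; not part of ML Engine.
REQUIRED_HEADER_FIELDS = (
    "json_schema_major_version",
    "json_schema_minor_version",
    "json_content_version",
    "function_name",
    "function_type",
)
ALLOWED_FUNCTION_TYPES = ("driver", "non-driver")


def check_header(descriptor):
    """Return a list of problems found in the header fields of a descriptor dict."""
    problems = []
    for field in REQUIRED_HEADER_FIELDS:
        if field not in descriptor:
            problems.append(f"missing required field: {field}")
        elif not isinstance(descriptor[field], str):
            problems.append(f"{field} must be a string")
    function_type = descriptor.get("function_type")
    if isinstance(function_type, str) and function_type not in ALLOWED_FUNCTION_TYPES:
        problems.append(f"function_type must be one of {ALLOWED_FUNCTION_TYPES}")
    return problems
```

An empty list means that every field marked with an asterisk is present and is a string.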
Input_tables Section Fields
Field | Type | Description |
---|---|---|
* name | string | Specifies ON clause alias. If no alias, use "input" as alias. |
* datatype | string | Set to "TABLE_ALIAS" for each ON clause. |
requiredInputKind | list of string | Specifies how the ON clause is partitioned in the syntax. It can be a combination of PartitionByKey, PartitionByAny, or Dimension. If not specified, PartitionByAny is used. Examples are: ON tablename PARTITION BY column-name; ON tablename PARTITION BY ANY; ON tablename DIMENSION |
partitionByOne | boolean | Specifies whether ON clause accepts PartitionByOne. For this to be true, requiredInputKind must be PartitionByKey. For example: ON tablename PARTITION BY 1 |
partitionByOneInclusive | boolean | Specifies whether ON clause accepts both PartitionByOne and PartitionByKey. For this to be true, partitionByOne must also be true. |
isOrdered | boolean | Specifies whether ON clause requires ORDER BY clause. If false, ORDER BY is optional. |
isRequired | boolean | Specifies whether ON clause is required. |
description | string | Description of ON clause. |
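
The dependencies among the partition fields are easy to get wrong, so a small check can help. This is a minimal Python sketch under stated assumptions: the helper name `check_input_table` is hypothetical, and "requiredInputKind must be PartitionByKey" is read here as "the list contains PartitionByKey".

```python
# Hypothetical helper for illustration; not part of ML Engine.
ALLOWED_INPUT_KINDS = {"PartitionByKey", "PartitionByAny", "Dimension"}


def check_input_table(entry):
    """Return a list of problems found in one input_tables entry (a dict)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in entry:
            problems.append(f"missing required field: {field}")
    if entry.get("datatype", "TABLE_ALIAS") != "TABLE_ALIAS":
        problems.append('datatype must be "TABLE_ALIAS"')
    # PartitionByAny is the documented default when the field is omitted.
    kinds = entry.get("requiredInputKind", ["PartitionByAny"])
    unknown = sorted(set(kinds) - ALLOWED_INPUT_KINDS)
    if unknown:
        problems.append(f"unknown requiredInputKind values: {unknown}")
    # Assumption: "must be PartitionByKey" means the list includes it.
    if entry.get("partitionByOne", False) and "PartitionByKey" not in kinds:
        problems.append("partitionByOne requires requiredInputKind PartitionByKey")
    if entry.get("partitionByOneInclusive", False) and not entry.get("partitionByOne", False):
        problems.append("partitionByOneInclusive requires partitionByOne to be true")
    return problems
```

Running it over every entry of `input_tables` gives one list of problems per ON clause.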
Argument_clauses Section Fields
Field | Type | Description |
---|---|---|
* name | string | Specifies argument name. |
* datatype | string | Specifies data type of argument value, for example "TABLE_NAME", "INTEGER", "DOUBLE", "STRING", or "BOOLEAN" (the values used in the GMMFit example). |
isRequired | boolean | Specifies whether argument clause is required or optional. |
* defaultValue | Boolean, numeric, or string depending on the value of the data type. | Specifies default value of argument (value for function to use if the user omits argument). Specify only if isRequired is set to false. |
permittedValues | list of string | Specifies permitted values of argument clause. For example: "permittedValues": [ "SPHERICAL", "DIAGONAL", "FULL", "TIED" ] |
description | string | Description of argument clause. |
* isOutputTable | boolean | Specifies whether argument clause accepts database table as output. For this value to be true, data type must be set to "TABLE_NAME". |
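
The constraints that link the argument clause fields can be sketched as a check in Python. The helper name `check_argument_clause` is hypothetical. Two rules in it are assumptions rather than statements from the table: a defaultValue is flagged only when isRequired is explicitly true, and a defaultValue is expected to be one of the permittedValues when both are given.

```python
# Hypothetical helper for illustration; not part of ML Engine.
def check_argument_clause(clause):
    """Return a list of problems found in one argument_clauses entry (a dict)."""
    problems = []
    for field in ("name", "datatype"):
        if field not in clause:
            problems.append(f"missing required field: {field}")
    # An output table argument must have data type "TABLE_NAME".
    if clause.get("isOutputTable") is True and clause.get("datatype") != "TABLE_NAME":
        problems.append('isOutputTable is true but datatype is not "TABLE_NAME"')
    # A default is meant only for optional arguments.
    if "defaultValue" in clause and clause.get("isRequired") is True:
        problems.append("defaultValue given although isRequired is true")
    # Assumption: the default should itself be a permitted value.
    permitted = clause.get("permittedValues")
    if permitted is not None and "defaultValue" in clause:
        if clause["defaultValue"] not in permitted:
            problems.append("defaultValue is not among permittedValues")
    return problems
```

The last rule catches spelling slips, such as a permitted value that no longer matches the default.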
JSON Descriptor Example: GMMFit Function
    {
      "json_schema_major_version": "1",
      "json_schema_minor_version": "2",
      "json_content_version": "1",
      "function_name": "GMMFit",
      "function_version": "1.2",
      "function_type": "driver",
      "short_description": "Fits a Gaussian Mixture Model to data.",
      "long_description": "Clusters data using a Gaussian Mixture Model or a Dirichlet Process Gaussian Mixture Model.",
      "input_tables": [
        {
          "requiredInputKind": [ "PartitionByKey" ],
          "isOrdered": false,
          "partitionByOne": true,
          "name": "init_params",
          "isRequired": true,
          "description": "Contains initial values for the cluster weights, means, and covariances.",
          "datatype": "TABLE_ALIAS"
        }
      ],
      "argument_clauses": [
        {
          "isOutputTable": false,
          "name": "InputTable",
          "isRequired": true,
          "description": "Specifies the name of the table that contains the input data to be clustered.",
          "datatype": "TABLE_NAME"
        },
        {
          "isOutputTable": true,
          "name": "OutputTable",
          "isRequired": true,
          "description": "Specifies the name of the output table to which the function outputs cluster information.",
          "datatype": "TABLE_NAME"
        },
        {
          "defaultValue": 20,
          "name": "MaxClusternum",
          "isRequired": false,
          "description": "Specifies the maximum number of clusters in a Dirichlet process model.",
          "datatype": "INTEGER"
        },
        {
          "permittedValues": [ "SPHERICAL", "DIAGONAL", "FULL", "TIED" ],
          "defaultValue": "DIAGONAL",
          "name": "CovarianceType",
          "isRequired": false,
          "description": "Specifies the type of the covariance matrices.",
          "datatype": "STRING"
        },
        {
          "defaultValue": 0.001,
          "name": "Tolerance",
          "isRequired": false,
          "description": "Specifies the minimum change in log-likelihood between iterations that causes the function to terminate.",
          "datatype": "DOUBLE"
        },
        {
          "defaultValue": false,
          "name": "PackOutput",
          "isRequired": false,
          "description": "Specifies whether the function packs the output. The default value is 'false'.",
          "datatype": "BOOLEAN"
        }
      ]
    }
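
A descriptor must be syntactically valid JSON, so a quick parse is a cheap first check; it catches slips such as a semicolon typed in place of a colon. The following Python sketch uses only the standard `json` module. The helper names `parse_descriptor` and `output_table_arguments` are hypothetical and are not part of ML Engine.

```python
import json


def parse_descriptor(text):
    """Parse descriptor text; return (descriptor, None) or (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # lineno and colno point at the place where parsing failed.
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"


def output_table_arguments(descriptor):
    """Return the names of argument clauses flagged as output tables."""
    return [
        clause["name"]
        for clause in descriptor.get("argument_clauses", [])
        if clause.get("isOutputTable") is True
    ]
```

For a descriptor like the GMMFit example, `output_table_arguments` would return the name of the one clause whose isOutputTable is true.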