- SPARKCODE
- Specifies the name of a function and its arguments. The function must be in the jar file specified by the APP_RESOURCE argument.
- OUTPUTS
- Specifies the names and data types of the output columns.
- MEM_LIMIT_MB
- Specifies the maximum number of megabytes to allocate for the data transfer buffers.
- TIMEOUT_SEC
- Specifies the time, in seconds, after which the query times out (is canceled). If timeout_value is 0, timeout handling is disabled.
- STATUS_INTERVAL_SEC
- Specifies the time interval, in seconds, after which to check the Spark job status. If status_interval_value is 0, status checking is disabled.
- APP_RESOURCE
- Specifies the location of the Teradata Aster Spark jar file on the Hadoop/Spark cluster. The aster_jar_location can be an HDFS location, a shared Linux drive, or a local Linux drive. If you wrote your own functions and put them into your own jar file, aster_jar_location must be the location of your jar file.
- JARS
- Specifies additional jar files. If you wrote your own functions that invoke the Teradata Aster Spark API, specify the location of the Teradata-supplied aster-spark-extension-sparkversion.jar file.
- EXTRA_SPARK_SUBMIT_OPTIONS
- Specifies extra options to include when submitting the Spark job. The syntax of option_value_pair is:
option value
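For example, a sketch of how an option-value pair might be supplied (the flags shown are standard spark-submit options; the single-string quoting style is an assumption and may vary by Aster version):

```sql
-- Sketch: pass spark-submit options as space-separated "option value" pairs.
-- --driver-memory and --num-executors are standard spark-submit flags;
-- combining both pairs in one quoted string is an assumption.
EXTRA_SPARK_SUBMIT_OPTIONS('--driver-memory 2g --num-executors 4')
```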
- SPARK_CLUSTER_ID
- Specifies the name of the Hadoop/Spark cluster in the configuration file to use to execute the Spark function that SPARKCODE specifies. A Spark instance can have multiple entries with different parameters in the configuration file. A query can have multiple RunOnSpark functions, which can reference the same or different Hadoop/Spark clusters.
- DATA_TRANSFER
- Specifies the method for transferring data to and from Spark:
- 'file':
Transfer the data to a distributed set of files (for example, on HDFS). The Spark application reads data from these files and writes output to a set of files.
- 'socket-persist':
Transfer the data directly to and from the Spark application through sockets and persist the sent data to files on the Spark side.
- PERSIST_LOCATION
- Specifies the HDFS location to use for results received from Spark.
- SPARK_PROPERTIES
- Specifies additional Spark properties to apply when running the Spark job. An example of spark_property_name is spark.executor.memory, which specifies the amount of memory to use for each executor process.
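As a hedged illustration, a SPARK_PROPERTIES clause might look like the following sketch. The property names are standard Spark configuration keys, but the name=value form and comma separation are assumptions:

```sql
-- Sketch: apply additional Spark properties to the submitted job.
-- spark.executor.memory and spark.executor.cores are standard Spark
-- property names; the "name=value, name=value" format is an assumption.
SPARK_PROPERTIES('spark.executor.memory=4g, spark.executor.cores=2')
```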
- REST_URL
- Specifies the Spark REST URL that the RunOnSpark master instance uses to submit jobs, query their status, and cancel them if they run beyond their timeout limit.
- SSH_HOST
- Specifies, in the form user@host, the name of the user with whose credentials the Spark job runs and the host where the Spark job starts. The user extensibility must have OpenSSH access to user@host to run the commands that SPARK_SUBMIT_COMMAND and YARN_COMMAND specify.
- IDENTITY_FILE
- Specifies the identity file (.pem) path to use with remote ssh when you do not want to enable passwordless ssh to the Hadoop/Spark cluster.
- SPARK_SUBMIT_COMMAND
- Specifies the command that starts the Spark job.
- YARN_COMMAND
- Specifies the yarn command to invoke.
- HADOOP_JARS_LOCATIONS
- Specifies paths to the Hadoop jar files, used when DATA_TRANSFER is 'file'.
- HADOOP_CONF_LOCATION
- Specifies the local path to the Hadoop configuration file, used when DATA_TRANSFER is 'file'.
- SPARK_CONF_LOCATION
- Specifies the local path to the Spark configuration file, used when USE_REMOTE_SSH is 'false'.
- KERBEROS_AUTHENTICATION
- Specifies whether to enable Kerberos authentication on the Hadoop system.
- SPARK_JOB_USER_KEY_TAB
- Specifies the key tab file location for the user who runs the Spark job when Kerberos authentication is enabled on the Hadoop system.
- FILE_ACCESS_KEY_TAB
- Specifies the key tab file location for accessing HDFS.
- KINIT
- Specifies the Kerberos kinit command when Kerberos authentication is enabled on the Hadoop system.
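Taken together, the Kerberos-related arguments might be combined as in the following sketch. The key tab paths and kinit location are placeholders, and the 'true'/'false' string form for KERBEROS_AUTHENTICATION is an assumption:

```sql
-- Sketch: Kerberos-related arguments for a secured Hadoop system.
-- All file paths below are placeholders.
KERBEROS_AUTHENTICATION('true')
SPARK_JOB_USER_KEY_TAB('/etc/security/keytabs/sparkuser.keytab')  -- user running the Spark job
FILE_ACCESS_KEY_TAB('/etc/security/keytabs/hdfsuser.keytab')      -- HDFS access
KINIT('/usr/bin/kinit')                                           -- Kerberos kinit command
```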
- WORKERS_IP_ADDRESSES
- Specifies the IP addresses of the vworkers. The ip_address_start is the string with which all vworker IP addresses start. You must specify this argument if the Aster nodes have multiple IP addresses that are neither public nor accessible from the Hadoop nodes.
- LOGGING_LEVEL
- Specifies the amount of logging information generated (for Spark) and logged to the Teradata Aster SQL-MapReduce® logs (for the RunOnSpark function):
- 'INFO': Log only information.
- 'WARNING': Log information and warnings.
- 'ERROR': Log information, warnings, and errors.
- 'DEBUG': Log as much information as possible, to help with troubleshooting.
- DELIMITER
- Specifies the character to use as a field separator when transferring data to and from Spark.
- NULL_STRING
- Specifies the string to use to represent a null value.
- USE_REMOTE_SSH
- Specifies whether remote ssh is required to start the Spark job on the remote Hadoop/Spark cluster.
- SPARK_JOB_USER
- Specifies the name of the user with whose credentials the Spark job runs.
- MAILMANSERVER_BLOCK_TIMEOUT_SEC
- Specifies the number of seconds after which the Mailman server block times out.
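Pulling several of the arguments above together, a minimal RunOnSpark invocation might look like the following sketch. The table name, class name, jar path, cluster name, and argument values are all placeholders, and the exact ON clause and defaults can vary by release:

```sql
-- Sketch of a RunOnSpark call; every identifier below is a placeholder.
SELECT * FROM RunOnSpark(
    ON my_input_table                          -- input rows sent to Spark
    SPARKCODE('com.example.MySparkFunction')   -- function in the APP_RESOURCE jar
    OUTPUTS('id int, score double')            -- output column names and types
    APP_RESOURCE('hdfs:///apps/aster/my-spark-functions.jar')
    SPARK_CLUSTER_ID('cluster1')               -- entry in the configuration file
    DATA_TRANSFER('file')                      -- transfer data via distributed files
    TIMEOUT_SEC('3600')                        -- cancel the query after one hour
    LOGGING_LEVEL('INFO')
);
```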