Aster Spark Connector Package

Teradata Aster® Spark Connector User Guide

Product: Aster Analytics
Release Number: 7.00.00.01
Published: May 2017
Language: English (United States)
Last Update: 2018-04-13

The Aster Spark Connector is distributed as part of the Aster Analytics 7.00 package, as the release-independent package aster-connector-spark-07.00.00.00*.rpm. The asterisk (*) represents a string that depends on the build and platform (for example, r62820-x86_64).

The package contains these executable and configuration files:
  • Aster Spark Connector extension jar files, one for each supported version of Spark:
    • This file contains executable code for version 1.3:

      /home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.3.1.jar

    • This file contains executable code for functions specific to version 1.4:

      /home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.4.1.jar

    • This file contains executable code for functions specific to version 1.5:

      /home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.5.2.jar

    • This file contains executable code for functions specific to version 1.6:

      /home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.6.1.jar

    • This file contains executable code for functions specific to Cloudera version 1.6:

      /home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.0.CDH.jar

    Each jar file is specific to a Spark version. The jar file for version 1.3 contains only the classes for version 1.3; the jar file for each later version contains the classes for all earlier supported versions plus the classes introduced in that version. For example, the jar file for version 1.5.2 includes the classes for versions 1.3 and 1.4 in addition to those added in 1.5.

  • RunOnSpark function for invoking the Spark functions in the Aster Spark Connector extension jar files:

    /home/beehive/bin/lib/sqlmr/functions/RunOnSpark.zip

  • Template with connection configuration entries to be instantiated into spark.config:

    /home/beehive/config/spark_shipped.config

  • ncli Spark plugins, which are Python scripts that generate spark.config (a simplified, hypothetical sketch of this kind of generator follows the test-set table at the end of this list):

    /home/beehive/ncli/plugins/spark.py

    /home/beehive/ncli/plugins/generateSparkConf.py

  • Configuration script generator:

    /home/beehive/bin/utils/primitives/configureAsterSpark

  • Test sets for validating the Aster Spark Connector, each of which consists of input SQL scripts and expected results. The following table lists each test set, what it covers, and the functions it exercises; a PySpark sketch of the MLlib algorithms behind the Analytics functions follows the table.

    Test Set          Description                                     Functions
    Analytics         Alternating least squares                       com.teradata.aster.functions.ALSTrain
                                                                      com.teradata.aster.functions.ALSRun
                                                                      com.teradata.aster.functions.Correlations
    Analytics         Clustering                                      com.teradata.aster.functions.KmeansTrain
                                                                      com.teradata.aster.functions.KmeansRun
                                                                      com.teradata.aster.functions.Kmeans2
    Analytics         Linear methods (machine learning)               com.teradata.aster.functions.LinearRegrWithSGDTrain
                                                                      com.teradata.aster.functions.LinearRegrWithSGDRun
    Analytics         Multilayer perceptron classifier                com.teradata.aster.functions.MLPCTrainDF
                                                                      com.teradata.aster.functions.MLPCRunDF
    Analytics         Principal component analysis                    com.teradata.aster.functions.Pca
    configuration     spark.config                                    com.teradata.aster.functions.Echo
    data_transfer     Data transfer                                   com.teradata.aster.functions.examples.Count
                                                                      com.teradata.aster.functions.examples.EchoReverse
                                                                      com.teradata.aster.functions.examples.Echo
                                                                      com.teradata.aster.functions.Kmeans
                                                                      com.teradata.aster.functions.LinearRegrWithSGDTrain
                                                                      com.teradata.aster.functions.LinearRegrWithSGDRun
    empty_partitions  Boundary test                                   com.teradata.aster.functions.examples.Status
                                                                      com.teradata.aster.functions.examples.Echo
    input_output      Input and output of columns of various types    com.teradata.aster.functions.examples.Echo
    monitoring        Get status of RunOnSpark queries                com.teradata.aster.functions.examples.Status
    multiple_sqlmrs   SQL operations of RunOnSpark queries            com.teradata.aster.functions.examples.Echo
                                                                      com.teradata.aster.functions.examples.EchoReverse
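
The names and descriptions of the Analytics test-set functions correspond to standard Spark MLlib algorithms (alternating least squares, k-means clustering, linear regression with SGD). The following PySpark sketch shows those underlying Spark 1.x MLlib calls for orientation only; it is not the connector's implementation, and the application name, sample data, and parameter values are illustrative assumptions.

  # PySpark sketch of the Spark 1.x MLlib algorithms exercised by the Analytics
  # test sets. Illustration only; not the Aster Spark Connector's internal code.
  from pyspark import SparkContext
  from pyspark.mllib.clustering import KMeans
  from pyspark.mllib.recommendation import ALS, Rating
  from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

  sc = SparkContext(appName="mllib-sketch")   # hypothetical application name

  # Clustering (cf. KmeansTrain/KmeansRun): group 2-D points into two clusters.
  points = sc.parallelize([[0.0, 0.0], [0.1, 0.2], [9.0, 9.0], [9.1, 8.9]])
  kmeans_model = KMeans.train(points, k=2, maxIterations=10)
  print(kmeans_model.clusterCenters)

  # Alternating least squares (cf. ALSTrain/ALSRun): recommender trained on
  # (user, item, rating) triples.
  ratings = sc.parallelize([Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)])
  als_model = ALS.train(ratings, rank=5, iterations=5)
  print(als_model.predict(2, 20))

  # Linear regression with SGD (cf. LinearRegrWithSGDTrain/LinearRegrWithSGDRun).
  labeled = sc.parallelize([LabeledPoint(1.0, [1.0]),
                            LabeledPoint(2.0, [2.0]),
                            LabeledPoint(3.0, [3.0])])
  lr_model = LinearRegressionWithSGD.train(labeled, iterations=100, step=0.01)
  print(lr_model.predict([4.0]))

  sc.stop()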
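
As noted in the ncli plugins entry above, spark.config is generated by instantiating the shipped template with site-specific values. The sketch below is a hypothetical illustration of that general pattern only: the template format, destination path, key names (standard Spark properties such as spark.master and spark.executor.memory), and site values are all assumptions, and the code is not the actual spark.py or generateSparkConf.py plugin.

  # Hypothetical sketch: instantiate a properties-style template into spark.config.
  # Template format, keys, values, and destination path are illustrative assumptions.
  from string import Template

  # Site-specific values an administrator might supply (assumed names and values).
  site_values = {
      "spark_master": "yarn-client",
      "executor_memory": "4g",
      "driver_memory": "2g",
  }

  # Properties-style template in the spirit of spark_shipped.config (assumed format).
  template_lines = [
      "spark.master=${spark_master}",
      "spark.executor.memory=${executor_memory}",
      "spark.driver.memory=${driver_memory}",
  ]

  # Fill in the placeholders and write the result out as spark.config
  # (assumed destination, alongside the shipped template).
  config_text = Template("\n".join(template_lines) + "\n").substitute(site_values)
  with open("/home/beehive/config/spark.config", "w") as f:
      f.write(config_text)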

To allow you to submit Spark jobs remotely without using the remote SSH solution, the installation script performs these steps:
  1. Installs the Spark client on the Aster master vworker node.

    The Spark client consists of a set of scripts and the Spark assembly jar. It must be compatible with the Spark client on the Hadoop/Spark cluster (a simple version-check sketch appears at the end of this section).

  2. Installs the remote Spark configuration files on the Aster master vworker node.
  3. Installs the Hadoop jar files on the Aster queen node and each vworker node.
  4. Installs the Hadoop configuration files on the Aster queen node and each vworker node.

The installation script announces and explains each action before executing it. In the system output, an ellipsis (…) indicates that only the beginning of the output is shown, for brevity.
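
Step 1 above notes that the Spark client installed on the Aster master vworker node must be compatible with the Spark client on the Hadoop/Spark cluster. One simple way to sanity-check this, assuming the installed client's spark-submit is on the PATH and you know the cluster's Spark version, is sketched below; the expected version value and the parsing are assumptions, not part of the installation script.

  # Hypothetical sanity check: compare the locally installed Spark client's version
  # with the Spark version expected on the Hadoop/Spark cluster.
  import re
  import subprocess

  EXPECTED_CLUSTER_VERSION = "1.6.1"   # assumed value; confirm with your Hadoop administrator

  # "spark-submit --version" prints a banner containing "version X.Y.Z";
  # on Spark 1.x the banner goes to stderr, so capture both streams.
  banner = subprocess.check_output(
      ["spark-submit", "--version"], stderr=subprocess.STDOUT
  ).decode("utf-8", "replace")

  match = re.search(r"version\s+(\d+\.\d+\.\d+)", banner)
  local_version = match.group(1) if match else "unknown"

  if local_version == EXPECTED_CLUSTER_VERSION:
      print("Spark client %s matches the expected cluster version." % local_version)
  else:
      print("Version mismatch: client %s, expected %s"
            % (local_version, EXPECTED_CLUSTER_VERSION))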