The Aster Spark Connector is distributed as part of the Aster Analytics 7.00 package, as the release-independent package aster-connector-spark-07.00.00.00*.rpm. The asterisk (*) represents a string that depends on the build and platform (for example, r62820-x86_64).
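As a rough illustration of how the wildcard in the package name behaves, the following Python sketch filters a list of file names against the pattern quoted above. The candidate file names and the helper `is_connector_rpm` are hypothetical and exist only for this example; the exact form of the build and platform string in a real package name may differ.

```python
from fnmatch import fnmatchcase

# Pattern quoted from the package description; "*" stands for the
# build- and platform-dependent part of the file name.
RPM_PATTERN = "aster-connector-spark-07.00.00.00*.rpm"


def is_connector_rpm(filename: str) -> bool:
    """Return True if filename matches the connector package pattern."""
    return fnmatchcase(filename, RPM_PATTERN)


# Hypothetical file names, for illustration only.
candidates = [
    "aster-connector-spark-07.00.00.00-r62820-x86_64.rpm",
    "aster-connector-spark-06.20.00.00-r60000-x86_64.rpm",
    "aster-analytics-07.00.00.00-r62820-x86_64.rpm",
]

matching = [name for name in candidates if is_connector_rpm(name)]
print(matching)
```

Only the first hypothetical name matches, because the other two differ in the fixed part of the pattern rather than in the part covered by the asterisk.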
- Aster Spark Connector extension jar files, one for each supported version of Spark:
- This file contains executable code for version 1.3:
/home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.3.1.jar
- This file contains executable code for functions specific to version 1.4:
/home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.4.1.jar
- This file contains executable code for functions specific to version 1.5:
/home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.5.2.jar
- This file contains executable code for functions specific to version 1.6:
/home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.6.1.jar
- This file contains executable code for functions specific to Cloudera version 1.6:
/home/beehive/bin/lib/sqlmr/functions/aster-spark-extension-spark1.0.CDH.jar
Each jar file is specific to a Spark version. The jar file for version 1.3 contains classes only for version 1.3. The jar file for each successive version contains classes for all earlier versions and those introduced in the version itself.
- RunOnSpark function for invoking the Spark functions in the Aster Spark Connector extension jar files:
/home/beehive/bin/lib/sqlmr/functions/RunOnSpark.zip
- Template with connection configuration entries to be instantiated into spark.config:
/home/beehive/config/spark_shipped.config
- ncli Spark plugins, which are Python scripts that generate spark.config:
/home/beehive/ncli/plugins/spark.py
/home/beehive/ncli/plugins/generateSparkConf.py
- Configuration script generator:
/home/beehive/bin/utils/primitives/configureAsterSpark
- Test sets for validating the Aster Spark Connector, each of which consists of input SQL scripts and expected results:

| Test Set | Description | Functions |
| --- | --- | --- |
| Analytics | Alternating least squares | com.teradata.aster.functions.ALSTrain, com.teradata.aster.functions.ALSRun, com.teradata.aster.functions.Correlations |
| Analytics | Clustering | com.teradata.aster.functions.KmeansTrain, com.teradata.aster.functions.KmeansRun, com.teradata.aster.functions.Kmeans2 |
| Analytics | Linear methods (machine learning) | com.teradata.aster.functions.LinearRegrWithSGDTrain, com.teradata.aster.functions.LinearRegrWithSGDRun |
| Analytics | Multilayer perceptron classifier | com.teradata.aster.functions.MLPCTrainDF, com.teradata.aster.functions.MLPCRunDF |
| Analytics | Principal component analysis | com.teradata.aster.functions.Pca |
| configuration | spark.config | com.teradata.aster.functions.Echo |
| data_transfer | Data transfer | com.teradata.aster.functions.examples.Count, com.teradata.aster.functions.examples.EchoReverse, com.teradata.aster.functions.examples.Echo, com.teradata.aster.functions.Kmeans, com.teradata.aster.functions.LinearRegrWithSGDTrain, com.teradata.aster.functions.LinearRegrWithSGDRun |
| empty_partitions | Boundary test | com.teradata.aster.functions.examples.Status, com.teradata.aster.functions.examples.Echo |
| input_output | Input and output of columns of various types | com.teradata.aster.functions.examples.Echo |
| monitoring | Get status of RunOnSpark queries | com.teradata.aster.functions.examples.Status |
| multiple_sqlmrs | SQL operations of RunOnSpark queries | com.teradata.aster.functions.examples.Echo, com.teradata.aster.functions.examples.EchoReverse |
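The one-jar-per-Spark-version layout listed earlier in this section can be pictured as a simple lookup. The Python sketch below is illustrative only: the function `extension_jar` and the choice to match on the major.minor part of the version string are assumptions made for this example, not part of the connector. The directory and jar file names are copied from the list above.

```python
from pathlib import PurePosixPath

# Directory and jar names copied from the package contents list.
FUNCTIONS_DIR = PurePosixPath("/home/beehive/bin/lib/sqlmr/functions")

EXTENSION_JARS = {
    "1.3": "aster-spark-extension-spark1.3.1.jar",
    "1.4": "aster-spark-extension-spark1.4.1.jar",
    "1.5": "aster-spark-extension-spark1.5.2.jar",
    "1.6": "aster-spark-extension-spark1.6.1.jar",
}
CLOUDERA_1_6_JAR = "aster-spark-extension-spark1.0.CDH.jar"


def extension_jar(spark_version: str, cloudera: bool = False) -> PurePosixPath:
    """Return the extension jar path for a Spark version (hypothetical helper).

    Assumption: only the major.minor part of the version decides the jar.
    """
    major_minor = ".".join(spark_version.split(".")[:2])
    if cloudera:
        if major_minor != "1.6":
            raise ValueError(f"No Cloudera jar listed for Spark {spark_version}")
        return FUNCTIONS_DIR / CLOUDERA_1_6_JAR
    if major_minor not in EXTENSION_JARS:
        raise ValueError(f"No extension jar listed for Spark {spark_version}")
    return FUNCTIONS_DIR / EXTENSION_JARS[major_minor]


print(extension_jar("1.5.2"))
print(extension_jar("1.6.0", cloudera=True))
```

Keeping the Cloudera jar outside the dictionary avoids two entries competing for the key "1.6"; a version that is not in the list raises an error rather than silently falling back to another jar.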
The installation script does the following:
- Installs the Spark client on the Aster master vworker node.
The Spark client consists of a set of scripts and a Spark assembly jar. This Spark client must be compatible with the Spark client on the Hadoop/Spark cluster.
- Installs the remote Spark configuration files on the Aster master vworker node.
- Installs the Hadoop jar files on the Aster queen node and each vworker node.
- Installs the Hadoop configuration files on the Aster queen node and each vworker node.
The installation script announces and explains each action before executing it. In system output, an ellipsis (…) indicates that the script is showing only the beginning of the output, for brevity.
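The announce-then-execute behavior described above can be sketched in Python as follows. This is not the installation script itself, whose implementation is not shown in this document: the names `run_steps` and `abbreviate`, the three-line cutoff, and the placeholder actions are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (title, explanation, action returning its output as text)
Step = Tuple[str, str, Callable[[], str]]


def abbreviate(output: str, max_lines: int = 3) -> str:
    """Keep only the first max_lines lines and mark the cut with an ellipsis."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output
    return "\n".join(lines[:max_lines] + ["…"])


def run_steps(steps: List[Step], emit: Callable[[str], None] = print) -> None:
    """Announce and explain each step, then execute it and show its output."""
    for number, (title, explanation, action) in enumerate(steps, start=1):
        emit(f"Step {number}: {title}")
        emit(f"  {explanation}")
        emit(abbreviate(action()))


# Placeholder actions: they only return text and install nothing.
demo_steps: List[Step] = [
    (
        "Install the Spark client",
        "Copies the client scripts and the Spark assembly jar.",
        lambda: "copied file 1\ncopied file 2\ncopied file 3\ncopied file 4",
    ),
    (
        "Install the Hadoop configuration files",
        "Copies the configuration files to each node.",
        lambda: "done",
    ),
]

run_steps(demo_steps)
```

Passing a different `emit` callable, such as the `append` method of a list, captures the messages instead of printing them, which makes the sequence easy to inspect.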