Building the UserEcho Function

Teradata Aster® Spark Connector User Guide

Product: Aster Analytics
Release Number: 7.00.00.01
Published: May 2017
Language: English (United States)
Last Update: 2018-04-13
  1. Save the Scala code for the UserEcho function to the file UserEcho.scala.
  2. Create the directory structure:
    mkdir -p ~/"directory"/src/main/scala/"packageName"
    mkdir -p ~/"directory"/lib
    
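    For example, if the project directory is userecho (a hypothetical name used in the examples that follow) and the package is myfunctions (the package referenced in step 10):

      # Create the sbt source and library directories:
      mkdir -p ~/userecho/src/main/scala/myfunctions
      mkdir -p ~/userecho/lib
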
  3. In the directory ~/"directory", create the file build.sbt with this content:
    name := "UserEcho"
    version := "1.0"
    scalaVersion := "scala_version"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "spark_version"
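    For example, assuming Scala 2.10.6 and Spark 1.6.0 (versions consistent with the scala-2.10 output path in step 8), a filled-in build.sbt would contain:

      name := "UserEcho"
      version := "1.0"
      scalaVersion := "2.10.6"
      libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
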
  4. Find the dependency files aster-spark-extension-sparkspark_version.jar and spark-assembly-spark_version-hadoophadoop_version.jar, using the find command (see the example at the end of this step):
    • If Aster and Hadoop are on different clusters, find aster-spark-extension-sparkspark_version.jar on the Aster cluster and spark-assembly-spark_version-hadoophadoop_version.jar on the Hadoop/Spark cluster.
    • If Aster and Hadoop are on the same cluster, find both files on that cluster.

    If it is unclear which version of aster-spark-extension-sparkspark_version.jar to use:

    1. On the Hadoop/Spark cluster, issue this command:
      hdfs dfs -ls /user/"sparkJobSubmitter"/"sparkJobSubmitter"
      The command output includes the version number.
    2. On the Aster cluster, find the aster-spark-extension-sparkspark_version.jar with the same version number.
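    For example, a search from the filesystem root (a sketch; install locations and the privileges you need vary by site):

      # On the Aster cluster, locate the Aster Spark extension jar:
      find / -name "aster-spark-extension-spark*.jar" 2>/dev/null
      # On the Hadoop/Spark cluster, locate the Spark assembly jar:
      find / -name "spark-assembly-*hadoop*.jar" 2>/dev/null
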
  5. Copy the dependency files to the directory ~/"directory"/lib.
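    For example (the source paths shown here are hypothetical; use the locations that find reported in step 4):

      cp /opt/aster/lib/aster-spark-extension-spark1.6.0.jar ~/userecho/lib/
      cp /usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar ~/userecho/lib/
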
  6. Copy UserEcho.scala to the directory ~/"directory"/src/main/scala/"packageName".
  7. Ensure that the directory structure is:
    directory
         build.sbt
         lib
              aster-spark-extension-sparkspark_version.jar
              spark-assembly-spark_version-hadoophadoop_version.jar
         src
              main
                   scala
                        packageName
                             UserEcho.scala
  8. Build the function:
    % cd directory
    % sbt package
    
    The sbt command creates the file ~/"directory"/target/scala-2.10/userecho_2.10-1.0.jar.
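    To confirm the build output exists (again assuming the hypothetical project directory userecho):

      ls ~/userecho/target/scala-2.10/userecho_2.10-1.0.jar
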
  9. Copy the UserEcho function to the HDFS directory of sparkJobSubmitter:
    cp ~/"directory"/target/scala-2.10/userecho_2.10-1.0.jar /tmp
    su - hdfs
    hdfs dfs -put /tmp/userecho_2.10-1.0.jar /user/"sparkJobSubmitter"/"sparkJobSubmitter"
    hdfs dfs -chown "sparkJobSubmitter":"sparkJobSubmitter" /user/"sparkJobSubmitter"/"sparkJobSubmitter"/userecho_2.10-1.0.jar
    
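    To verify the copy and the ownership change, list the directory (the same command used in step 4):

      hdfs dfs -ls /user/"sparkJobSubmitter"/"sparkJobSubmitter"
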
  10. Run the UserEcho function:
    SELECT * FROM RunOnSpark (
      ON people
      SPARKCODE ('myfunctions.UserEcho')
      OUTPUTS ('name varchar(250)', 'age integer')
      SPARK_CLUSTER_ID ('Aster_Spark_*_site.json-socket-SSH')
      APP_RESOURCE ('hdfs:/user/"sparkJobSubmitter"/"sparkJobSubmitter"/userecho_2.10-1.0.jar')
      JARS ('hdfs:/user/"sparkJobSubmitter"/"sparkJobSubmitter"/aster-spark-extension-sparkspark_version.jar')
    ) sp;
    

    The RunOnSpark function requires the information provided by the APP_RESOURCE and JARS arguments; however, you can omit these arguments from the query if you provide them in the configuration file.