Generating Scripts for a Single Hadoop Spark Instance - Aster Analytics

Teradata Aster® Spark Connector User Guide

Product
Aster Analytics
Release Number
7.00.00.01
Published
May 2017
Language
English (United States)
Last Update
2018-04-13
Using the script configureAsterSpark, this procedure generates two scripts for a single Hadoop Spark instance, one to be executed on the Hadoop name node and one to be executed on the Teradata Aster queen node.
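Before starting, it can help to confirm that the beehive ssh key the procedure later relies on (in the ssh-copy-id step) actually exists. A minimal pre-check sketch, assuming the key path shown later in this procedure; the `key_check` helper name is illustrative, not part of the product:

```shell
#!/bin/sh
# Report whether the ssh key that the ssh-copy-id step will use exists
# at the given path, and suggest how to generate it otherwise.
key_check() {
  if [ -f "$1" ]; then
    echo "present: $1"
  else
    echo "missing: generate with ssh-keygen -t rsa -f $1"
  fi
}

key_check /home/beehive/.ssh/id_rsa
```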
  1. On the Aster queen, as user beehive, generate the configuration scripts:
    configureAsterSpark
    
    The script displays:
    configureAsterSpark configures the static spark.config entries of
    an Aster/Spark instance. It assumes:
      1) Aster Spark Connector is installed on Aster cluster
      2) Spark is installed on Hadoop cluster
      3) a Hadoop user has been created & authorized to submit Spark jobs
    
    configureAsterSpark generates 2 scripts:
     AsterSpark_queen.sh    # copies files from Hadoop to vworkers
                            # writes static spark.config entries
     AsterSpark_hadoop.sh   # creates 2 hdfs directories, copies
                            # aster-spark-extensions jar from Aster to Hadoop
    and if Aster queen has open ssh access to Hadoop, it also
    displays Hadoop's security authentication method,
    verifies Aster vworkers can resolve host names of Hadoop nodes,
    
    Spark namenode [queen_host_name]:
  2. In response to the prompt, enter the fully qualified host name of the Hadoop Spark name node (hadoop_name_node_host_name). If you see the following result, accept the default response, n, and refer to Unknown Hadoop Spark host name before proceeding.
    hadoop_name_node_host_name is not reachable. Proceed anyway? [n]:
    The script displays:
    ...
    We currently have no ssh access to hadoop_node_host_name. 
    For:
    1) configureAsterSpark to verify if Aster vworkers can resolve host names of Hadoop nodes, and
    2) the copying in the generated AsterSpark_queen.sh script to work,
    beehive must first be granted passwordless open ssh access to hadoop_node_host_name.
    
    Do ssh-copy-id -i /home/beehive/.ssh/id_rsa beehive@hadoop_node_host_name? [y]:
  3. Accept the default response, y. The script displays:
    Granting beehive passwordless ssh access to hadoop_node_host_name.
    To do this, we will now execute
    
       ssh-copy-id -i /home/beehive/.ssh/id_rsa beehive@hadoop_node_host_name
    ...
    beehive@hadoop_node_host_name's password:
  4. Enter the beehive password. The script displays:
    beehive@hadoop_node_host_name's password:
    Now try logging into the machine, with "ssh 'beehive@hadoop_node_host_name'", and check in:
    
       .ssh/authorized_keys
    
    to make sure we haven't added extra keys that you weren't expecting.
    
    user to do spark-submit [beehive]:
  5. Enter the user ID of sparkJobSubmitter, the user that you created in steps 5 and 6 of Preparing for Installation. If Kerberos authentication is enabled on the Hadoop system, you must enter the fully qualified Kerberos principal. For example:
    sparkJobSubmitter/hpd101.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
    The script displays:
    PERSIST_LOCATION [/user/sparkJobSubmitter/tmp]:
    ...
    Verifying Aster vworkers can resolve Hadoop hostnames...
    passed
    
    aster-spark-extension-spark1.6.1.jar 100% 457KB 456.5KB/s 00:00
    
    Generating scripts AsterSpark_hadoop_node_host_name_queen.sh,
    AsterSpark_hadoop_node_host_name_hadoop.sh...
    
    password of act user beehive [beehive]:

    The system checks and displays the security authentication method of hadoop_node_host_name. If the method is Kerberos, the system prompts you for the sparkJobSubmitter keytab (created in step 6 of Preparing for Installation); enter the path to the keytab file on the target Hadoop name node.

    If the system warns that the Aster vworker nodes cannot resolve the Hadoop node host names, you must verify that you can ping the Hadoop node host names from the vworkers before proceeding.

    The system copies the appropriate aster-spark-extension-spark*.jar file from the Aster queen node to the /tmp directory on the Hadoop master node, for later use by the system-generated AsterSpark_hadoop_name_node_host_name_hadoop.sh script.
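When Kerberos is enabled, the principal entered in step 5 follows the user/host@REALM pattern. A small sketch composing that string; the `make_principal` helper is illustrative, and the host and realm are the example values from step 5:

```shell
#!/bin/sh
# Compose a Kerberos principal of the form user/host@REALM, as entered at
# the "user to do spark-submit" prompt when Hadoop uses Kerberos.
make_principal() {
  printf '%s/%s@%s\n' "$1" "$2" "$3"
}

make_principal sparkJobSubmitter hpd101.labs.teradata.com HDP101.HADOOP.TERADATA.COM
# prints: sparkJobSubmitter/hpd101.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
```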

  6. In response to the system prompt, enter the password of the Aster act user beehive.
  7. In response to the system prompt, enter the password of the Aster act db_superuser. The system displays your next two steps:
    To finish setting up static config entries of this Aster Spark instance, run:
      1) AsterSpark_hadoop_node_host_name_hadoop.sh on hadoop_node_host_name
      2) AsterSpark_hadoop_node_host_name_queen.sh on queen
  8. If you have multiple Hadoop Spark instances, proceed to Generating Scripts for Additional Hadoop Spark Instances; otherwise, proceed to Executing the Teradata Aster Spark Hadoop Installation/Configuration Script.
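The two finishing commands printed in step 7 can be sketched as follows. This is a dry run (commands are echoed, not executed); hadoop_node_host_name is the same placeholder used throughout this procedure, so substitute your actual Hadoop name node:

```shell
#!/bin/sh
# Dry-run sketch of the finishing steps from step 7: replace 'echo' in run()
# with real execution once the generated scripts are in place on each host.
HADOOP_NODE=hadoop_node_host_name
run() { echo "+ $*"; }

# 1) Run the Hadoop-side script on the Hadoop name node:
run ssh beehive@"$HADOOP_NODE" sh "AsterSpark_${HADOOP_NODE}_hadoop.sh"
# 2) Run the queen-side script on the Aster queen:
run sh "AsterSpark_${HADOOP_NODE}_queen.sh"
```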