Using the configureAsterSpark script, this procedure generates two scripts for a single Hadoop Spark instance: one to be executed on the Hadoop name node and one to be executed on the Teradata Aster queen node.
-
On the Aster queen, as user beehive, generate the configuration scripts:
configureAsterSpark
The script displays:
configureAsterSpark configures the static spark.config entries of an Aster/Spark instance.
It assumes:
1) Aster Spark Connector is installed on Aster cluster
2) Spark is installed on Hadoop cluster
3) a Hadoop user has been created & authorized to submit Spark jobs
configureAsterSpark generates 2 scripts:
AsterSpark_queen.sh   # copies files from Hadoop to vworkers
                      # writes static spark.config entries
AsterSpark_hadoop.sh  # creates 2 hdfs directories, copies
                      # aster-spark-extensions jar from Aster to Hadoop
If the Aster queen has open ssh access to Hadoop, the script also displays Hadoop's security authentication method and verifies that Aster vworkers can resolve the host names of the Hadoop nodes.
Spark namenode [queen_host_name]:
-
In response to the prompt, enter the fully qualified host name of the Hadoop Spark name node (hadoop_name_node_host_name).
If you see the following result, accept the default response, n, and refer to Unknown Hadoop Spark host name before proceeding.
hadoop_name_node_host_name is not reachable. Proceed anyway? [n]:
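Before re-running configureAsterSpark, a quick pre-check from the queen can confirm whether the host name resolves at all. A minimal sketch (not part of the installer), using getent and localhost as a stand-in for hadoop_name_node_host_name:

```shell
# Pre-check: does the Hadoop name node host name resolve from the queen?
# HOST is a stand-in; substitute your hadoop_name_node_host_name.
HOST=localhost

if getent hosts "$HOST" > /dev/null; then
    echo "$HOST resolves"
else
    echo "$HOST does not resolve; check DNS or /etc/hosts" >&2
fi
```

If the name does not resolve, fix DNS or /etc/hosts on the queen before proceeding.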
The script displays:
...
We currently have no ssh access to hadoop_node_host_name. For:
1) configureAsterSpark to verify if Aster vworkers can resolve host names of Hadoop nodes, and
2) the copying in the generated AsterSpark_queen.sh script to work,
beehive must first be granted passwordless open ssh access to hadoop_node_host_name.
Do ssh-copy-id -i /home/beehive/.ssh/id_rsa beehive@hadoop_node_host_name? [y]:
-
Accept the default response, y.
The script displays:
Granting beehive passwordless ssh access to hadoop_node_host_name.
To do this, we will now execute
ssh-copy-id -i /home/beehive/.ssh/id_rsa beehive@hadoop_node_host_name
...
beehive@hadoop_node_host_name's password:
-
Enter the beehive password.
The script displays:
beehive@hadoop_node_host_name's password:
Now try logging into the machine, with "ssh 'beehive@hadoop_node_host_name'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
user to do spark-submit [beehive]:
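Once the key has been copied, it is worth confirming that passwordless access actually works before continuing; with BatchMode enabled, ssh fails immediately instead of prompting for a password. A sketch, using the placeholder host name from the prompts above:

```shell
# Verify passwordless ssh as beehive; BatchMode=yes makes ssh fail
# rather than prompt. hadoop_node_host_name is a placeholder.
if ssh -o BatchMode=yes -o ConnectTimeout=5 \
       beehive@hadoop_node_host_name true 2>/dev/null; then
    echo "passwordless ssh OK"
else
    echo "passwordless ssh NOT configured" >&2
fi
```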
-
Enter the user ID of sparkJobSubmitter, the user that you created in steps 5 and 6 of Preparing for Installation.
If Kerberos authentication is enabled on the Hadoop system, you must enter the fully qualified Kerberos principal. For example:
sparkJobSubmitter/hpd101.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
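A fully qualified principal is simply the user, the node's fully qualified domain name, and the Kerberos realm joined as user/fqdn@REALM. A sketch assembling one in shell, using the example values above:

```shell
# Assemble a fully qualified Kerberos principal: user/fqdn@REALM.
# The values below are the example ones; substitute your own.
KUSER=sparkJobSubmitter
FQDN=hpd101.labs.teradata.com
REALM=HDP101.HADOOP.TERADATA.COM

PRINCIPAL="${KUSER}/${FQDN}@${REALM}"
echo "$PRINCIPAL"
```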
The script displays:
PERSIST_LOCATION [/user/sparkJobSubmitter/tmp]:
...
Verifying Aster vworkers can resolve Hadoop hostnames... passed
aster-spark-extension-spark1.6.1.jar          100%  457KB 456.5KB/s   00:00
Generating scripts AsterSpark_hadoop_node_host_name_queen.sh, AsterSpark_hadoop_node_host_name_hadoop.sh...
password of act user beehive [beehive]:
The system checks and displays the security authentication method of hadoop_node_host_name. If the method is Kerberos, the system prompts you for the sparkJobSubmitter keytab (created in step 6 of Preparing for Installation), specified as a path on the target Hadoop name node.
If the system warns that the Aster vworker nodes cannot resolve the Hadoop node host names, you must verify that you can ping the Hadoop node host names from the vworkers before proceeding.
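The ping check described above can be scripted. A sketch, assuming you run it on each vworker with your Hadoop node host names listed in HOSTS (localhost is used here as a stand-in):

```shell
# Confirm each Hadoop node host name is reachable from this vworker.
# HOSTS is a stand-in list; substitute your Hadoop node host names.
HOSTS="localhost"

for h in $HOSTS; do
    if ping -c 1 -W 2 "$h" > /dev/null 2>&1; then
        echo "$h reachable"
    else
        echo "$h NOT reachable" >&2
    fi
done
```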
The system copies the appropriate aster-spark-extension-spark*.jar file from the Aster queen node to the Hadoop master node /tmp directory, for later use by the system-generated AsterSpark_hadoop_name_node_host_name_hadoop.sh script.
-
In response to the system prompt, enter the password of the Aster act user beehive.
-
In response to the system prompt, enter the password of the Aster act db_superuser.
The system displays your next two steps:
To finish setting up static config entries of this Aster Spark instance, run:
1) AsterSpark_hadoop_node_host_name_hadoop.sh on hadoop_node_host_name
2) AsterSpark_hadoop_node_host_name_queen.sh on queen
-
If you have multiple Hadoop Spark instances, proceed to Generating Scripts for Additional Hadoop Spark Instances; otherwise, proceed to Executing the Teradata Aster Spark Hadoop Installation/Configuration Script.