Preparing for Installation - Aster Analytics

Teradata Aster® Spark Connector User Guide

Product: Aster Analytics
Release Number: 7.00.00.01
Published: May 2017
Language: English (United States)
Last Update: 2018-04-13
Product Category: Software
  1. Ensure that these are running:
    • Aster Database version AD 6.20 or later, using /home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.7.3/bin/python
    • Hadoop/Spark cluster version HDP 2.4.2 or CDH 5.5.2
    • Spark version 1.6.1 (for HDP 2.4.2) or 1.5 (for CDH 5.5.2)
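    To confirm these versions from the command line, you can use commands such as the following (an illustrative sketch; exact invocations and output formats vary by distribution):

      # On the Aster queen, check the bundled Python version:
      /home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.7.3/bin/python --version
      # On a Hadoop/Spark node, check the Hadoop and Spark versions:
      hadoop version
      spark-submit --version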
  2. Ensure that these Hadoop YARN container settings have these values:
    Setting                                                             Value
    Memory allocated for all YARN containers on a node                  Maximum available value
    Minimum container size (memory)                                     512 MB
    Maximum container size (memory)                                     2048 MB
    Percentage of physical CPU allocated for all containers on a node   Maximum available value
    Number of virtual cores                                             Maximum available value
    Maximum container size (virtual cores)                              Maximum available value
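    These settings correspond to standard YARN properties in yarn-site.xml. As a sketch (the property names below are standard YARN keys, but confirm the exact keys and values through your cluster manager, such as Ambari or Cloudera Manager, rather than editing files directly):

      # Memory for all containers on a node:  yarn.nodemanager.resource.memory-mb
      # Min/max container size (memory):      yarn.scheduler.minimum-allocation-mb (512)
      #                                       yarn.scheduler.maximum-allocation-mb (2048)
      # CPU percentage for all containers:    yarn.nodemanager.resource.percentage-physical-cpu-limit
      # Virtual cores (total and per-container maximum):
      #                                       yarn.nodemanager.resource.cpu-vcores
      #                                       yarn.scheduler.maximum-allocation-vcores
      grep -A 1 'yarn.scheduler' /etc/hadoop/conf/yarn-site.xml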
  3. Ensure that the Hadoop nodes have enough free disk space. To make Hadoop clean up cache and log files before it runs out of disk space, configure these settings to values appropriate to your cluster:
    yarn.nodemanager.localizer.cache.cleanup.interval-ms
    yarn.nodemanager.localizer.cache.target-size-mb
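    For example, to have the NodeManager check its local cache every ten minutes and keep it under 10 GB (these match the commonly shipped YARN defaults; tune them for your cluster), set in yarn-site.xml:

      yarn.nodemanager.localizer.cache.cleanup.interval-ms = 600000
      yarn.nodemanager.localizer.cache.target-size-mb = 10240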
  4. Verify that Aster Database and Hadoop have network connectivity:
    1. Verify that the Aster Database queen and vworker nodes can resolve the host names of the Hadoop/Spark nodes:

      From the queen and each vworker, at the command prompt, enter:

      ping -c 3 -w 10 hadoop_node_host_name

      If you see the following result, refer to Unknown Hadoop Spark host name.

      ping: unknown host hadoop_node_host_name
    2. Verify that the setuid bit of /bin/ping is set (so that users can run ping).
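      For example (an illustrative check; output details vary by distribution):

      # An 's' in the user-execute position means the setuid bit is set:
      ls -l /bin/ping
      # -rwsr-xr-x 1 root root ... /bin/ping
      # If it is not set, as root:
      chmod u+s /bin/ping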
  5. Create a cluster-wide user on the Hadoop/Spark cluster (hereafter called sparkJobSubmitter) and authorize this user to submit Spark jobs.
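    For example, on a cluster without centralized user management, you might create the user on each Hadoop/Spark node (a sketch; sites that manage users through LDAP or Active Directory create the user there instead):

      # as root, on each Hadoop/Spark node:
      useradd -m sparkJobSubmitter
      passwd sparkJobSubmitter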
  6. If Kerberos authentication is enabled on the Hadoop system:
    1. Create a Kerberos principal and keytab credentials for sparkJobSubmitter. For example:
      # as root on Hadoop master node, create sparkJobSubmitter principal:
      kadmin.local
      addprinc sparkJobSubmitter/hdp101m1.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
      # supply sparkJobSubmitter's password
      # create sparkJobSubmitter's keytab
      ktadd -k /home/sparkJobSubmitter/sparkJobSubmitter.keytab sparkJobSubmitter/hdp101m1.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
      quit
      # give sparkJobSubmitter ownership of its keytab
      chown -R sparkJobSubmitter /home/sparkJobSubmitter/sparkJobSubmitter.keytab
    2. Copy /etc/krb5.conf to the Aster cluster. For example:
      scp /etc/krb5.conf root@Aster_queen:/etc/krb5.conf.new
      # As root on the Aster queen node, clone this kerberos conf file
      cp /etc/krb5.conf /etc/krb5.conf.old
      cp /etc/krb5.conf.new /etc/krb5.conf
      ncli node clonefile /etc/krb5.conf
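      You can verify the credentials by obtaining a ticket with the keytab (adjust the principal and keytab path to match what you created above):

      # as sparkJobSubmitter, on a Hadoop node:
      kinit -kt /home/sparkJobSubmitter/sparkJobSubmitter.keytab sparkJobSubmitter/hdp101m1.labs.teradata.com@HDP101.HADOOP.TERADATA.COM
      klist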
  7. If sparkJobSubmitter will use ssh to submit Spark jobs, enable either passwordless ssh or identity-file-based ssh from the Teradata Aster vworker nodes to the Hadoop/Spark cluster.
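    For example, to set up identity-file-based ssh from a vworker to a Hadoop/Spark node (a sketch; the key path and target host name are placeholders):

      # as the submitting user on each vworker:
      ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
      ssh-copy-id -i ~/.ssh/id_rsa.pub sparkJobSubmitter@hadoop_node_host_name
      # verify that login works without a password prompt:
      ssh -i ~/.ssh/id_rsa sparkJobSubmitter@hadoop_node_host_name hostname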
  8. Create a Hadoop distributed file system (HDFS) directory for this user (for example, /user/sparkJobSubmitter). One of the system-generated configuration scripts assumes that this user can create HDFS directories under this directory and copy files to them. For example, the user must be able to execute commands such as:
    hadoop fs -mkdir -p /user/sparkJobSubmitter
    hadoop fs -chown -R sparkJobSubmitter /user/sparkJobSubmitter
  9. Create a user named beehive on the Hadoop/Spark cluster.
  10. Grant beehive read access to the Hadoop Spark assembly jar and topology_mappings.data files and write access to the /tmp directory (to which the installation script copies the aster-spark-extension*.jar file). The locations of the Hadoop Spark assembly jar and topology_mappings.data files can be system-specific. The configureAsterSpark script expects to access these files at these locations:
    Hadoop Distribution   Spark Assembly jar Location                  Topology Mappings Location
    HDP                   /usr/hdp/version/spark/lib/                  /etc/hadoop/conf/
    CDH                   /var/.../spark/lib/ or /opt/.../spark/lib/   /etc/hadoop/conf/
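    As a sketch of the required permissions on an HDP system (the jar file name and the version segment of the path vary by release; substitute the actual locations from the table above):

      # as root on each Hadoop node, let beehive (and others) read the files:
      chmod o+r /usr/hdp/version/spark/lib/spark-assembly*.jar
      chmod o+r /etc/hadoop/conf/topology_mappings.data
      # /tmp is normally world-writable with the sticky bit; confirm:
      ls -ld /tmp    # expect permissions drwxrwxrwt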
  11. If Aster and Spark are on different clusters, ensure that the Aster Database queen and vworker nodes can resolve the host names of your Hadoop nodes. On most platforms, you do this with the Domain Name System (DNS). On other platforms, one way to do this is:
    1. On the Aster Database queen node, edit /etc/hosts, adding the IP addresses and host names of your Hadoop nodes and your Aster queen and vworker nodes.
    2. Copy the edited file to your Aster vworker nodes, using this command:
      ncli node clonefile /etc/hosts
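      For example, the added /etc/hosts entries might look like this (the IP addresses and host names are placeholders for illustration):

      # Hadoop/Spark nodes
      10.0.0.11   hadoop_node1.example.com   hadoop_node1
      10.0.0.12   hadoop_node2.example.com   hadoop_node2
      # Aster queen and vworker nodes
      10.0.1.10   aster_queen.example.com    aster_queen
      10.0.1.11   aster_vworker1.example.com aster_vworker1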
  12. On the Aster Database queen and vworker nodes, grant beehive the privilege to execute this command:
    /bin/chown extensibility\:extensibility /home/beehive/config/spark/*/IDENTITYFILE
    The procedure for granting this privilege depends on your platform and environment. One example is:
    1. On the Aster queen, as user root, enter:
      visudo -f /etc/sudoers
    2. Add this line:
      beehive ALL= NOPASSWD: /bin/chown extensibility\:extensibility /home/beehive/config/spark/*/IDENTITYFILE

      This line lets the user beehive execute the chown command from any terminal on the queen node without specifying a password.

    3. Search for the 'Defaults requiretty' line and comment it out:
      # Defaults    requiretty
    4. Save /etc/sudoers and exit visudo.
    5. Clone the modified /etc/sudoers file to the vworkers, so that on each vworker the user beehive can transfer ownership of the id_rsa identity file to the user extensibility. Use this command:
      ncli node clonefile /etc/sudoers
      This ability is important because:
      • To access Spark, the Aster Database uses RunOnSpark queries, which use vworkers to submit Spark jobs. The vworkers submit jobs with the user ID of the user created in steps 5 and 6.
      • The vworkers can submit Spark jobs with OpenSSH. On the vworker, the user extensibility executes RunOnSpark tasks, which can include submitting Spark jobs. For security, ownership of and access to any identity file must be transferred to, and limited to, the user extensibility.

      The user beehive needs the ability to execute the chown command only during Aster Spark Connector configuration. After configuration, you can revoke this privilege: log on to the queen node as root, enter visudo -f /etc/sudoers, delete the line that you added in substep 2, restore the 'Defaults requiretty' line that you commented out in substep 3, save the file, exit visudo, and clone the modified file to the vworkers.
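      As a sketch, the revocation sequence is:

      # as root on the queen:
      visudo -f /etc/sudoers
      #   delete:    beehive ALL= NOPASSWD: /bin/chown extensibility\:extensibility /home/beehive/config/spark/*/IDENTITYFILE
      #   uncomment: Defaults    requiretty
      # save and exit, then push the change to the vworkers:
      ncli node clonefile /etc/sudoers
      # confirm that beehive no longer has the privilege:
      sudo -l -U beehive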