If you have previously installed and then uninstalled an Aster instance and the Hadoop cluster configuration has not changed, you do not need to repeat these YARN configuration changes, but you may want to confirm that the required parameters are still present and in effect.
- Access the ambari-server.
- Determine whether the YARN configuration settings meet the memory requirements for running an Aster instance on the Hadoop cluster:
- Locate the memory allocated for all YARN containers on a node. In the Ambari web interface, in the Memory section under the Node heading, note the Memory Allocated for All YARN Containers on a Node value as value A, which is used later in this procedure for computation purposes. As an example, the value of A is 12GB.
- Determine the maximum memory required for MapReduce jobs. In the Ambari web interface, in the MapReduce section under the MapReduce Framework heading, note the max(MapMemory, ReduceMemory) value as value B, which is used later in this procedure for computation purposes. As an example, the value of B is 5GB.
- Determine the number of vWorkers you want to configure per Worker node. Note the number of vWorkers value as value C, which is used later in this procedure for computation purposes. As an example, the value of C is 2. This number is also used later to calculate the NUM_PARTITIONS parameter when you configure the /home/beehive/.cluster_config file.
- Determine the memory to be configured per vWorker. Note this value as value D, which is used later in this procedure for computation purposes. As an example, the value of D is 2GB.
Teradata Aster recommends 32768MB (32GB) per vWorker. If you cannot allocate 32GB per vWorker due to memory limitations, try a smaller value.
This value is also used later as the containerMemoryInMB parameter when you define values in the /home/beehive/config/asteryarn.cfg file.
- The Aster YARN Application Master runs in a container on one of the data nodes and needs memory allocated. Aster requests 512MB of memory for the Application Master, but what gets allocated is max(512MB, X), where X is the value of Minimum Container Size (Memory), found in the Memory section under the Container heading. Note the max(512MB, X) value as value E, which is used later in this procedure for computation purposes. As an example, if X is 512MB, the value of E (max(512MB, X)) is 512MB.
- Using the values for A, B, C, D, and E in the memory requirements equation below, determine whether the values satisfy the equation, so that memory requirements are met for running an Aster instance on the Hadoop cluster:
A >= (B + C*D + E)
For example, with all values expressed in MB: 12000 >= (5000 + 2*2000 + 512)
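The memory check above can be sketched as a small helper; the numbers used here are the example values from this procedure (in MB), not measurements from a real cluster:

```python
def memory_ok(a, b, c, d, e):
    """Check the memory requirement A >= B + C*D + E.
    a: memory allocated for all YARN containers on a node
    b: max(MapMemory, ReduceMemory) for MapReduce jobs
    c: number of vWorkers per Worker node
    d: memory configured per vWorker
    e: max(512MB, Minimum Container Size) for the Application Master
    All memory values in MB."""
    return a >= b + c * d + e

# Example values from this procedure: A=12000, B=5000, C=2, D=2000, E=512.
print(memory_ok(12000, 5000, 2, 2000, 512))  # True: 12000 >= 9512
```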
- If memory requirements are not met for running an Aster instance on the Hadoop cluster, perform one or more of the following actions to satisfy the equation:
- Reduce the value for Minimum Container Size (Memory) in the Memory section under the Container heading. When determining the minimum container size, consider whether you will run other jobs in addition to Aster YARN and Hive (MapReduce).
- Increase the value for Memory Allocated for All YARN Containers on a Node, which is the A value, in the Memory section under the Node heading.
- Reduce the memory to be configured per vWorker, which is the D value.
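When reducing the D value, the largest per-vWorker memory that still satisfies the equation can be derived by rearranging it to D <= (A - B - E) / C. A minimal sketch, using the example values from this procedure (in MB):

```python
def max_vworker_memory(a, b, c, e):
    """Largest memory per vWorker (D) that still satisfies
    A >= B + C*D + E, i.e. D <= (A - B - E) / C.
    All memory values in MB."""
    return (a - b - e) // c

# Example values from this procedure: A=12000, B=5000, C=2, E=512.
print(max_vworker_memory(12000, 5000, 2, 512))  # 3244
```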
- Confirm that the YARN configuration settings meet the CPU requirements for running an Aster instance on the Hadoop cluster:
- Locate the number of virtual cores. In the Ambari web interface, in the CPU section under the Node heading, note the Number of Virtual Cores value as value A, which is used later in this procedure for computation purposes.
- The Minimum Container Size (Virtual Cores) is 1.
- Locate the maximum number of virtual cores per container. In the Ambari web interface, in the CPU section under the Container heading, note the Maximum Container Size (VCores) value as value B, which is used later in this procedure for computation purposes.
- Determine the number of vWorkers you want to configure per Worker node. Note the number of vWorkers value as value C, which is used later in this procedure for computation purposes. This number is also used later to calculate the NUM_PARTITIONS parameter when you configure the /home/beehive/.cluster_config file.
- Determine the number of virtual cores you want to configure per vWorker. Note this value as value D, which is used later in this procedure for computation purposes.
Teradata recommends four virtual cores per vWorker; one vWorker configured with four virtual cores can manage 500GB of data and ten concurrent users on a supported Hadoop cluster. If you cannot allocate four virtual cores per vWorker due to limitations, allocate fewer virtual cores per vWorker.
The number of virtual cores is also used later as the containerVCore parameter when you configure the /home/beehive/config/asteryarn.cfg file.
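The two per-vWorker values noted in this procedure end up in /home/beehive/config/asteryarn.cfg. A hypothetical fragment, assuming the file uses simple key=value lines (the exact syntax and surrounding entries may differ on your installation):

```
# /home/beehive/config/asteryarn.cfg (illustrative fragment only)
containerMemoryInMB=32768   # value D from the memory section: memory per vWorker
containerVCore=4            # value D from the CPU section: virtual cores per vWorker
```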
- Using the values for A, B, C, and D in the CPU requirements equations below, verify that the values satisfy both equations:
A >= C*D
B >= C*D
Note that (B - C*D) cores are available for the remaining applications. If Hive (MapReduce) hangs, try decreasing C and D.
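The CPU check can be sketched the same way as the memory check; the values below are hypothetical (this procedure does not give example CPU numbers), chosen so some cores remain for other applications:

```python
def cpu_ok(a, b, c, d):
    """Check the CPU requirements A >= C*D and B >= C*D.
    a: number of virtual cores on a node
    b: Maximum Container Size (VCores)
    c: number of vWorkers per Worker node
    d: virtual cores per vWorker"""
    return a >= c * d and b >= c * d

# Hypothetical values: 16 cores per node, container limit of 12 vcores,
# 2 vWorkers with 4 vcores each; B - C*D = 4 cores left for other apps.
print(cpu_ok(16, 12, 2, 4))  # True
```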
- Restart all YARN services.