Setting CDH 5.5.1 YARN Configuration Changes

If you have previously installed and then uninstalled an Aster instance and the Hadoop cluster configuration has not changed, you do not need to repeat these YARN configuration changes.

Access Cloudera Manager.
Determine if the YARN configuration settings meet the memory requirements for running an Aster instance on the Hadoop cluster:
1. In the Cloudera Manager interface, select Clusters > YARN > Configuration.
2. Locate the memory allocated for all YARN containers on a node by using the Search option in the Configuration pane to search for “Container memory”. Note the “Container memory” value as value A, which is used later in this procedure for computation purposes.
3. Determine the maximum memory required for MapReduce jobs by using the Search option in the Configuration pane to search for “Map Task memory” and “Reduce Task memory”. Note the max(Map Task memory, Reduce Task memory) value as value B, which is used later in this procedure for computation purposes.
4. Determine the number of vWorkers you want to configure per Worker node. Note this number as value C, which is used later in this procedure for computation purposes. This number is also used later to calculate the NUM_PARTITIONS parameter when you configure the /home/beehive/.cluster_config file.
5. Determine the memory to be configured per vWorker. Note this value as value D, which is used later in this procedure for computation purposes. Teradata recommends 32768 MB (32 GB) per vWorker. If you cannot allocate 32 GB per vWorker due to memory limitations, try a smaller value.
  This value is also used later as the containerMemoryInMB parameter when you define values in the /home/beehive/config/asteryarn.cfg file.
6. Locate the minimum container memory allocation by using the Search option in the Configuration pane to search for “Container Memory Minimum”. Call the Container Memory Minimum value X, and note max(512MB, X) as value E, which is used later in this procedure for computation purposes.
  The Aster YARN ApplicationMaster runs in a container on one of the data nodes and needs memory allocated to it. 512MB of memory is requested for the Aster YARN ApplicationMaster, but the amount actually allocated is max(512MB, X), because YARN does not allocate a container smaller than the Container Memory Minimum.
7. Using the values for A, B, C, D, and E, determine whether they satisfy the memory requirements equation: A >= (B + C*D + E)
If the memory requirements for running an Aster instance on the Hadoop cluster are not met, perform one or more of the following actions to satisfy the equation:
- Reduce the value for “Container Memory Minimum” (X), which determines the E value. Because E is max(512MB, X), reducing X below 512MB does not reduce E further.
  When determining the “Container Memory Minimum”, consider whether you will run other jobs in addition to Aster YARN and Hive (MapReduce).
- Increase the memory allocated for all YARN containers on a node, which is the A value. Select Clusters > YARN > Configuration and use the Search option to search for “Container memory”.
- Reduce the memory to be configured per vWorker, which is the D value.
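The memory check in step 7 is plain arithmetic, so it can be scripted before you change anything in Cloudera Manager. The following is a minimal sketch, assuming all memory values are entered in MB; the function names and the example values are hypothetical and are not part of Aster or Cloudera Manager. The helper max_vworker_memory_mb rearranges the same equation to show the largest D that still fits, which is useful when you choose to reduce the memory per vWorker.

```python
def aster_am_memory_mb(container_memory_minimum_mb: int) -> int:
    """Value E: memory actually allocated to the Aster YARN ApplicationMaster.

    512 MB is requested, but YARN allocates max(512 MB, Container Memory Minimum).
    """
    return max(512, container_memory_minimum_mb)


def memory_requirement_met(a: int, b: int, c: int, d: int, e: int) -> bool:
    """Return True if A >= B + C*D + E (all memory values in MB)."""
    return a >= b + c * d + e


def max_vworker_memory_mb(a: int, b: int, c: int, e: int) -> int:
    """Largest D (MB per vWorker) that still satisfies A >= B + C*D + E.

    Returns 0 when even D = 0 would not fit (A - B - E is negative).
    """
    if c <= 0:
        raise ValueError("C (vWorkers per Worker node) must be positive")
    return max(0, (a - b - e) // c)


if __name__ == "__main__":
    # Hypothetical example values for illustration only, not recommendations.
    A = 131072                   # "Container memory" per node, in MB
    B = 8192                     # max(Map Task memory, Reduce Task memory), in MB
    C = 3                        # vWorkers per Worker node
    D = 32768                    # memory per vWorker, in MB
    E = aster_am_memory_mb(1024) # Container Memory Minimum of 1024 MB -> E = 1024

    print(memory_requirement_met(A, B, C, D, E))  # True
    print(max_vworker_memory_mb(A, B, C, E))      # 40618
```

With these example values, B + C*D + E is 107520 MB, which fits within A, and D could be raised to at most 40618 MB before the equation fails.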
Confirm the YARN configuration settings meet the CPU requirements for running an Aster instance on the Hadoop cluster:
1. In the Cloudera Manager interface, select Clusters > YARN > Configuration.
2. Locate the number of virtual cores by using the Search option in the Configuration pane to search for “Container Virtual CPU Cores”. Note the number of virtual cores as value A, which is used later in this procedure for computation purposes.
3. The minimum container size is 1 virtual core.
4. Locate the maximum number of virtual cores by using the Search option in the Configuration pane to search for “Container Virtual CPU Cores Maximum”. Note the maximum number of virtual cores as value B, which is used later in this procedure for computation purposes.
5. Determine the number of vWorkers you want to configure per Worker node. Note this number as value C, which is used later in this procedure for computation purposes. This number is also used later to calculate the NUM_PARTITIONS parameter when you configure the /home/beehive/.cluster_config file.
6. Determine the number of virtual cores you want to configure per vWorker. Note this number as value D, which is used later in this procedure for computation purposes. Teradata recommends four virtual cores per vWorker; one vWorker configured with four virtual cores can manage 500GB of data and ten concurrent users on a supported Hadoop cluster. If you cannot allocate four virtual cores per vWorker due to limitations, allocate fewer virtual cores per vWorker.
  The number of virtual cores is also used later as the containerVCore parameter when you configure the /home/beehive/config/asteryarn.cfg file.
7. Using the values for A, B, C, and D, verify that both of the following CPU requirements equations are satisfied:
  A >= C*D
  B >= C*D

  Note that (B - C*D) cores are used for the remaining applications. If Hive (MapReduce) jobs hang, try decreasing C and D.
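The CPU check in step 7 can be scripted the same way. This is a minimal sketch; the function names and example values are hypothetical and are not part of Aster or Cloudera Manager. It applies the two equations exactly as stated above, reports the (B - C*D) cores left for other applications, and rearranges the equations to show the largest D that satisfies both for a given C.

```python
def cpu_requirements_met(a: int, b: int, c: int, d: int) -> bool:
    """Return True if both A >= C*D and B >= C*D hold.

    a: "Container Virtual CPU Cores"
    b: "Container Virtual CPU Cores Maximum"
    c: vWorkers per Worker node
    d: virtual cores per vWorker
    """
    required = c * d
    return a >= required and b >= required


def cores_for_other_apps(b: int, c: int, d: int) -> int:
    """Cores left for the remaining applications, (B - C*D), as stated in step 7."""
    return b - c * d


def max_vcores_per_vworker(a: int, b: int, c: int) -> int:
    """Largest D that satisfies both A >= C*D and B >= C*D for the given C."""
    if c <= 0:
        raise ValueError("C (vWorkers per Worker node) must be positive")
    return min(a, b) // c


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    A, B = 32, 32  # virtual cores as configured in Cloudera Manager
    C, D = 3, 4    # 3 vWorkers with the recommended 4 virtual cores each

    print(cpu_requirements_met(A, B, C, D))    # True
    print(cores_for_other_apps(B, C, D))       # 20
    print(max_vcores_per_vworker(A, B, C))     # 10
```

If Hive (MapReduce) jobs hang, rerunning the check with smaller C and D values shows how many cores each adjustment frees.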
Restart YARN and any dependent services as suggested by Cloudera Manager.

Setting CDH 5.5.1 YARN Configuration Changes - Aster Execution Engine

Aster Instance Installation Guide for Aster-on-Hadoop Only