CDH

| Metric | Description |
|---|---|
| Applications Failed | Number of YARN applications that failed to execute successfully |
| Applications Running | Number of YARN applications currently executing |
| Blocks Corrupt | Number of corrupt blocks in HDFS |
| Blocks Missing | Number of missing blocks in HDFS |
| Cluster Memory Allocated | Percentage of available memory allocated across all NodeManager instances |
| CPU | Average node CPU use |
| Max Disk by Node | Largest percentage of used disk space on a node |
| Name Node CPU | Average node CPU use for nodes running NameNode services |
| Name Node Heap | Percentage of heap space used in the NameNode JVM |
| Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
| Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
| ResourceManager Heap | Percentage of heap space used in the ResourceManager JVM |
| RPC Latency - RM | Average wait time in queue for ResourceManager service calls |
| RPC Latency - NN | Average wait time in queue for NameNode service calls |
| Services Bad | Number of services in a critical state |
| Services Concerning | Number of services in a degraded state |
| Total Space | Used space as a percentage of overall storage capacity |
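
Total Space and Max Disk by Node are both percentages but answer different questions: one describes the cluster as a whole, the other the single fullest node. The following is a minimal sketch of that difference computed from per-node figures. The `NodeDisk` type, the node names, and the byte counts are hypothetical and chosen only for illustration; this is not how the monitoring system itself collects or computes these values.

```python
from dataclasses import dataclass


@dataclass
class NodeDisk:
    """Hypothetical per-node disk sample (not a real API type)."""
    name: str
    used_bytes: int
    capacity_bytes: int


def total_space_pct(nodes):
    """Used space as a percentage of overall storage capacity."""
    used = sum(n.used_bytes for n in nodes)
    capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * used / capacity


def max_disk_by_node_pct(nodes):
    """Largest per-node percentage of used disk space."""
    return max(100.0 * n.used_bytes / n.capacity_bytes for n in nodes)


# Illustrative numbers only.
nodes = [
    NodeDisk("node-a", 400, 1000),
    NodeDisk("node-b", 900, 1000),
    NodeDisk("node-c", 200, 2000),
]

print(total_space_pct(nodes))       # 37.5
print(max_disk_by_node_pct(nodes))  # 90.0
```

In this example the cluster as a whole is well under half full, yet one node is at 90%, which is the situation Max Disk by Node is meant to surface.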

HDP 2.1 and later

| Metric | Description |
|---|---|
| Applications Failed | Number of YARN applications that failed to execute successfully |
| Applications Running | Number of YARN applications currently executing |
| Blocks Corrupt | Number of corrupt blocks in HDFS |
| Blocks Missing | Number of missing blocks in HDFS |
| Cluster Memory Allocated | Percentage of available memory allocated across all NodeManager instances |
| Components Down | Number of services not started |
| CPU | Average node CPU use |
| Max Disk by Node | Largest percentage of used disk space on a node |
| Name Node CPU | Average node CPU use for nodes running NameNode services |
| Name Node Heap | Percentage of heap space used in the NameNode JVM |
| Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
| Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
| ResourceManager Heap | Percentage of heap space used in the ResourceManager JVM |
| RPC Latency - RM | Average wait time in queue for ResourceManager service calls |
| RPC Latency - NN | Average wait time in queue for NameNode service calls |
| Total Space | Used space as a percentage of overall storage capacity |

HDP 1.3

| Metric | Description |
|---|---|
| Blocks Corrupt | Number of blocks whose replicas are all corrupt |
| Blocks Missing | Number of blocks with no replicas anywhere in the cluster |
| Components Down | Number of service components not running |
| CPU | Average node CPU use. CPU is calculated as the sum of the user CPU and system CPU usage percentages. |
| Jobs Failed | Number of jobs that failed |
| Jobs Running | Number of jobs currently executing in the system |
| Job Tracker CPU | CPU use for the node running the JobTracker service |
| Map Tasks Running | Number of map tasks executing in the system |
| Map Tasks Waiting | Number of map tasks waiting to execute |
| Max Disk by Node | Amount of used disk space on the node with the most disk space in use |
| Name Node CPU | CPU use for the node running the NameNode service |
| Name Node Heap | Percentage of heap space used in the NameNode JVM |
| Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
| Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
| Reduce Tasks Running | Number of reduce tasks executing in the system |
| Reduce Tasks Waiting | Number of reduce tasks waiting to execute |
| RPC Latency JT | Average wait time in queue for JobTracker service calls |
| RPC Latency NN | Average wait time in queue for NameNode service calls |
| Total Space | Used space as a percentage of overall storage capacity |
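
The CPU row above defines node CPU as the sum of the user and system CPU usage percentages. The sketch below works through that sum and the cluster-wide average on made-up samples. It also shows one possible reading of Node CPU Skew as the ratio of the busiest node to the average node; the table only says "comparison", so the ratio is an assumption, as are the function names and sample values.

```python
def node_cpu_pct(user_pct, system_pct):
    """Node CPU use: user CPU plus system CPU, as described in the table."""
    return user_pct + system_pct


def average_cpu_pct(samples):
    """Average node CPU use across all nodes in `samples`."""
    per_node = [node_cpu_pct(u, s) for u, s in samples.values()]
    return sum(per_node) / len(per_node)


def cpu_skew(samples):
    """ASSUMPTION: skew expressed as busiest node / average node.

    The source table does not give the exact formula.
    """
    per_node = [node_cpu_pct(u, s) for u, s in samples.values()]
    return max(per_node) / (sum(per_node) / len(per_node))


# Hypothetical (user %, system %) samples per node.
samples = {
    "node-a": (20.0, 10.0),
    "node-b": (50.0, 10.0),
    "node-c": (25.0, 5.0),
}

print(average_cpu_pct(samples))  # 40.0
print(cpu_skew(samples))         # 1.5
```

Under this reading, a skew close to 1 means load is spread evenly, while a larger value means one node is doing disproportionately more work than the rest.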