Summary View Metrics
| Metric | Description |
|---|---|
| Unhealthy Services | Aggregate number of all CDH services in a Bad or Concerning state (Unknown and Disabled are not included) |
| Applications Running | Number of YARN applications currently executing |
| Cluster Memory Allocated | Percent of available memory allocated across all NodeManager instances |
| HDFS Max Node I/O | Highest I/O level, in bytes, on any node in HDFS |
| HDFS Disk Usage | Percentage of HDFS storage capacity in use |
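
The Unhealthy Services count described above can be sketched as follows. This is only an illustration of the counting rule (Bad and Concerning are counted, Unknown and Disabled are not); the `count_unhealthy` function, the state strings, and the sample service names are assumptions made for the example and are not part of the product.

```python
from collections import Counter

# Hypothetical state names, taken from the states mentioned in the tables.
UNHEALTHY_STATES = {"bad", "concerning"}


def count_unhealthy(service_states):
    """Count services whose state is Bad or Concerning.

    Unknown and Disabled services are deliberately left out of the total.
    """
    counts = Counter(state.lower() for state in service_states.values())
    return sum(counts[state] for state in UNHEALTHY_STATES)


# Sample input: a mapping of service name to health state (illustrative only).
services = {
    "hdfs": "Good",
    "yarn": "Concerning",
    "hbase": "Bad",
    "hive": "Unknown",
    "oozie": "Disabled",
}
print(count_unhealthy(services))
```
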
YARN Metrics
| Metric | Description |
|---|---|
| Applications Failed | Number of YARN applications that failed to execute successfully |
| Applications Completed | Number of YARN applications that executed successfully |
| Applications Running | Number of YARN applications currently executing |
| Cluster Memory Allocated | Percent of available memory allocated across all NodeManager instances |
| Cluster Memory Reserved | Percent of available memory reserved across all NodeManager instances |
| Cluster Memory Skew | Comparison of the largest NodeManager memory allocated to the average memory allocated |
| Containers Allocated | Number of YARN containers currently allocated across the cluster |
| Containers Pending | Number of YARN containers currently pending across the cluster |
| Containers Reserved | Number of YARN containers currently reserved across the cluster |
| NodeManagers | Number of NodeManagers in each of the bad (critical), concerning (degraded), and good states. Unknown and disabled counts are displayed only when one or more NodeManagers are in those states. |
| ResourceManager Up Since | Timestamp when the ResourceManager service started |
| ResourceManager Heap | Percentage of heap space used in the ResourceManager JVM |
| ResourceManager UI | Opens the web interface for the service in a new window |
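
To make the Cluster Memory Skew description concrete, here is a minimal sketch that compares the largest per-NodeManager allocation to the average allocation. The table does not say whether the comparison is a ratio or a difference, so the ratio used here is an assumption, as are the `memory_skew` function name and the sample values.

```python
from statistics import mean


def memory_skew(allocated_mb):
    """Return max allocation divided by mean allocation (assumed formula).

    Returns None when there is no data or nothing is allocated,
    because the ratio is undefined in those cases.
    """
    if not allocated_mb:
        return None
    average = mean(allocated_mb)
    if average == 0:
        return None
    return max(allocated_mb) / average


# Memory allocated on three hypothetical NodeManagers, in MB.
print(memory_skew([4096, 4096, 8192]))
```

Under this assumed definition, a value near 1 means memory is allocated evenly, while larger values mean one NodeManager carries noticeably more than the average.
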
HDFS Metrics
| Metric | Description |
|---|---|
| Capacity Usage | Used space as a percentage of overall storage capacity |
| Datanodes | Number of DataNodes in each of the bad (critical), concerning (degraded), and good states. Unknown and disabled counts are displayed only when one or more DataNodes are in those states. |
| Files + Directories Total | Total number of files and directories in HDFS |
| Namenode Up since | Timestamp when the NameNode service started |
| Namenode Heap | Percentage of heap space used in the NameNode JVM |
| Namenode UI | Opens the web interface for the service in a new window |
HBase Metrics
| Metric | Description |
|---|---|
| Load Average | Average region load per region server |
| Region Servers | Number of region servers in each of the bad (critical), concerning (degraded), and good states. Unknown and disabled counts are displayed only when one or more region servers are in those states. |
| Master Server Up since | Timestamp when the master server started |
| Master Server Heap | Percentage of heap space used in the master server JVM |
| Master Server UI | Opens the web interface for the service in a new window |
Other Services Metrics
| Metric | Description |
|---|---|
| Bad | Services in a critical state |
| Concerning | Services in a degraded state |
| Disabled | Services in a disabled state; displayed only when one or more services are in this state |
| Good | Services in a good state |
| Unknown | Services in an unknown state; displayed only when one or more services are in this state |
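
The display rule shared by these state counts (Bad, Concerning, and Good are always shown; Unknown and Disabled appear only when at least one item is in that state) could be sketched like this. The `state_summary` function and the lowercase state keys are hypothetical names chosen for the illustration.

```python
from collections import Counter

# Hypothetical grouping of the states named in the tables.
ALWAYS_SHOWN = ("bad", "concerning", "good")
SHOWN_IF_PRESENT = ("unknown", "disabled")


def state_summary(states):
    """Build the per-state counts to display for a list of health states."""
    counts = Counter(state.lower() for state in states)
    # These three states are always reported, even when the count is zero.
    summary = {name: counts[name] for name in ALWAYS_SHOWN}
    # Unknown and Disabled are reported only when something is in that state.
    for name in SHOWN_IF_PRESENT:
        if counts[name] > 0:
            summary[name] = counts[name]
    return summary


print(state_summary(["Good", "Good", "Bad", "Unknown"]))
```
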