CDH
Metric | Description |
---|---|
Applications Failed | Number of YARN applications that failed to execute successfully |
Applications Running | Number of YARN applications currently executing |
Blocks Corrupt | Number of corrupt blocks in HDFS |
Blocks Missing | Number of missing blocks in HDFS |
Cluster Memory Allocated | Percentage of the available memory allocated across all NodeManager instances |
CPU | Average node CPU use |
Max Disk by Node | Largest percentage of used disk space on a node |
Name Node CPU | Average node CPU use for nodes running NameNode services |
Name Node Heap | Percentage of heap space used in the NameNode JVM |
Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
ResourceManager Heap | Percentage of heap space used in the ResourceManager JVM |
RPC Latency - RM | Average wait time in queue for ResourceManager service calls |
RPC Latency - NN | Average wait time in queue for NameNode service calls |
Services Bad | Number of services in a critical state |
Services Concerning | Number of services in a degraded state |
Total Space | Used space as a percentage of overall storage capacity |
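
The skew metrics in the table above compare the busiest node with the average node, but the table does not state the exact formula. A minimal sketch of one plausible reading follows, assuming the comparison is a ratio of the maximum per-node value to the mean; the function name `skew` and the sample values are hypothetical and not taken from the product.

```python
from statistics import mean


def skew(per_node_usage):
    """Return the busiest node's usage divided by the mean usage.

    Assumption: "comparison" is read as a ratio (max / mean). The
    product may compute skew differently.
    """
    if not per_node_usage:
        raise ValueError("need at least one node")
    avg = mean(per_node_usage)
    if avg == 0:
        # An idle cluster has no meaningful skew; report 0.0 by convention.
        return 0.0
    return max(per_node_usage) / avg


# Hypothetical CPU use (percent) on three nodes.
cpu_by_node = [20.0, 25.0, 75.0]
print(skew(cpu_by_node))  # 1.875
```

Under this reading, a value of 1.0 means the load is evenly spread, and larger values mean one node is doing disproportionately more work than the rest.
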
HDP 2.1 and later
Metric | Description |
---|---|
Applications Failed | Number of YARN applications that failed to execute successfully |
Applications Running | Number of YARN applications currently executing |
Blocks Corrupt | Number of corrupt blocks in HDFS |
Blocks Missing | Number of missing blocks in HDFS |
Cluster Memory Allocated | Percentage of the available memory allocated across all NodeManager instances |
Components Down | Number of services not started |
CPU | Average node CPU use |
Max Disk by Node | Largest percentage of used disk space on a node |
Name Node CPU | Average node CPU use for nodes running NameNode services |
Name Node Heap | Percentage of heap space used in the NameNode JVM |
Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
ResourceManager Heap | Percentage of heap space used in the ResourceManager JVM |
RPC Latency - RM | Average wait time in queue for ResourceManager service calls |
RPC Latency - NN | Average wait time in queue for NameNode service calls |
Total Space | Used space as a percentage of overall storage capacity |
HDP 1.3
Metric | Description |
---|---|
Blocks Corrupt | Number of blocks whose replicas are all corrupt |
Blocks Missing | Number of blocks with no replicas anywhere in the cluster |
Components Down | Number of service components not running |
CPU | Average node CPU use. CPU is calculated as the sum of the user CPU and system CPU usage percentages. |
Jobs Failed | Number of jobs that failed |
Jobs Running | Number of jobs currently executing in the system |
Job Tracker CPU | CPU use for the node running the JobTracker service |
Map Tasks Running | Number of map tasks executing in the system |
Map Tasks Waiting | Number of map tasks waiting to execute |
Max Disk by Node | Amount of used disk space on the node with the most disk space in use |
Name Node CPU | CPU use for the node running the NameNode service |
Name Node Heap | Percentage of heap space used in the NameNode JVM |
Node CPU Skew | Comparison of CPU use on the busiest node to the average node |
Node I/O Skew | Comparison of I/O use on the busiest node to the average node |
Reduce Tasks Running | Number of reduce tasks executing in the system |
Reduce Tasks Waiting | Number of reduce tasks waiting to execute |
RPC Latency JT | Average wait time in queue for JobTracker service calls |
RPC Latency NN | Average wait time in queue for NameNode service calls |
Total Space | Used space as a percentage of overall storage capacity |
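
The CPU row above defines node CPU as the sum of the user and system CPU usage percentages, averaged across nodes. A minimal sketch of that arithmetic follows; the function names and sample readings are hypothetical, and the way the product actually collects the per-node values is not shown here.

```python
def node_cpu(user_pct, system_pct):
    """CPU use for one node: user CPU percentage plus system CPU percentage."""
    return user_pct + system_pct


def average_cpu(samples):
    """Average node CPU use over (user_pct, system_pct) pairs, one per node."""
    if not samples:
        raise ValueError("need at least one node")
    totals = [node_cpu(user, system) for user, system in samples]
    return sum(totals) / len(totals)


# Hypothetical readings for two nodes: (user %, system %).
samples = [(30.0, 10.0), (50.0, 10.0)]
print(average_cpu(samples))  # 50.0
```
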