System Metrics (CDH, HDP 2.1 and later, and HDP 1.3)
| Metric | Description | Type |
|---|---|---|
| CPU Idle | CPU time during which the CPU was idle and the system had no outstanding disk I/O requests | Percent |
| CPU Nice | CPU time spent executing at the user level with nice priority | Percent |
| CPU Skew | Comparison of CPU use on the busiest node to the average node | Percent |
| CPU System | CPU time spent running kernel code | Percent |
| CPU Usage | Average CPU use per node, calculated as the sum of the user CPU and system CPU usage percentages | Percent |
| CPU User | CPU time spent running non-kernel code | Percent |
| CPU Wait I/O | CPU time spent waiting for I/O | Percent |
| Disk Skew | Comparison of disk space on the most full node to the average node | Percent |
| Disk Use | Disk space in use on the system | Percent |
| Load average last 15 minutes | Average number of processes in the system run queue over the last 15 minutes | Number |
| Load average last 5 minutes | Average number of processes in the system run queue over the last 5 minutes | Number |
| Load average last minute | Average number of processes in the system run queue over the last minute | Number |
| Memory Usage | Average memory use of the system during a sample period | Percent |
| Network In | Rate of network traffic into the node in bytes per second | Number |
| Network Out | Rate of network traffic out of the node in bytes per second | Number |
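
The relationship between CPU User, CPU System, CPU Usage, and CPU Skew can be illustrated with a short Python sketch. CPU Usage follows the description in the table (user plus system). The skew formula is an assumption, because the table says only that skew compares the busiest node to the average node; here it is taken as the busiest node's excess over the average, as a percentage of the average. The node names and sample values are hypothetical.

```python
from statistics import mean


def cpu_usage(user_pct: float, system_pct: float) -> float:
    """CPU Usage for one node: user CPU plus system CPU, in percent."""
    return user_pct + system_pct


def skew(per_node_values: list[float]) -> float:
    """ASSUMED skew formula: excess of the busiest node over the
    average node, expressed as a percentage of the average."""
    avg = mean(per_node_values)
    if avg == 0:
        return 0.0  # no load anywhere, so nothing is skewed
    return (max(per_node_values) - avg) / avg * 100.0


# Hypothetical per-node samples: (CPU User %, CPU System %)
nodes = {
    "node1": (20.0, 5.0),
    "node2": (40.0, 10.0),
    "node3": (10.0, 5.0),
}

usage = {name: cpu_usage(user, system) for name, (user, system) in nodes.items()}
cluster_cpu_usage = mean(usage.values())  # average node CPU use
cpu_skew = skew(list(usage.values()))     # busiest node vs. average node

print(f"CPU Usage: {cluster_cpu_usage:.1f}%  CPU Skew: {cpu_skew:.1f}%")
```

The same `skew` helper could be applied to per-node disk use to approximate Disk Skew, under the same assumption about the formula.
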
HDFS Metrics (CDH, HDP 2.1 and later, and HDP 1.3)
| Metric | Description | Type |
|---|---|---|
| Blocks Corrupt | Blocks whose replicas are all corrupt | Number |
| Blocks Excess | Blocks that exceed their target replication for the file they belong to | Number |
| Blocks Missing | Blocks with no replicas anywhere in the cluster | Number |
| Blocks Pending Deletion | Blocks waiting for deletion | Number |
| Blocks Pending Replication | Blocks waiting to be replicated | Number |
| Blocks Scheduled for Replication | Blocks scheduled for replication | Number |
| Blocks Under Replicated | Blocks that do not meet their target replication for the file they belong to | Number |
| Datanode I/O | Disk use on the data node | Number |
| Disk Capacity Used | Bytes of disk space currently used by HDFS | Number |
| Disk Usage | Percentage of available disk space used by HDFS | Percent |
| Files + Directories | Total files and directories in HDFS | Number |
| Files Appended | Files appended | Number |
| Files Created | Files created | Number |
| Files Deleted | Files deleted | Number |
| Total Load | Connections to HDFS | Number |
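
The block-state metrics above differ only in how a block's replicas compare to its target replication. The following Python sketch classifies blocks using the descriptions in the table. It is a simplified model for illustration, not how HDFS itself tracks blocks; the function name, the state labels, and the sample data are hypothetical.

```python
from collections import Counter


def classify_block(live_replicas: int, corrupt_replicas: int,
                   target_replication: int) -> str:
    """Map one block's replica counts to a state from the metrics table."""
    if live_replicas == 0 and corrupt_replicas > 0:
        return "corrupt"            # every remaining replica is corrupt
    if live_replicas == 0:
        return "missing"            # no replicas anywhere in the cluster
    if live_replicas < target_replication:
        return "under_replicated"   # below the file's target replication
    if live_replicas > target_replication:
        return "excess"             # above the file's target replication
    return "healthy"


# Hypothetical blocks: (live replicas, corrupt replicas, target replication)
blocks = [(3, 0, 3), (2, 0, 3), (4, 0, 3), (0, 2, 3), (0, 0, 3), (1, 1, 3)]

counts = Counter(classify_block(*block) for block in blocks)
print(dict(counts))
```
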
YARN Metrics (CDH and HDP 2.1 and later)
The following YARN metrics are available to analyze resource usage for a Hadoop system.
| Metric | Description | Type |
|---|---|---|
| Applications Completed | YARN applications that were completed during the interval | Number |
| Applications Failed | YARN applications that failed during the interval | Number |
| Applications Running | Average number of YARN applications that were running during the interval | Number |
| Applications Submitted | YARN applications that were submitted during the interval | Number |
| Cluster Memory Allocated | Percentage of available cluster memory that was marked as allocated | Percent |
| Cluster Memory Reserved | Percentage of available cluster memory that was marked as reserved | Percent |
| Cluster Memory Skew | Skew of the cluster memory across the different NodeManager instances | Percent |
| YARN Containers Allocated | Average number of YARN containers that were allocated or running during the interval | Number |
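
Some of the YARN metrics above are totals for the interval (for example, Applications Completed), while others are averages over the interval (for example, Applications Running). The Python sketch below shows the difference. It assumes the interval is covered by periodic samples of a cumulative completed-applications count and an instantaneous running-applications count; that sampling scheme and the sample values are hypothetical.

```python
from statistics import mean

# Hypothetical samples from one reporting interval:
# (cumulative applications completed, applications running at sample time)
samples = [(100, 4), (103, 6), (107, 5), (112, 5)]


def summarize_interval(interval_samples):
    """Return (applications completed in the interval, average running)."""
    completed_counts = [completed for completed, _ in interval_samples]
    running_counts = [running for _, running in interval_samples]
    # Interval total: growth of the cumulative counter across the interval.
    completed_in_interval = completed_counts[-1] - completed_counts[0]
    # Interval average: mean of the instantaneous readings.
    average_running = mean(running_counts)
    return completed_in_interval, average_running


completed, avg_running = summarize_interval(samples)
print(f"Applications Completed: {completed}, Applications Running: {avg_running}")
```
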
MapReduce Metrics (HDP 1.3)
| Metric | Description | Type |
|---|---|---|
| Jobs Completed | Jobs that completed successfully | Number |
| Jobs Failed | Jobs that failed before completion | Number |
| Jobs Running | Jobs currently executing in the system | Number |
| Jobs Submitted | Jobs queued to execute in the system | Number |
| Map Tasks Completed | Map tasks completed successfully | Number |
| Map Tasks Failed | Map tasks that failed before completion | Number |
| Map Tasks Launched | Map tasks that were started | Number |
| Map Tasks Running | Map tasks currently executing in the system | Number |
| Map Tasks Waiting | Map tasks queued to run | Number |
| Reduce Tasks Completed | Reduce tasks completed successfully | Number |
| Reduce Tasks Failed | Reduce tasks that failed before completion | Number |
| Reduce Tasks Launched | Reduce tasks that were started | Number |
| Reduce Tasks Running | Reduce tasks currently executing in the system | Number |
| Reduce Tasks Waiting | Reduce tasks queued to run | Number |