System Metrics (CDH, HDP 2.1 and later, and HDP 1.3)
The system metrics listed in the following table are available to analyze resource usage.
Metric | Description | Type |
---|---|---|
CPU Idle | CPU time during which the system was idle and had no outstanding disk I/O request | Percent |
CPU Nice | CPU time spent executing at the user level with nice priority | Percent |
CPU Skew | Comparison of CPU use on the busiest node to CPU use on the average node | Percent |
CPU System | CPU time spent running kernel code | Percent |
CPU Usage | Average CPU use across nodes. CPU usage is calculated as the sum of the user CPU and system CPU usage percentages. | Percent |
CPU User | CPU time spent running non-kernel code | Percent |
CPU Wait I/O | CPU time spent waiting for I/O | Percent |
Disk Skew | Comparison of disk space used on the fullest node to disk space used on the average node | Percent |
Disk Use | Disk space in use on the system | Percent |
Load average last 15 minutes | Average number of processes in the system run queue over the last 15 minutes | Number |
Load average last 5 minutes | Average number of processes in the system run queue over the last 5 minutes | Number |
Load average last minute | Average number of processes in the system run queue over the last minute | Number |
Memory Usage | Average memory use of the system during a sample period | Percent |
Network In | Rate of network traffic into the node in bytes per second | Number |
Network Out | Rate of network traffic out of the node in bytes per second | Number |
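
The CPU Usage and CPU Skew descriptions above can be illustrated with a small sketch. The CPU Usage calculation (user CPU plus system CPU) follows the table. The skew formula shown here is only an assumption, since the table says no more than that skew compares the busiest node to the average node. The function names and sample values are hypothetical.

```python
from statistics import mean


def cpu_usage(user_pct, system_pct):
    """CPU Usage for one node: user CPU plus system CPU, in percent."""
    return user_pct + system_pct


def average_cpu_usage(nodes):
    """Average CPU Usage across nodes, given (user_pct, system_pct) pairs."""
    return mean(cpu_usage(user, system) for user, system in nodes)


def cpu_skew(nodes):
    """Assumed skew formula (not taken from the product): how far the
    busiest node exceeds the average node, as a percentage of the average."""
    usages = [cpu_usage(user, system) for user, system in nodes]
    average = mean(usages)
    if average == 0:
        return 0.0  # No CPU use anywhere, so there is nothing to compare.
    return (max(usages) - average) / average * 100.0


# Hypothetical samples: (CPU User, CPU System) percentages for three nodes.
nodes = [(20.0, 5.0), (40.0, 10.0), (10.0, 5.0)]

print(average_cpu_usage(nodes))
print(round(cpu_skew(nodes), 1))
```

Under this assumed formula, a skew near zero would indicate that work is spread evenly, while a large value would point to one node doing far more than the rest.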
HDFS Metrics (CDH, HDP 2.1 and later, and HDP 1.3)
The HDFS metrics listed in the following table are available to analyze resource usage.
Metric | Description | Type |
---|---|---|
Blocks Corrupt | Blocks whose replicas are all corrupt | Number |
Blocks Excess | Blocks that exceed their target replication for the file they belong to | Number |
Blocks Missing | Blocks with no replicas anywhere in the cluster | Number |
Blocks Pending Deletion | Blocks waiting for deletion | Number |
Blocks Pending Replication | Blocks waiting to be replicated | Number |
Blocks Scheduled for Replication | Blocks scheduled for replication | Number |
Blocks Under Replicated | Blocks that do not meet their target replication for the file they belong to | Number |
Datanode I/O | Disk use on the DataNode | Number |
Disk Capacity Used | Bytes of disk space currently used by HDFS | Number |
Disk Usage | Percentage of available disk space used by HDFS | Percent |
Files + Directories | Total files and directories in HDFS | Number |
Files Appended | Files appended | Number |
Files Created | Files created | Number |
Files Deleted | Files deleted | Number |
Total Load | Connections to HDFS | Number |
YARN Metrics (CDH and HDP 2.1 and later)
The YARN metrics listed in the following table are available to analyze resource usage.
Metric | Description | Type |
---|---|---|
Applications Completed | YARN applications that were completed during the interval | Number |
Applications Failed | YARN applications that failed during the interval | Number |
Applications Running | Average number of YARN applications that were running during the interval | Number |
Applications Submitted | YARN applications that were submitted during the interval | Number |
Cluster Memory Allocated | Percentage of available cluster memory that was marked as allocated | Percent |
Cluster Memory Reserved | Percentage of available cluster memory that was marked as reserved | Percent |
Cluster Memory Skew | Skew in cluster memory use across the NodeManager instances | Percent |
YARN Containers Allocated | Average number of YARN containers that were allocated or running during the interval | Number |
MapReduce Metrics (HDP 1.3)
The MapReduce metrics listed in the following table are available to analyze resource usage.
Metric | Description | Type |
---|---|---|
Jobs Completed | Jobs that completed successfully | Number |
Jobs Failed | Jobs that failed before completion | Number |
Jobs Running | Jobs currently executing in the system | Number |
Jobs Submitted | Jobs queued to execute in the system | Number |
Map Tasks Completed | Map tasks completed successfully | Number |
Map Tasks Failed | Map tasks that failed before completion | Number |
Map Tasks Launched | Map tasks started | Number |
Map Tasks Running | Map tasks currently executing in the system | Number |
Map Tasks Waiting | Map tasks queued to run | Number |
Reduce Tasks Completed | Reduce tasks completed successfully | Number |
Reduce Tasks Failed | Reduce tasks that failed before completion | Number |
Reduce Tasks Launched | Reduce tasks started | Number |
Reduce Tasks Running | Reduce tasks currently executing in the system | Number |
Reduce Tasks Waiting | Reduce tasks queued to run | Number |