These metrics are available for Hadoop alerts. The associated property names allow you to customize alert actions in the Alert Setup portlet or customize the message in the Monitored Systems portlet.
Unless otherwise indicated, metrics apply to both HDP 1.3 and HDP 2.1 and later.
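The property names work as keys: a customized message or alert action refers to a metric by its property name, and the current value is filled in when the alert fires. The following is a minimal sketch of that substitution idea. It assumes a `$propertyName` placeholder syntax, which is Python's `string.Template` convention and not necessarily the syntax the Monitored Systems portlet accepts. The template text and the `render_alert_message` helper are hypothetical.

```python
from string import Template

# Hypothetical message template. The $name placeholders follow Python's
# string.Template syntax and only illustrate substituting property names.
MESSAGE = Template(
    "HDFS alert: $missingBlocks missing blocks, "
    "$corruptBlocks corrupt blocks, disk usage $hfdsDiskUsage%"
)


def render_alert_message(metrics: dict) -> str:
    """Fill the template from a mapping of property name -> current value."""
    # safe_substitute leaves any placeholder without a value unchanged
    # instead of raising KeyError.
    return MESSAGE.safe_substitute(metrics)


sample = {"missingBlocks": 3, "corruptBlocks": 0, "hfdsDiskUsage": 82.5}
print(render_alert_message(sample))
# HDFS alert: 3 missing blocks, 0 corrupt blocks, disk usage 82.5%
```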
Metrics for HDFS Alert Types
Metric | Description | Property Name |
---|---|---|
Blocks Corrupt | Number of blocks whose replicas are all corrupt | corruptBlocks |
Blocks Excess | Number of blocks that exceed their target replication for the file they belong to | excessBlocks |
Blocks Missing | Number of blocks with no replicas anywhere in the cluster | missingBlocks |
Blocks Pending Deletion | Number of blocks waiting for deletion | pendingDeletionBlocks |
Blocks Pending Replication | Number of blocks waiting to be replicated | pendingReplicationBlocks |
Blocks Scheduled for Replication | Number of blocks scheduled for replication | scheduledReplicationBlocks |
Blocks Under Replicated | Number of blocks that do not meet their target replication for the file they belong to | underReplicatedBlocks |
Disk Capacity Used | Number of bytes of disk space currently used by HDFS | capacityUsed |
Disk Usage | Percentage of available disk space used by HDFS | hfdsDiskUsage |
Files + Directories | Total number of files and directories in HDFS | filesTotal |
Total Load | Number of connections to HDFS | totalLoad |
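An HDFS alert type typically compares one of these metrics against a limit. The sketch below shows that comparison for a snapshot of values keyed by the property names in the table above. The threshold values and the `exceeded` helper are hypothetical and illustrative only. They are not recommended settings and do not reproduce how the Alert Setup portlet evaluates rules.

```python
# Hypothetical thresholds keyed by property names from the HDFS table.
# The numbers are illustrative, not recommended settings.
THRESHOLDS = {
    "missingBlocks": 0,            # any missing block is worth flagging
    "corruptBlocks": 0,            # any block with all replicas corrupt
    "underReplicatedBlocks": 100,
    "hfdsDiskUsage": 80.0,         # percent of available disk space
}


def exceeded(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return, sorted, the property names whose value is above its threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


snapshot = {
    "missingBlocks": 2,
    "corruptBlocks": 0,
    "underReplicatedBlocks": 150,
    "hfdsDiskUsage": 71.3,
}
print(exceeded(snapshot))  # ['missingBlocks', 'underReplicatedBlocks']
```

Metrics missing from the snapshot are skipped rather than treated as zero, so an incomplete sample does not trigger an alert on its own.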
Metrics for YARN Alert Types (HDP 2.1 and later)
Metric | Description | Property Name |
---|---|---|
Applications Running | Number of YARN applications currently executing | appsRunning |
Cluster Memory Allocated | Percentage of the available memory allocated across all NodeManager instances | clusterMemUsed |
Containers Allocated | Number of YARN containers currently allocated | allocatedContainers |
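One reading of Cluster Memory Allocated pools memory across the cluster: allocated memory summed over all NodeManagers, divided by the total memory those NodeManagers make available to YARN. The sketch below computes a percentage that way. The per-node `NodeMemory` record, its field names, and the pooled (not per-node averaged) calculation are assumptions for illustration and are not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class NodeMemory:
    # Hypothetical per-NodeManager figures in MB; not property names from the table.
    allocated_mb: int   # memory currently allocated to containers
    capacity_mb: int    # memory the NodeManager makes available to YARN


def cluster_mem_used(nodes: list) -> float:
    """Percentage of pooled NodeManager memory that is allocated (assumed clusterMemUsed)."""
    total = sum(n.capacity_mb for n in nodes)
    if total == 0:
        return 0.0  # no capacity reported; avoid dividing by zero
    return 100.0 * sum(n.allocated_mb for n in nodes) / total


nodes = [NodeMemory(6144, 8192), NodeMemory(2048, 8192)]
print(cluster_mem_used(nodes))  # 50.0
```

Pooling weights larger nodes more heavily than averaging each node's percentage would, which matters when NodeManagers have different memory sizes.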
Metrics for MapReduce Alert Types (HDP 1.3)
Metric | Description | Property Name |
---|---|---|
Jobs Running | Number of jobs currently executing on the system | jobsRunning |
Map Tasks Running | Number of map tasks running | runningMaps |
Map Tasks Waiting | Number of map tasks queued to run | waitingMaps |
Reduce Tasks Running | Number of reduce tasks running | runningReduces |
Reduce Tasks Waiting | Number of reduce tasks queued to run | waitingReduces |
Metrics for System Alert Types
Metric | Description | Property Name |
---|---|---|
CPU Idle | Percentage of CPU time during which the CPU was not processing any commands and the system had no outstanding disk I/O request | cpuIdle |
CPU Nice | Percentage of CPU time spent executing at the user level with nice priority | cpuNice |
CPU Skew | Comparison of CPU use on the busiest node with CPU use on the average node | cpuSkew |
CPU System | Percentage of CPU time spent running kernel code | cpuSystem |
CPU Usage | Sum of the CPU User and CPU System percentages | cpuUse |
CPU User | Percentage of CPU time spent running non-kernel code | cpuUser |
CPU Wait I/O | Percentage of CPU time spent waiting for I/O | cpuWio |
Disk Skew | Comparison of disk space used on the fullest node with disk space used on the average node | diskSkew |
Disk Use | Percentage of disk space used on the system | diskUse |
Load average last 15 minutes | Average number of jobs in the job queue over the last 15 minutes | loadFifteen |
Load average last 5 minutes | Average number of jobs in the job queue over the last 5 minutes | loadFive |
Load average last minute | Average number of jobs in the job queue over the last minute | loadOne |
Memory Usage | Average memory use of the system during a sample period | memUse |
Network In | Rate of incoming network traffic in bytes per second | bytesIn |
Network Out | Rate of outgoing network traffic in bytes per second | bytesOut |
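CPU Usage is defined above as CPU User plus CPU System. The skew metrics compare the busiest (or fullest) node with the average node, but this page does not give the exact formula. The sketch below implements CPU Usage as described. It pairs that with one common skew measure, (busiest - average) / busiest expressed as a percentage. That skew formula and the helper names `cpu_use` and `skew_percent` are assumptions for illustration. The same helper would apply to per-node disk figures for Disk Skew.

```python
def cpu_use(cpu_user: float, cpu_system: float) -> float:
    """cpuUse as described in the table: CPU User % plus CPU System %."""
    return cpu_user + cpu_system


def skew_percent(per_node: list) -> float:
    """Assumed skew measure: how far the busiest node sits above the average,
    as a percentage of the busiest node's value (0 means perfectly even)."""
    busiest = max(per_node)
    if busiest == 0:
        return 0.0  # all nodes idle or empty; no skew to report
    average = sum(per_node) / len(per_node)
    return 100.0 * (busiest - average) / busiest


print(cpu_use(35.0, 12.5))                      # 47.5
print(skew_percent([80.0, 40.0, 30.0, 50.0]))   # 37.5
```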
Metrics for System Health Alert Types
Metric | Description | Property Name |
---|---|---|
Health | Name of the system health state | health |