System Health Metrics for Teradata Hadoop

Teradata® Viewpoint User Guide

brand: Analytical Ecosystem
prodname: Teradata Viewpoint, Teradata Workload Management
vrm_release: 16.20
category: User Guide
featnum: B035-2206-107K

CDH

Metric                    Description
Applications Failed       Number of YARN applications that failed to execute successfully
Applications Running      Number of YARN applications currently executing
Blocks Corrupt            Number of corrupt blocks in HDFS
Blocks Missing            Number of missing blocks in HDFS
Cluster Memory Allocated  Percentage of available memory allocated across all NodeManager instances
CPU                       Average node CPU use
Max Disk by Node          Largest percentage of used disk space on a node
Name Node CPU             Average node CPU use for nodes running NameNode services
Name Node Heap            Percentage of heap space used in the NameNode JVM
Node CPU Skew             Comparison of CPU use on the busiest node to the average node
Node I/O Skew             Comparison of I/O use on the busiest node to the average node
ResourceManager Heap      Percentage of heap space used in the ResourceManager JVM
RPC Latency - RM          Average wait time in queue for ResourceManager service calls
RPC Latency - NN          Average wait time in queue for NameNode service calls
Services Bad              Number of services in a critical state
Services Concerning       Number of services in a degraded state
Total Space               Used space as a percentage of overall storage capacity
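
Several of the metrics above are percentages of a used quantity over a capacity. The following minimal sketch shows that arithmetic for Total Space and Max Disk by Node. The function names and sample figures are hypothetical, and the guide does not confirm that Viewpoint computes the values exactly this way.

```python
def percent_used(used, capacity):
    """Return used/capacity as a percentage (0.0 if capacity is not positive)."""
    if capacity <= 0:
        return 0.0
    return 100.0 * used / capacity


def max_disk_by_node(disk_usage):
    """Return the largest per-node percentage of used disk space.

    disk_usage maps a node name to a (used, capacity) pair in the same unit.
    """
    return max(percent_used(used, cap) for used, cap in disk_usage.values())


# Hypothetical sample figures, in gigabytes.
nodes = {"node1": (400, 1000), "node2": (750, 1000), "node3": (100, 500)}
total_used = sum(used for used, _ in nodes.values())
total_capacity = sum(cap for _, cap in nodes.values())

total_space = percent_used(total_used, total_capacity)  # cluster-wide share of used space
busiest_disk = max_disk_by_node(nodes)                   # fullest single node
```

With these sample figures, the cluster as a whole is half full while the fullest node is at 75 percent, which is why the two metrics are reported separately.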

HDP 2.1 and later

Metric                    Description
Applications Failed       Number of YARN applications that failed to execute successfully
Applications Running      Number of YARN applications currently executing
Blocks Corrupt            Number of corrupt blocks in HDFS
Blocks Missing            Number of missing blocks in HDFS
Cluster Memory Allocated  Percentage of available memory allocated across all NodeManager instances
Components Down           Number of services not started
CPU                       Average node CPU use
Max Disk by Node          Largest percentage of used disk space on a node
Name Node CPU             Average node CPU use for nodes running NameNode services
Name Node Heap            Percentage of heap space used in the NameNode JVM
Node CPU Skew             Comparison of CPU use on the busiest node to the average node
Node I/O Skew             Comparison of I/O use on the busiest node to the average node
ResourceManager Heap      Percentage of heap space used in the ResourceManager JVM
RPC Latency - RM          Average wait time in queue for ResourceManager service calls
RPC Latency - NN          Average wait time in queue for NameNode service calls
Total Space               Used space as a percentage of overall storage capacity
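
The skew metrics compare the busiest node to the average node, but the guide does not state the exact formula. As an illustration only, the sketch below assumes skew is the ratio of the highest per-node value to the mean. The function name, the ratio itself, and the sample values are assumptions, not the documented Viewpoint calculation.

```python
def node_skew(per_node_use):
    """Compare the busiest node to the average node.

    Returns max/mean, so 1.0 means perfectly even use and larger values
    mean one node carries disproportionately more load. This ratio is an
    assumed formula for illustration only.
    """
    if not per_node_use:
        raise ValueError("need at least one node")
    average = sum(per_node_use) / len(per_node_use)
    if average == 0:
        return 1.0  # no activity anywhere, so treat use as even
    return max(per_node_use) / average


cpu_by_node = [20.0, 30.0, 70.0]  # hypothetical CPU use per node, in percent
cpu_skew = node_skew(cpu_by_node)
```

The same function applies to I/O figures, since Node I/O Skew is described in the same terms as Node CPU Skew.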

HDP 1.3

Metric                    Description
Blocks Corrupt            Number of blocks whose replicas are all corrupt
Blocks Missing            Number of blocks with no replicas anywhere in the cluster
Components Down           Number of service components not running
CPU                       Average node CPU use. CPU is calculated as the sum of the user CPU and system CPU usage percentages.
Jobs Failed               Number of jobs that failed
Jobs Running              Number of jobs currently executing in the system
Job Tracker CPU           CPU use for the node running the jobtracker service
Map Tasks Running         Number of map tasks executing in the system
Map Tasks Waiting         Number of map tasks waiting to execute
Max Disk by Node          Amount of used disk space on the node with the most disk space in use
Name Node CPU             CPU use for the node running the namenode service
Name Node Heap            Percentage of heap space used in the namenode JVM
Node CPU Skew             Comparison of CPU use on the busiest node to the average node
Node I/O Skew             Comparison of I/O use on the busiest node to the average node
Reduce Tasks Running      Number of reduce tasks executing in the system
Reduce Tasks Waiting      Number of reduce tasks waiting to execute
RPC Latency JT            Average wait time in queue for jobtracker service calls
RPC Latency NN            Average wait time in queue for namenode service calls
Total Space               Used space as a percentage of overall storage capacity
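
The CPU description above states that CPU is the sum of the user CPU and system CPU usage percentages, averaged across nodes. A minimal sketch of that calculation follows. The function names and the sample readings are hypothetical, and how Viewpoint samples or weights nodes is not specified in this guide.

```python
def node_cpu(user_pct, system_pct):
    """CPU use for one node: user CPU plus system CPU, in percent."""
    return user_pct + system_pct


def average_cpu(samples):
    """Average node CPU use across (user_pct, system_pct) samples."""
    if not samples:
        return 0.0
    return sum(node_cpu(u, s) for u, s in samples) / len(samples)


# Hypothetical (user %, system %) readings for three nodes.
readings = [(30.0, 10.0), (50.0, 10.0), (15.0, 5.0)]
cluster_cpu = average_cpu(readings)
```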