5.12.1 - 5.13 - CDH Nodes - Cloudera Distribution for Hadoop

Cloudera Distribution for Hadoop for Teradata Administrator Guide

Cloudera Distribution for Hadoop
Release Number: 5.12.1 - 5.13
November 2017
English (United States)

Cloudera Distribution for Hadoop for Teradata consists of master, data, and edge nodes.

Master Node for Hadoop
Controls the cluster by storing metadata and running master services, including:
  • HCatalog: Describes the structure of data stored in HDFS
  • Hive: Queries structured data in HDFS
  • JournalNode: Records (logs) the changes that the namenode makes to HDFS
  • Namenode: Manages HDFS storage; high availability requires an active namenode and a standby namenode
  • YARN: Schedules application jobs and manages and allocates resources
  • ZooKeeper: Synchronizes distributed components and monitors the namenode
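The interplay between the namenode and the JournalNodes can be illustrated with a small model. The following is a minimal sketch, not Cloudera or Teradata code: the class and function names are hypothetical, and the majority rule reflects general HDFS high-availability behavior (an edit counts as logged once most JournalNodes acknowledge it), which this guide does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one JournalNode; keeps edits in memory."""
    up: bool = True
    edits: list = field(default_factory=list)

    def append(self, edit: str) -> bool:
        # A reachable node stores the edit and acknowledges it.
        if self.up:
            self.edits.append(edit)
        return self.up


def log_edit(journal_nodes: list, edit: str) -> bool:
    """Return True if a majority of JournalNodes acknowledged the edit."""
    acks = sum(1 for jn in journal_nodes if jn.append(edit))
    return acks >= len(journal_nodes) // 2 + 1
```

With three JournalNodes in this model, an edit is still logged when one node is down, but not when two are down.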
Data Node for Hadoop
  • Stores HDFS blocks
  • Answers queries from the namenode for filesystem operations
  • Allows client applications to communicate directly with the data node after the namenode determines the data location
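The division of labor described above (the namenode holds metadata, the data nodes hold blocks, and clients read from data nodes directly) can be sketched as a toy model. All class names, block IDs, and paths below are hypothetical illustrations, not part of any Hadoop API.

```python
class NameNode:
    """Toy namenode: holds only metadata (path -> list of (block id, data node name))."""

    def __init__(self):
        self.block_map = {}

    def add_file(self, path, placements):
        self.block_map[path] = list(placements)

    def locate(self, path):
        return self.block_map[path]


class DataNode:
    """Toy data node: holds the block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


def read_file(namenode, datanodes, path):
    """Ask the namenode where the blocks are, then read each block from its data node."""
    parts = []
    for block_id, node_name in namenode.locate(path):
        parts.append(datanodes[node_name].read(block_id))
    return b"".join(parts)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.store("blk_1", b"hello ")
dn2.store("blk_2", b"world")
nn = NameNode()
nn.add_file("/demo.txt", [("blk_1", "dn1"), ("blk_2", "dn2")])
print(read_file(nn, {"dn1": dn1, "dn2": dn2}, "/demo.txt"))  # b'hello world'
```

In this sketch the file data never passes through the namenode; it only answers the location lookup.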
Edge Node for Hadoop
The edge node allows client applications to run independently of the master node. This reduces the risk in testing new applications and, by enhancing load performance (which a TASM or Teradata Integrated Workload Management ruleset throttles), lessens the impact on Teradata Database throughput. Located between the Hadoop cluster and the customer network, the edge node runs client services for the cluster:
  • Allowing external applications and users to access the Hadoop environment
  • Permitting access control
  • Enforcing policy oversight
  • Logging metadata
  • Providing fast connections by communicating with the Hadoop cluster over the internal InfiniBand network
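The gateway role of the edge node (check access, log the request, then pass it to the cluster) can be sketched in a few lines. This is an illustrative model only: the policy table, user names, and the handle_request function are assumptions made for the example and do not correspond to any Cloudera or Teradata interface.

```python
import logging

logger = logging.getLogger("edge_node_sketch")

# Hypothetical policy: which actions each user may perform.
ALLOWED = {"analyst": {"read"}, "etl": {"read", "write"}}


def handle_request(user, action, path, forward):
    """Check the policy, log the request, and forward it only if it is allowed."""
    allowed = action in ALLOWED.get(user, set())
    logger.info("user=%s action=%s path=%s allowed=%s", user, action, path, allowed)
    if not allowed:
        raise PermissionError(f"{user} may not {action} {path}")
    # 'forward' stands in for the call that reaches the cluster.
    return forward(action, path)
```

Here the forward argument is any callable that takes the action and path; rejected requests raise PermissionError before it is ever called.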