Considerations when Processing Hadoop Files and Tables - Parallel Transporter

Teradata Parallel Transporter Reference

Product
Parallel Transporter
Release Number
16.10
Published
July 2017
Language
English (United States)
Last Update
2018-06-28
dita:id
B035-2436
lifecycle
previous
Product Category
Teradata Tools and Utilities
  • When using the HDFS API Interface, the HadoopHost value must point to the NameNode of the Hadoop cluster.
  • When using the TDCH-TPT Interface, the HadoopHost value must be the host name or IP address of the node on which TPT is running. This host name or IP address must be reachable from all the DataNodes in the Hadoop cluster.
  • Both the HDFS API Interface and the TDCH-TPT Interface require the Hadoop client jars to be installed on the node on which TPT is running. The HDFS API Interface also requires a copy of the Hadoop cluster configuration files on that node.
  • When using the HDFS API Interface, the version of the Hadoop client jars must match the version of the Hadoop jars on the NameNode of the Hadoop cluster defined by the HadoopHost attribute.
  • The HDFS API requires that the following environment variables be set:
    • JAVA_HOME = root location of the Java JDK.
      The HDFS API uses Java JNI, so the Java version (32-bit or 64-bit) must match the TPT version (32-bit or 64-bit).
    • HADOOP_HOME = root location of the Hadoop client jar files and Hadoop configuration files.
  • The HDFS API can use the following optional environment variables:
    • LIBHDFS_OPTS = extra options to pass into the JVM.
    • LIBHDFS_JVM_PATH = full path of jvm.dll (jvm.so).

      If LIBHDFS_JVM_PATH is not provided, JAVA_HOME is used to locate jvm.dll (jvm.so).

    • LIBHDFS_CLASSPATH = classpath for the JNI JVM.

      • LIBHDFS_CLASSPATH takes precedence over CLASSPATH.

      • If neither CLASSPATH nor LIBHDFS_CLASSPATH is provided, HADOOP_HOME or HADOOP_PREFIX is used to create the classpath.

      • If CLASSPATH or LIBHDFS_CLASSPATH contains syntax that is incompatible with JNI (such as the “*” wildcard), the value is discarded and HADOOP_HOME or HADOOP_PREFIX is used to create the classpath.

    • CLASSPATH = classpath for the JNI JVM. NOTE: Using CLASSPATH is not recommended because it is usually not set correctly for JNI.
    • JAVA_LIBRARY_PATH = path to hadoop.dll for Java native method support.
  • When using the TDCH-TPT Interface, the following environment variables must be defined. Before launching the TDCH job, the TDCH-TPT interface checks whether each variable is set; any variable that is not set is exported with the default value described below. See the Teradata Connector for Hadoop tutorial for more information about the environment required by TDCH.
    • HADOOP_HOME – location of the Hadoop libraries; this value defaults to '/usr/lib/hadoop'.
    • HIVE_HOME – location of the Hive libraries; this value defaults to '/usr/lib/hive'.
    • HCAT_HOME – location of the HCatalog libraries; this value defaults to '/usr/lib/hcatalog'.
    • HIVE_LIB_JARS – comma-separated list of jar files required by TDCH; if not set, the TDCH-TPT interface builds this list from the jars found in $HCAT_HOME/lib and $HIVE_HOME/lib.
    • HADOOP_CLASSPATH – semicolon-separated list of jar files and directories required by TDCH; if not set, the TDCH-TPT interface builds this list from the jars found in $HCAT_HOME/lib and $HIVE_HOME/lib, plus the directory $HIVE_HOME/conf.
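The HDFS API rules above boil down to a small precedence order. The following Python sketch is a hypothetical pre-flight check, not part of TPT: the function names are illustrative, and it only reports which variable each setting would come from rather than building a real classpath. It assumes that HADOOP_HOME is tried before HADOOP_PREFIX, which the text does not specify.

```python
import os
from typing import List, Mapping, Optional, Tuple

# Variables the HDFS API Interface requires (from the list above).
REQUIRED_HDFS_API_VARS = ("JAVA_HOME", "HADOOP_HOME")


def missing_required(env: Mapping[str, str]) -> List[str]:
    """Return the required HDFS API variables that are unset or empty."""
    return [name for name in REQUIRED_HDFS_API_VARS if not env.get(name)]


def resolve_jvm_source(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (variable, value) used to locate jvm.dll / jvm.so, or None."""
    # LIBHDFS_JVM_PATH, when provided, is the full path to the JVM library.
    if env.get("LIBHDFS_JVM_PATH"):
        return ("LIBHDFS_JVM_PATH", env["LIBHDFS_JVM_PATH"])
    # Otherwise JAVA_HOME is used to locate the JVM library.
    if env.get("JAVA_HOME"):
        return ("JAVA_HOME", env["JAVA_HOME"])
    return None


def resolve_classpath_source(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (variable, value) the JNI classpath is taken from, or None."""
    # LIBHDFS_CLASSPATH takes precedence over CLASSPATH.
    explicit = None
    for name in ("LIBHDFS_CLASSPATH", "CLASSPATH"):
        if env.get(name):
            explicit = (name, env[name])
            break
    # A "*" wildcard is incompatible with JNI, so such a value is discarded.
    if explicit is not None and "*" not in explicit[1]:
        return explicit
    # Fall back to HADOOP_HOME, then HADOOP_PREFIX (this order is an assumption).
    for name in ("HADOOP_HOME", "HADOOP_PREFIX"):
        if env.get(name):
            return (name, env[name])
    return None


if __name__ == "__main__":
    print("missing:", missing_required(os.environ))
    print("jvm from:", resolve_jvm_source(os.environ))
    print("classpath from:", resolve_classpath_source(os.environ))
```

For example, with CLASSPATH set to a wildcard value such as /opt/lib/*, the sketch reports HADOOP_HOME as the classpath source, mirroring the discard rule described above.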
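The TDCH-TPT defaulting behavior described above can be approximated as follows. This Python sketch is illustrative only and is not the TDCH-TPT interface's code. It assumes that "jars found in" a directory means the *.jar files directly inside it, listed in sorted order, and it joins HADOOP_CLASSPATH with the semicolon separator stated above. To avoid changing the real process environment, it works on a plain dictionary.

```python
import glob
import os
from typing import Dict, List, MutableMapping

# Fixed defaults from the TDCH-TPT list above.
TDCH_DEFAULTS = {
    "HADOOP_HOME": "/usr/lib/hadoop",
    "HIVE_HOME": "/usr/lib/hive",
    "HCAT_HOME": "/usr/lib/hcatalog",
}


def _jars_in(directory: str) -> List[str]:
    """List *.jar files directly in a directory, sorted (assumed order)."""
    return sorted(glob.glob(os.path.join(directory, "*.jar")))


def apply_tdch_defaults(env: MutableMapping[str, str]) -> Dict[str, str]:
    """Fill unset TDCH variables in env and return the values applied."""
    applied: Dict[str, str] = {}
    for name, default in TDCH_DEFAULTS.items():
        if not env.get(name):
            env[name] = default
            applied[name] = default

    # Jars from $HCAT_HOME/lib followed by jars from $HIVE_HOME/lib.
    jars = _jars_in(os.path.join(env["HCAT_HOME"], "lib")) + _jars_in(
        os.path.join(env["HIVE_HOME"], "lib")
    )
    if not env.get("HIVE_LIB_JARS"):
        env["HIVE_LIB_JARS"] = ",".join(jars)
        applied["HIVE_LIB_JARS"] = env["HIVE_LIB_JARS"]
    if not env.get("HADOOP_CLASSPATH"):
        # Same jars plus the $HIVE_HOME/conf directory.
        entries = jars + [os.path.join(env["HIVE_HOME"], "conf")]
        env["HADOOP_CLASSPATH"] = ";".join(entries)
        applied["HADOOP_CLASSPATH"] = env["HADOOP_CLASSPATH"]
    return applied


if __name__ == "__main__":
    env = dict(os.environ)  # work on a copy; the real environment is untouched
    for name, value in apply_tdch_defaults(env).items():
        print(f"{name}={value}")
```

Variables that are already set are left alone, which matches the check-then-export order described in the TDCH-TPT bullet.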