Considerations when Processing Hadoop Files and Tables
When using the HDFS API Interface, the HadoopHost value must point to the NameNode
of the Hadoop cluster.
When using the TDCH-TPT Interface, the HadoopHost value must be the host name or IP
address of the node on which TPT is running. This host name or IP address must be
reachable from all the DataNodes in the Hadoop cluster.
Both the HDFS API Interface and the TDCH-TPT Interface require that the Hadoop client
jars be installed on the node on which TPT is running. The HDFS API Interface also
requires a copy of the Hadoop cluster configuration files on that node.
When using the HDFS API Interface, the version of the Hadoop client jars must match
the version of the Hadoop jars on the NameNode of the Hadoop cluster defined by the
HadoopHost attribute.
The HDFS API requires that the following environment variables be set:
JAVA_HOME = Root Location of the Java JDK
Note: The HDFS API uses the Java Native Interface (JNI).
Note: The Java installation (32-bit or 64-bit) must match the TPT installation (32-bit or 64-bit).
HADOOP_HOME = Root Location of the Hadoop Client Jar Files and Hadoop Configuration
files.
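The two required variables can be checked before a job is started. The following is a
minimal Python sketch of such a check and is not part of TPT. The function name
missing_hdfs_api_vars is hypothetical, and treating an empty value the same as an
unset variable is an assumption.

```python
import os

# Variables the HDFS API requires, as listed above.
REQUIRED_HDFS_API_VARS = ("JAVA_HOME", "HADOOP_HOME")


def missing_hdfs_api_vars(env):
    """Return the required variables that are unset or empty in env.

    Assumption: an empty string is treated the same as an unset variable.
    """
    return [name for name in REQUIRED_HDFS_API_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hdfs_api_vars(os.environ)
    if missing:
        print("Set before starting TPT: " + ", ".join(missing))
```

Passing the environment as a dictionary keeps the check easy to run against a
proposed configuration without changing the current shell.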
The HDFS API can use the following optional environment variables:
LIBHDFS_OPTS = extra options to pass into the JVM.
LIBHDFS_JVM_PATH = full path of jvm.dll (jvm.so).
If LIBHDFS_JVM_PATH is not provided, JAVA_HOME is used to locate jvm.dll.
LIBHDFS_CLASSPATH = class path for the JNI JVM.
LIBHDFS_CLASSPATH takes precedence over CLASSPATH.
If neither CLASSPATH nor LIBHDFS_CLASSPATH is provided, HADOOP_HOME or HADOOP_PREFIX
is used to create the class path.
If CLASSPATH or LIBHDFS_CLASSPATH contains syntax that is incompatible with JNI (for
example, the “*” wildcard), the value is discarded and HADOOP_HOME or HADOOP_PREFIX is
used to create the class path.
CLASSPATH = class path for the JNI JVM.
Note: Using CLASSPATH is not recommended because it is usually not set correctly for JNI.
JAVA_LIBRARY_PATH = Path to hadoop.dll for Java native method support.
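The class path precedence rules above can be summarized in code. The following Python
sketch only models the documented order and is not the HDFS API implementation. The
function name resolve_classpath_source is hypothetical. Two details are assumptions
because the text does not specify them: HADOOP_HOME is consulted before HADOOP_PREFIX,
and a discarded LIBHDFS_CLASSPATH falls back directly to the Hadoop directory rather
than to CLASSPATH.

```python
def resolve_classpath_source(env):
    """Return (variable_name, value) for the variable the class path comes from.

    Returns (None, None) when no usable variable is set.
    """
    # LIBHDFS_CLASSPATH takes precedence over CLASSPATH.
    for name in ("LIBHDFS_CLASSPATH", "CLASSPATH"):
        value = env.get(name)
        if value:
            if "*" in value:
                # Wildcards are incompatible with JNI: discard the value.
                # Assumption: fall back to the Hadoop directory immediately.
                break
            return name, value
    # Fallback: the class path is created from the Hadoop root directory.
    # Assumption: HADOOP_HOME is checked before HADOOP_PREFIX.
    for name in ("HADOOP_HOME", "HADOOP_PREFIX"):
        value = env.get(name)
        if value:
            return name, value
    return None, None
```

For example, with CLASSPATH set to /opt/lib/* and HADOOP_HOME set to /opt/hadoop, the
sketch reports HADOOP_HOME as the source because the wildcard value is discarded.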
When using the TDCH-TPT Interface, the following environment variables must be defined.
Before launching the TDCH job, the TDCH-TPT Interface checks whether each environment
variable is set and, if it is not, exports it with the default value described below.
See the Teradata Connector for Hadoop tutorial for more information about the
environment required by TDCH.
HADOOP_HOME - location of the Hadoop libraries; this value defaults to '/usr/lib/hadoop'.
HIVE_HOME - location of the Hive libraries; this value defaults to '/usr/lib/hive'.
HCAT_HOME - location of the HCatalog libraries; this value defaults to '/usr/lib/hcatalog'.
HIVE_LIB_JARS - comma-separated list of jar files required by TDCH; if not set, the
TDCH-TPT interface will build this list using the jars found in $HCAT_HOME/lib
and $HIVE_HOME/lib.
HADOOP_CLASSPATH - semicolon-separated list of jar files and directories required
by TDCH; if not set, the TDCH-TPT interface will build this list using the jars found
in $HCAT_HOME/lib and $HIVE_HOME/lib, as well as the directory $HIVE_HOME/conf.
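The defaulting behavior described above can be illustrated as follows. This Python
sketch is an approximation for planning purposes and is not the TDCH-TPT interface
code. The names with_tdch_defaults and glob_jars are hypothetical, and the ordering of
jars within each list and the treatment of empty values as unset are assumptions.

```python
import glob

# Default locations stated above.
DEFAULT_HOMES = {
    "HADOOP_HOME": "/usr/lib/hadoop",
    "HIVE_HOME": "/usr/lib/hive",
    "HCAT_HOME": "/usr/lib/hcatalog",
}


def glob_jars(directory):
    """Return the jar files found directly in directory, sorted by name."""
    return sorted(glob.glob(directory + "/*.jar"))


def with_tdch_defaults(env, list_jars=glob_jars):
    """Return a copy of env with unset TDCH variables filled in."""
    result = dict(env)
    for name, default in DEFAULT_HOMES.items():
        if not result.get(name):
            result[name] = default

    # Both lists are built from the jars in $HCAT_HOME/lib and $HIVE_HOME/lib.
    lib_dirs = [result["HCAT_HOME"] + "/lib", result["HIVE_HOME"] + "/lib"]
    jars = [jar for directory in lib_dirs for jar in list_jars(directory)]

    if not result.get("HIVE_LIB_JARS"):
        result["HIVE_LIB_JARS"] = ",".join(jars)  # comma-separated
    if not result.get("HADOOP_CLASSPATH"):
        conf_dir = result["HIVE_HOME"] + "/conf"
        result["HADOOP_CLASSPATH"] = ";".join(jars + [conf_dir])  # semicolon-separated
    return result
```

The list_jars parameter allows the directory scan to be replaced, for example to
preview the resulting values on a machine where Hive is not installed.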
When using the TDCH-TPT interface, a TPT-specific version of TDCH called TDCH-TPT.jar
is used. Later versions of TDCH installed on the local node are not used by the
TDCH-TPT interface.