- The TDCH-TPT interface is a bridge between TPT and TDCH. It extends TDCH to support Hadoop file and table transfers to and from TPT. This interface allows TPT users to use all pre-existing TDCH functionality within a TPT script, and also to use TPT-specific functionality alongside TDCH.
- When a TPT job script includes the DataConnector operator alongside any of the TDCH-specific Hadoop attributes, the DataConnector operator launches a TDCH job using the TDCH-specific Hadoop attribute values supplied in the TPT script. Once TDCH has validated the attribute values and filled in defaults for any missing attributes, it submits the job to the MapReduce framework. Once the map tasks have been initialized on the nodes of the Hadoop cluster, they connect to the DataConnector operator and begin transferring data. The HadoopProperties attribute can be used to specify one or more Hadoop properties and their values (separated by spaces), which TPT then uses when submitting the Hadoop command internally. A minimal script sketch follows this list.
- The TDCH-TPT interface depends on the TDCH jar file at runtime. TPT does not install a default TDCH jar file; users must download and install the appropriate Teradata Connector version for their Hadoop distribution from the following location: https://downloads.teradata.com/download/connectivity/teradata-connector-for-hadoop-command-line-edition.
- After installation, users must set the TDCH_JARFILE environment variable to the fully qualified filename of the desired TDCH jar file before running any TPT job, as shown in the shell example below.
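The following is a minimal sketch of a TPT job script that uses the TDCH-TPT interface to move rows from a Teradata table into a Hive table. Only the HadoopProperties attribute is taken from this document; the schema, operator names, credentials, table names, and the other Hadoop attribute names (HadoopJobType, HadoopTargetTable) are illustrative assumptions, so verify them against the TPT reference for your release:

```
/* Minimal sketch: a Teradata-to-Hive import job via the TDCH-TPT
   interface. Values marked as placeholders are assumptions, not
   taken from this document. */
DEFINE JOB tdch_hive_import
DESCRIPTION 'Move a Teradata table into Hive via TDCH'
(
  DEFINE SCHEMA SRC_SCHEMA
  (
    col1 VARCHAR(50),
    col2 VARCHAR(50)
  );

  /* Producer: reads rows from Teradata */
  DEFINE OPERATOR TD_READER
  TYPE EXPORT
  SCHEMA SRC_SCHEMA
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'mytdsystem',   /* placeholder */
    VARCHAR UserName     = 'myuser',       /* placeholder */
    VARCHAR UserPassword = 'mypassword',   /* placeholder */
    VARCHAR SelectStmt   = 'SELECT col1, col2 FROM src_table;'
  );

  /* Consumer: the presence of TDCH-specific Hadoop attributes makes
     the DataConnector operator launch a TDCH job internally */
  DEFINE OPERATOR HIVE_WRITER
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR HadoopJobType     = 'hive',                 /* assumed attribute name */
    VARCHAR HadoopTargetTable = 'hive_db.target_table', /* assumed attribute name */
    /* Space-separated Hadoop properties passed through to TDCH */
    VARCHAR HadoopProperties  = '-hivecapabilities HIVEMANAGEDINSERTWRITE,CONNECTORWRITE'
  );

  APPLY TO OPERATOR (HIVE_WRITER)
  SELECT * FROM OPERATOR (TD_READER);
);
```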
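Before running the job, point TPT at the installed TDCH jar. The jar path below is a placeholder; substitute the actual location and version installed for your Hadoop distribution:

```
# Placeholder path -- use the fully qualified filename of the TDCH jar
# installed for your Hadoop distribution.
export TDCH_JARFILE=/opt/teradata/tdch/lib/teradata-connector-1.8.7.jar

# Run the TPT job; the DataConnector operator picks up TDCH_JARFILE
# from the environment.
tbuild -f tdch_hive_import.tpt
```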
When running import/export jobs on Cloudera Data Platform (CDP) distributions using TDCH, remember the following points:
- For Hive import jobs using the DataConnector Writer, users must specify the following Hadoop properties in the HadoopProperties attribute:
- -hivecapabilities when using TDCH 1.8.7 and later versions
- -hivecapabilities, -hiveusername, and -hivepassword when using TDCH 1.8.7.1 and TDCH 1.8.7.2
- Do not specify these Hadoop properties (-hivecapabilities, -hiveusername, and -hivepassword) in Hive export jobs when using TDCH versions 1.8.7, 1.8.7.1, and 1.8.7.2. The DataConnector Reader will be terminated with the following error:
TPT19434 pmRead failed. General failure (34): 'Unknown Access Module failure'
Additionally, the following message can be found in the TDCH log: unrecognized input parameter(s): -hiveusername -hivepassword -hivecapabilities
The following example shows how these Hadoop properties can be specified in the HadoopProperties attribute:
-hivecapabilities HIVEMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE -hiveusername hive -hivepassword hive
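Within a TPT script, this string would appear as the value of the HadoopProperties attribute in the DataConnector Writer's ATTRIBUTES list, for example:

```
VARCHAR HadoopProperties = '-hivecapabilities HIVEMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE -hiveusername hive -hivepassword hive'
```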
For more information on Hadoop properties, see the article "How to run a TDCH Hive import job on CDP 7.1.7 SP1 and above versions" in the following location: https://support.teradata.com/knowledge.