create (Teradata Open Distribution for Hadoop)

Teradata® Data Mover User Guide

Product: Teradata Data Mover
Release Number: 17.00
Published: November 30, 2021
Language: English (United States)
Last Update: 2021-11-04
Product Category: Analytical Ecosystem

Purpose

The create command creates a job on the Data Mover daemon using the specified parameters and object list. Together, the parameters and object list make up the job definition.

Data Mover supports moving tables from Teradata Database to Hadoop and Hive tables from Hadoop to Teradata Database.

Parameters - Hadoop Specific

The following parameters are specific to Hadoop. Some of these parameters take a different form depending on the mechanism used to pass them.

See Parameter Order.

hadoop_options
These parameters are applicable only for Teradata to Hadoop. When passing this information using an .xml file, nest the options below the parameter.
Option Description
file_options Values supported by the Teradata Connector for Hadoop and SQL-H:
  • Text (Default)
  • RC
  • ORC
file_options_delimiter [Optional] Specifies a character delimiter for columns. If not specified, comma (',') is the delimiter. This option applies only if file_options is specified as Text.
number_mappers [Optional] Specifies the number of mappers Teradata Connector uses to pull data from Teradata Database.
The parameters for passing equivalent information when using the CLI are as follows:
  • hadoop_file_option
  • hadoop_file_delimiter
  • hadoop_number_mappers
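For illustration, a sketch of how these CLI parameters might be passed when creating a job (the job name and values are placeholders; source and target connection parameters are omitted for brevity):
datamove create -job_name td2h_sample -hadoop_file_option RC -hadoop_file_delimiter "|" -hadoop_number_mappers 10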
source_hadoop_system | target_hadoop_system
Defines the source or target sites and credential information for Hadoop data. These parameters are common to both Teradata to Hadoop and Hadoop to Teradata. When passing them using an .xml file, nest the options below the parameter.
Option Value Description
webhcat_url http://webhcat:50111 Retrieves metadata such as databases, tables, columns, and so on. To construct the URL, replace webhcat with the system name where the WebHCAT server resides. Port 50111 is the default for WebHCAT.1
oozie_url http://oozie:11000 Runs hive queries and Teradata Connector for Hadoop (TDCH) jobs for data movement. To construct the URL, replace oozie with the system name where the Oozie server resides. Port 11000 is the default for Oozie.1
file_system_url http://webhdfs:50070 or http://httpfs:14000 Retrieves configuration file stored in HDFS to run TDCH jobs and logs generated by TDCH jobs. Specify either WebHDFS REST URL or HttpFS REST URL. Default port for WebHDFS is 50070. Default port for HttpFS is 14000.1
logon_mechanism default or kerberos [Optional] The security protocol for logging in to the source or target Hadoop File System.
hive_user hive or hive_user@hadoop_realm Name of the user who has access to the tables in the Hadoop File System.

When the logon mechanism is kerberos, the value for the hive user must adhere to the following convention: kerberos_principal@kerberos_realm

hive_password hive_user_password Password of the user who has access to the tables in the source or target Hadoop File System.2
hive_password_encrypted hive_user_password_encrypted Encrypted password of the user who has access to the tables in the source or target Hadoop File System. Not a valid parameter if hive_password is specified.2
1 The value must start with http:// followed by system name or IP address and port number. If the logon mechanism is kerberos, the host name must be the fully qualified domain name.

2 If the logon mechanism is default, this parameter is optional. If the logon mechanism is kerberos, this parameter is required and must be the password for the hive_user.

The parameters for passing equivalent information when using the CLI are as follows:
  • source_hadoop_webhcat_url | target_hadoop_webhcat_url
  • source_hadoop_oozie_url | target_hadoop_oozie_url
  • source_hadoop_file_system_url | target_hadoop_file_system_url
  • source_hive_logon_mechanism | target_hive_logon_mechanism
  • source_hive_user | target_hive_user
  • source_hive_password_encrypted | target_hive_password_encrypted
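For illustration, a sketch of a target_hadoop_system block using the kerberos logon mechanism in an .xml parameters file (the host names, Kerberos realm, and password are placeholders; element names follow the options above and the XML examples at the end of this section):
<target_hadoop_system>
    <webHCat_url>http://hadoop-master.example.com:50111</webHCat_url>
    <oozie_url>http://hadoop-master.example.com:11000</oozie_url>
    <file_system_url>http://hadoop-master.example.com:50070</file_system_url>
    <logon_mechanism>kerberos</logon_mechanism>
    <hive_user>hive@EXAMPLE.COM</hive_user>
    <hive_password>hive_kerberos_password</hive_password>
</target_hadoop_system>
Each URL uses the fully qualified domain name, as required when the logon mechanism is kerberos.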

Parameters - Common

These parameters are common to both Teradata to Hadoop and Hadoop to Teradata.

See Parameter Order.

compare_ddl
[Optional] Designates whether to compare DDL for the source and target objects specified.
Valid Values
  • unspecified (default) - the daemon.default.compareDDL.enabled parameter value from the Data Mover daemon configuration file is used. If the parameter is specified as true or false, that value overrides the parameter value in the Data Mover daemon configuration file.
  • true - the job steps are not recreated each time the job is started and DDL is compared on source and target.
  • false - the job steps are recreated each time the job is started and DDL is not compared on source and target.
db_client_encryption
[Optional] Set to true if the job data needs to be encrypted during transfer.
dm.rest.endpoint
[Optional] Enter a Data Mover REST server URL to overwrite the default value specified in the commandline.properties file in order to connect to a different REST server (and therefore a different daemon) at runtime.
Example: https://dm-server1:1443/datamover
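For example, assuming the option is passed like other command-line parameters with a leading hyphen (the server name is a placeholder):
datamove create -f parameters.xml -dm.rest.endpoint https://dm-server1:1443/datamover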
freeze_job_steps
[Optional] Freezes the job steps so that they are not recreated each time the job is started. Should only be set to true if the source and target environments do not change after the job is created.
Valid Values
  • true - the job steps are not recreated each time the job is started
  • false - the job steps are recreated each time the job is started
  • unspecified (default) - the value is set to false
Example: true
job_name
[Optional] Name for this job. The name must be unique and up to 32 characters.
job_priority
[Optional] Specifies the execution priority for the job. Supported values are HIGH, MEDIUM, LOW, and UNSPECIFIED. If no value is specified, the default of MEDIUM is used at runtime.
Example: MEDIUM
log_to_event_table
[Optional] Specifies the event table to be used for this job. For more information, see Using Event Tables.
overwrite_existing_objects
[Optional] Job overwrites objects that already exist on the target.
Valid Values
Value Description
True Enables overwriting
False Disables overwriting
Unspecified Default – the value is set to the value in the Data Mover daemon configuration file
If the parameter is not specified, the value is set to the overwrite_existing_objects parameter value in the Data Mover daemon configuration file. If the parameter is specified as true or false, that value takes precedence over the parameter value in the Data Mover daemon configuration file.
Example: true
response_timeout
[Optional] Amount of time, in seconds, to wait for response from the Data Mover daemon.
Example: 60
source_password | target_password
[Optional] Source or target Teradata logon password. Not a valid parameter if source_password_encrypted | target_password_encrypted is specified. If you do not specify a password for this parameter, the command prompts you to enter it interactively. Input is masked with a set number of asterisks, regardless of the length of the password.
source_password_encrypted | target_password_encrypted
[Optional] Source or target Teradata encrypted logon password. Not a valid parameter if source_password | target_password is specified.
source_tdpid | target_tdpid
Source or target Teradata Database.
source_user | target_user
[Optional] Source or target Teradata logon id. If you do not specify a logon id for this parameter, the command prompts you to enter it interactively.
source_userid_pool | target_userid_pool
[Optional] Job pulls the user from the specified credential pool. Only available for Teradata systems.
Option Description
source_userid_pool Pulls user from the Teradata source system.
target_userid_pool Pulls user from the Teradata target system.
Example: POOL-1
table
Option Description
name Name of the table to be moved.
target_name [Optional] Renames the table. If no value is specified, the original table name is used.
target_database [Optional] Moves the tables to a different database. If no value is specified, the original database is used as the target.
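For illustration, a sketch of an object list entry that renames a table on the target (database and table names are placeholders; the structure follows the XML examples at the end of this section):
<database selection="unselected">
    <name>sales_db</name>
    <table selection="included">
        <name>daily_sales</name>
        <target_name>daily_sales_copy</target_name>
    </table>
</database>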

Parameters - Teradata to Hadoop

These parameters are used only for Teradata to Hadoop.

See Parameter Order.

force_utility
[Optional] Forces the Data Mover daemon to use a specific utility for a Teradata to Hadoop job. By default, the Data Mover daemon uses SQL-H to move the table; if SQL-H cannot be used, the Teradata Connector for Hadoop is used to move the table.
Valid Values
  • Default (Data Mover determines if SQL-H can be used to move the table. If not, it uses the Teradata Connector for Hadoop.)
  • Hadoop_Connector
  • SQLH
table_columns

[Optional] Customizes the table columns before moving data to Hadoop. If the target table does not exist, you can rename the columns on the target side. If a target table already exists, the column name and type should match the column that exists on the target. The table column properties create the target table on Hadoop or validate the values of an existing table.

Option Description
name Name of the column to be modified
target_name [Optional] Renames the column. If no value is specified, the original column name is used as the target name.
target_type [Optional] Changes the default column type. If no value is specified, the default mapping is used. For more information about the default and available mappings, see Data Type Mappings.
type Table column type. Informational only; cannot be modified by the user.
size [Optional] Changes the size of the column type, applicable only for CHAR type. If no value is specified, the default value [255] is used.
transfer_method
[Optional] The Teradata Connector for Hadoop supports these options for data transfer from Teradata to Hadoop.
Option Description
Default Allows Data Mover to select AMPs if the source Teradata Database is 14.10 or later.
Hash The underlying Hadoop connector retrieves rows in a given hash value range of a specified split-by column from a source table in Teradata, and writes those records into a target file in HDFS.
Value The underlying Hadoop connector retrieves rows in a given value range of a specified split-by column from a source table in Teradata, and writes those records into a target file in HDFS.
Partition The underlying Hadoop connector creates a staging PPI table on the source database if the source table is not a PPI table.
Amp The underlying Hadoop connector retrieves rows in one or more AMPs from a source table in Teradata, and writes those records into a target file in HDFS. The Amp option is supported only if the Teradata Database is 14.10 or later.
use_foreign_server

For the T2H utility, you can specify a foreign server that was previously created to move data from Teradata to Hadoop. This foreign server is used to move all tables in a particular job.

The following example shows use of the use_foreign_server command in a job definition for a Teradata to Hadoop job:
<use_foreign_server> 
        <name>foreign_server</name> 
</use_foreign_server>
You must specify a foreign server in order for Data Mover to use T2H to move data from Teradata 15.0 to Hadoop. If you do not specify a foreign server and the force utility is not T2H, Data Mover uses TDCH instead of T2H.

Parameters - Hadoop to Teradata

These parameters are used only for Hadoop to Teradata. Some of these parameters take a different form depending on the mechanism used to pass them.

See Parameter Order.

force_utility
[Optional] Forces the Data Mover daemon to use a specific utility for Hadoop copy operations. By default, the Data Mover daemon uses SQL-H to move the table; if SQL-H cannot be used, the Teradata Connector for Hadoop is used to move the table.
Valid Values
  • Default (Data Mover determines if SQL-H can be used to move the table. If not, the Teradata Connector for Hadoop is used.)
  • Hadoop_Connector
  • SQLH
size
If the target table does not exist, ensure the sum of all CHAR/VARCHAR column sizes is not greater than 64000; otherwise, the job fails at runtime due to a JDBC limitation. For columns with the UNICODE session character set, count double the specified size when calculating the sum of all column sizes. For example, if a column uses the UNICODE character set and its size is 2000, count it as 4000.
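For instance, a target table with two VARCHAR(20000) columns in the LATIN session character set and one VARCHAR(15000) column in UNICODE counts as 20000 + 20000 + (15000 x 2) = 70000, which exceeds 64000 and causes the job to fail; reducing the UNICODE column to VARCHAR(10000) brings the total to 60000, which is within the limit.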
table_columns
[Optional] Customizes the table columns before moving data to Teradata Database. For more information about data types, see Data Type Mappings. If the target table does not exist, you can rename the columns on the target side. If a target table already exists, the column name and type should match the column that exists on the target. The table column properties create the target table on Teradata Database or validate the values of an existing table.
Option Description
name Name of the column to be modified
size [Optional] Limits the size of columns such as VARCHAR
target_name [Optional] Renames the column. If no value is specified, the original column name is used as the target name.
target_type [Optional] Changes the default column type. If no value is specified, the default mapping is used. For more information about the default and available mappings, see Data Type Mappings.
type Table column type. Informational only; cannot be modified by the user.
allowDuplicate [Optional] Specifies whether duplicate values are allowed in the target table column (true or false). Must not be specified or must be set to false if isPrimaryIndex is true, or an error occurs.
allowNull [Optional] Specifies if null values are allowed in the target table column (true or false). Must not be specified or must be set to false if isPrimaryIndex is true, or an error occurs.
isPrimaryIndex [Optional] Specifies whether the column is the primary index of the target table (true or false). If isPrimaryIndex is false and allowNull is true, allowDuplicate must not be specified or must be false, or an error occurs.
transfer_method 1
[Optional] The options that Teradata Connector for Hadoop supports for transferring data from Hadoop to Teradata.
Option Description
Default Data Mover selects internal_fastload.
batch_insert The underlying Hadoop connector inserts rows into a NOPI staging table through a JDBC batch execution. After all mappers complete their processing, rows in the staging table are moved to the target table by an Insert-Select SQL operation.
transfer_batch_insert_size 2 If you specify batch_insert as the transfer method, you can also specify this parameter as a value representing the number of rows (for example, 10000, 50000). This property is not applicable when you specify internal_fastload as the transfer method.
internal_fastload The underlying Hadoop connector starts a database FastLoad session to load rows into a single NOPI staging table. All database FastLoad sessions are coordinated through an internal protocol. The FastLoad job finishes after all mappers complete their execution, and then rows in the NOPI staging table are copied to the target table by an Insert-Select SQL operation.
1 In the CLI, this parameter takes the form hadoop_transfer_method.

2 In the CLI, this parameter takes the form hadoop_transfer_batch_size.
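For illustration, a sketch of how the transfer method and batch size might be passed using the CLI forms noted above (the job name and values are placeholders; other required parameters are omitted for brevity):
datamove create -job_name h2t_sample -hadoop_transfer_method batch_insert -hadoop_transfer_batch_size 50000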

use_foreign_server
In Teradata Database 15.0 or later, you can specify a foreign server that was previously created to move data from Hadoop to Teradata. This foreign server is used to move all tables in a particular job.
The following example shows use of the use_foreign_server command in a job definition for a Hadoop to Teradata job:
<use_foreign_server>
       <name>foreign_server</name>
</use_foreign_server>            
You must specify a foreign server in order for Data Mover to use T2H to move data from Hadoop to Teradata 15.0 or later. If you do not specify a foreign server, Data Mover uses TDCH instead of T2H.

XML File Example

For the create command, type datamove create -f parameters.xml.

The following example shows a parameters file for the create command with a job definition for a Teradata to Hadoop job:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
 <dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.teradata.com/unity/datamover.xsd">
 <!-- source td credentials -->
     <source_tdpid>dm-agent5</source_tdpid>
     <source_user>ak186010_portlet</source_user>
     <source_password>qwerty</source_password>
     <!-- target hadoop credentials -->
     <target_hadoop_system>
         <webHCat_url>http://webhcat:50111</webHCat_url>
         <oozie_url>http://oozie:11000</oozie_url>
         <file_system_url>http://webhdfs:50070</file_system_url>
         <hive_user>hive</hive_user>
         <hive_password>hive</hive_password>
     </target_hadoop_system>
     <overwrite_existing_objects>true</overwrite_existing_objects>
     <freeze_job_steps>false</freeze_job_steps>
     <target_database><name>ak186010_all</name></target_database>
     <compare_ddl>false</compare_ddl>
     <!-- hadoop specific job options -->
         <hadoop_options>
         <file_options>ORC</file_options>
         <file_options_delimiter>#</file_options_delimiter>
         <transfer_method>hash</transfer_method>
         <number_mappers>10</number_mappers>
     </hadoop_options>
 <!-- object selection. in this case, we are moving Table1 from ak186010_portlet
 (td) to ak186010_all in hive (hadoop)-->
     <database selection="unselected">
         <name>ak186010_portlet</name>
         <table selection="included">
             <name>Table1</name>
             <target_name>NewTable1</target_name>
             <table_columns>
                 <column>
                     <name>column1</name>
                     <target_name>targetColumn1</target_name>
                     <type>varchar</type>
                     <target_type>string</target_type>
                 </column>
             </table_columns>
         </table>
     </database>
 </dmCreate>            
The following example shows a parameters file for the create command with a job definition for a Hadoop to Teradata job:
<dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.teradata.com/unity/datamover.xsd">
    <source_hadoop_system>
        <webHCat_url>http://webhcat:50111</webHCat_url>
        <oozie_url>http://oozie:11000</oozie_url>
        <file_system_url>http://webhdfs:50070</file_system_url>
        <hive_user>ssuser</hive_user>
        <hive_password>sspass</hive_password>
    </source_hadoop_system>
    <target_tdpid>dm-agent3</target_tdpid>
    <target_user>yye</target_user>
    <target_password>yye</target_password>
    <overwrite_existing_objects>true</overwrite_existing_objects>
    <force_utility>sqlh</force_utility>
    <freeze_job_steps>false</freeze_job_steps>
    <compare_ddl>false</compare_ddl>   
    <log_level>0</log_level>
    <online_archive>UNSPECIFIED</online_archive>
    <hadoop_options>
        <file_options>text</file_options>
        <file_options_delimiter>|</file_options_delimiter>
        <transfer_method>batch_insert</transfer_method>
        <transfer_batch_insert_size>10000</transfer_batch_insert_size>
        <number_mappers>5</number_mappers>
    </hadoop_options>
    
    <database selection="unselected">
        <name>yye</name>
        <table selection="included">
            <name>test1</name>
            <table_columns>
                <column>
                    <name>c1</name>
                    <target_name>t1</target_name>
                    <type>string</type>
                    <target_type>varchar</target_type>
                    <size>8</size>
                    <charset>UTF8</charset>
                    <allowNull>true</allowNull>
                    <allowDuplicate>false</allowDuplicate>
                    <isPrimaryIndex>true</isPrimaryIndex>
                </column>
            </table_columns>
        </table>
    </database>
</dmCreate>