create (Teradata Open Distribution for Hadoop)

Teradata® Data Mover User Guide

Product: Teradata Data Mover
Release Number: 17.00
Published: November 30, 2021
Language: English (United States)
Last Update: 2021-11-04
Product Category: Analytical Ecosystem

Purpose

The create command creates a job on the Data Mover daemon using the specified parameters and object list. Together, the parameters and object list make up the job definition.

Data Mover supports moving tables from Teradata Database to Hadoop and Hive tables from Hadoop to Teradata Database.

Parameters - Hadoop Specific

The following parameters are specific to Hadoop. Some of these parameters take a different form depending on the mechanism used to pass them.

See Parameter Order.

hadoop_options
These parameters are applicable only for Teradata to Hadoop. When passing this information using an .xml file, nest the options below the parameter.
Option Description
file_options Values supported by the Teradata Connector for Hadoop and SQL-H:
  • Text (Default)
  • RC
  • ORC
file_options_delimiter [Optional] Specifies a character delimiter for columns. If not specified, comma (',') is the delimiter. This option applies only if file_options is specified as Text.
number_mappers [Optional] Specifies the number of mappers Teradata Connector uses to pull data from Teradata Database.
The parameters for passing equivalent information when using the CLI are as follows:
  • hadoop_file_option
  • hadoop_file_delimiter
  • hadoop_number_mappers
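For illustration, a sketch of how these CLI parameters might be passed when creating a job (the job name and values are placeholders; source and target connection parameters are omitted for brevity):
datamove create -job_name td2h_sample -hadoop_file_option RC -hadoop_file_delimiter "|" -hadoop_number_mappers 10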
source_hadoop_system | target_hadoop_system
Defines the source or target sites and credential information for Hadoop data. These parameters are common to both Teradata to Hadoop and Hadoop to Teradata. When passing them using an .xml file, nest the options below the parameter.
Option Value Description
webhcat_url http://webhcat:50111 Retrieves metadata such as databases, tables, columns, and so on. To construct the URL, replace webhcat with the system name where the WebHCAT server resides. Port 50111 is the default for WebHCAT.1
oozie_url http://oozie:11000 Runs hive queries and Teradata Connector for Hadoop (TDCH) jobs for data movement. To construct the URL, replace oozie with the system name where the Oozie server resides. Port 11000 is the default for Oozie.1
file_system_url http://webhdfs:50070 or http://httpfs:14000 Retrieves configuration file stored in HDFS to run TDCH jobs and logs generated by TDCH jobs. Specify either WebHDFS REST URL or HttpFS REST URL. Default port for WebHDFS is 50070. Default port for HttpFS is 14000.1
logon_mechanism default or kerberos [Optional] The security protocol for logging in to the source or target Hadoop File System.
hive_user hive or hive_user@hadoop_realm Name of the user who has access to the tables in the Hadoop File System.

When the logon mechanism is kerberos, the value for the hive user must adhere to the following convention: kerberos_principal@kerberos_realm

hive_password hive_user_password Password of the user who has access to the tables in the source or target Hadoop File System.2
hive_password_encrypted hive_user_password_encrypted Encrypted password of the user who has access to the tables in the source or target Hadoop File System. Not a valid parameter if hive_password is specified.2
1 The value must start with http:// followed by system name or IP address and port number. If the logon mechanism is kerberos, the host name must be the fully qualified domain name.

2 If the logon mechanism is default, this parameter is optional. If the logon mechanism is kerberos, this parameter is required and must be the password for the hive_user.

The parameters for passing equivalent information when using the CLI are as follows:
  • source_hadoop_webhcat_url | target_hadoop_webhcat_url
  • source_hadoop_oozie_url | target_hadoop_oozie_url
  • source_hadoop_file_system_url | target_hadoop_file_system_url
  • source_hive_logon_mechanism | target_hive_logon_mechanism
  • source_hive_user | target_hive_user
  • source_hive_password_encrypted | target_hive_password_encrypted
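For illustration, a sketch of a target_hadoop_system block using the kerberos logon mechanism in an .xml parameters file (the host names, Kerberos realm, and password are placeholders; element names follow the options above and the XML examples at the end of this section):
<target_hadoop_system>
    <webHCat_url>http://hadoop-master.example.com:50111</webHCat_url>
    <oozie_url>http://hadoop-master.example.com:11000</oozie_url>
    <file_system_url>http://hadoop-master.example.com:50070</file_system_url>
    <logon_mechanism>kerberos</logon_mechanism>
    <hive_user>hive@EXAMPLE.COM</hive_user>
    <hive_password>hive_kerberos_password</hive_password>
</target_hadoop_system>
Each URL uses the fully qualified domain name, as required when the logon mechanism is kerberos.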

Parameters - Common

These parameters are common to both Teradata to Hadoop and Hadoop to Teradata.

See Parameter Order.

compare_ddl
[Optional] Designates whether to compare DDL for the source and target objects specified.
Valid Values
  • unspecified (default) - the daemon.default.compareDDL.enabled parameter value from the Data Mover daemon configuration file is used. If the parameter is specified as true or false, that value overrides the parameter value in the Data Mover daemon configuration file.
  • true - the job steps are not recreated each time the job is started and DDL is compared on source and target.
  • false - the job steps are recreated each time the job is started and DDL is not compared on source and target.
db_client_encryption
[Optional] Set to true if the job data needs to be encrypted during transfer.
dm.rest.endpoint
[Optional] Enter a Data Mover REST server URL to overwrite the default value specified in the commandline.properties file in order to connect to a different REST server (and therefore a different daemon) at runtime.
Example: https://dm-server1:1443/datamover
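For example, assuming the option is passed like other command-line parameters with a leading hyphen (the server name is a placeholder):
datamove create -f parameters.xml -dm.rest.endpoint https://dm-server1:1443/datamover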
freeze_job_steps
[Optional] Freezes the job steps so that they are not recreated each time the job is started. Should only be set to true if the source and target environments do not change after the job is created.
Valid Values
  • true - the job steps are not recreated each time the job is started
  • false - the job steps are recreated each time the job is started
  • unspecified (default) - the value is set to false
Example: true
job_name
[Optional] Name for this job. The name must be unique and up to 32 characters.
job_priority
[Optional] Specifies the execution priority for the job. Supported values are HIGH, MEDIUM, LOW, and UNSPECIFIED. If no value is specified, the default of MEDIUM is used at runtime.
Example: MEDIUM
log_to_event_table
[Optional] Specifies the event table to be used for this job. For more information, see Using Event Tables.
overwrite_existing_objects
[Optional] Job overwrites objects that already exist on the target.
Valid Values
Value Description
True Enables overwriting
False Disables overwriting
Unspecified Default – the value is set to the value in the Data Mover daemon configuration file
If the parameter is not specified, the value is set to the overwrite_existing_objects parameter value in the Data Mover daemon configuration file. If the parameter is specified as true or false, that value takes precedence over the parameter value in the Data Mover daemon configuration file.
Example: true
response_timeout
[Optional] Amount of time, in seconds, to wait for response from the Data Mover daemon.
Example: 60
source_password | target_password
[Optional] Source or target Teradata logon password. Not a valid parameter if source_password_encrypted | target_password_encrypted is specified. If you do not specify a password for this parameter, the command prompts you to enter it interactively. Input is masked with a set number of asterisks, regardless of the length of the password.
source_password_encrypted | target_password_encrypted
[Optional] Source or target Teradata encrypted logon password. Not a valid parameter if source_password | target_password is specified.
source_tdpid | target_tdpid
Source or target Teradata Database.
source_user | target_user
[Optional] Source or target Teradata logon id. If you do not specify a logon id for this parameter, the command prompts you to enter it interactively.
source_userid_pool | target_userid_pool
[Optional] Job pulls the user from the specified credential pool. Only available for Teradata systems.
Option Description
source_userid_pool Pulls user from the Teradata source system.
target_userid_pool Pulls user from the Teradata target system.
Example: POOL-1
table
Option Description
name Name of the table to be moved.
target_name [Optional] Renames the table. If no value is specified, the original table name is used.
target_database [Optional] Moves the tables to a different database. If no value is specified, the original database is used as the target.
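For illustration, a sketch of an object list entry that renames a table on the target (database and table names are placeholders; the structure follows the XML examples at the end of this section):
<database selection="unselected">
    <name>sales_db</name>
    <table selection="included">
        <name>daily_sales</name>
        <target_name>daily_sales_copy</target_name>
    </table>
</database>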

Parameters - Teradata to Hadoop

These parameters are used only for Teradata to Hadoop.

See Parameter Order.

force_utility
[Optional] Forces the Data Mover daemon to use a specific utility for a Teradata to Hadoop job. By default, the Data Mover daemon uses SQL-H to move the table; if SQL-H cannot be used, the Teradata Connector for Hadoop is used to move the table.
Valid Values
  • Default (Data Mover determines if SQL-H can be used to move the table. If not, it uses the Teradata Connector for Hadoop.)
  • Hadoop_Connector
  • SQLH
table_columns

[Optional] Customizes the table columns before moving data to Hadoop. If the target table does not exist, you can rename the columns on the target side. If a target table already exists, the column name and type should match the column that exists on the target. The table column properties create the target table on Hadoop or validate the values of an existing table.

Option Description
name Name of the column to be modified
target_name [Optional] Renames the column. If no value is specified, the original column name is used as the target name.
target_type [Optional] Changes the default column type. If no value is specified, the default mapping is used. For more information about the default and available mappings, see Data Type Mappings.
type Table column type. Informational only; cannot be modified by the user.
size [Optional] Changes the size of the column type, applicable only for CHAR type. If no value is specified, the default value [255] is used.
transfer_method
[Optional] The Teradata Connector for Hadoop supports these options for data transfer from Teradata to Hadoop.
Option Description
Default Allows Data Mover to select AMPs if the source Teradata Database is 14.10 or later.
Hash The underlying Hadoop connector retrieves rows in a given hash value range of a specified split-by column from a source table in Teradata, and writes those records into a target file in HDFS.
Value The underlying Hadoop connector retrieves rows in a given value range of a specified split-by column from a source table in Teradata, and writes those records into a target file in HDFS.
Partition The underlying Hadoop connector creates a staging PPI table on the source database if the source table is not a PPI table.
Amp The underlying Hadoop connector retrieves rows in one or more AMPs from a source table in Teradata, and writes those records into a target file in HDFS. The Amp option is supported only if the Teradata Database is 14.10 or later.
use_foreign_server

For the T2H utility, you can specify a foreign server that was previously created to move data from Teradata to Hadoop. This foreign server is used to move all tables in a particular job.

The following example shows use of the use_foreign_server command in a job definition for a Teradata to Hadoop job:
<use_foreign_server> 
        <name>foreign_server</name> 
</use_foreign_server>
You must specify a foreign server in order for Data Mover to use T2H to move data from Teradata 15.0 to Hadoop. If you do not specify a foreign server and the force utility is not T2H, Data Mover uses TDCH instead of T2H.

Parameters - Hadoop to Teradata

These parameters are used only for Hadoop to Teradata. Some of these parameters take a different form depending on the mechanism used to pass them.

See Parameter Order.

force_utility
[Optional] Forces the Data Mover daemon to use a specific utility for Hadoop copy operations. By default, the Data Mover daemon uses SQL-H to move the table; if SQL-H cannot be used, the Teradata Connector for Hadoop is used to move the table.
Valid Values
  • Default (Data Mover determines if SQL-H can be used to move the table. If not, the Teradata Connector for Hadoop is used.)
  • Hadoop_Connector
  • SQLH
size
If the target table does not exist, ensure the sum of all CHAR/VARCHAR column sizes is not greater than 64000; otherwise, the job fails at runtime due to a JDBC limitation. For columns with the UNICODE session character set, count double the specified size when calculating the sum of all column sizes. For example, if a column uses the UNICODE character set and its size is 2000, count it as 4000.
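For instance, a target table with two VARCHAR(20000) columns in the LATIN session character set and one VARCHAR(15000) column in UNICODE counts as 20000 + 20000 + (15000 x 2) = 70000, which exceeds 64000 and causes the job to fail; reducing the UNICODE column to VARCHAR(10000) brings the total to 60000, which is within the limit.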
table_columns
[Optional] Customizes the table columns before moving data to Teradata Database. For more information about data types, see Data Type Mappings. If the target table does not exist, you can rename the columns on the target side. If a target table already exists, the column name and type should match the column that exists on the target. The table column properties create the target table on Teradata Database or validate the values of an existing table.
Option Description
name Name of the column to be modified
size [Optional] Limits the size of columns such as VARCHAR
target_name [Optional] Renames the column. If no value is specified, the original column name is used as the target name.
target_type [Optional] Changes the default column type. If no value is specified, the default mapping is used. For more information about the default and available mappings, see Data Type Mappings.
type Table column type. Informational only; cannot be modified by the user.
allowDuplicate [Optional] Specifies whether duplicate values are allowed in the target table column (true or false). Must not be specified or must be set to false if isPrimaryIndex is true, or an error occurs.
allowNull [Optional] Specifies if null values are allowed in the target table column (true or false). Must not be specified or must be set to false if isPrimaryIndex is true, or an error occurs.
isPrimaryIndex [Optional] Specifies whether the column is the primary index of the target table (true or false). If isPrimaryIndex is false and allowNull is true, allowDuplicate must not be specified or must be false, or an error occurs.
transfer_method 1
[Optional] The options that Teradata Connector for Hadoop supports for transferring data from Hadoop to Teradata.
Option Description
Default Data Mover selects internal_fastload.
batch_insert The underlying Hadoop connector inserts rows into a NOPI staging table through a JDBC batch execution. After all mappers complete their processing, rows in the staging table are moved to the target table by an Insert-Select SQL operation.
transfer_batch_insert_size 2 If you specify batch_insert as the transfer method, you can also specify this parameter as a value representing the number of rows (for example, 10000, 50000). This property is not applicable when you specify internal_fastload as the transfer method.
internal_fastload The underlying Hadoop connector starts a database FastLoad session to load rows into a single NOPI staging table. All database FastLoad sessions are coordinated through an internal protocol. The FastLoad job finishes after all mappers complete their execution, and then rows in the NOPI staging table are copied to the target table by an Insert-Select SQL operation.
1 In the CLI, this parameter takes the form hadoop_transfer_method.

2 In the CLI, this parameter takes the form hadoop_transfer_batch_size.
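For illustration, a sketch of how the transfer method and batch size might be passed using the CLI forms noted above (the job name and values are placeholders; other required parameters are omitted for brevity):
datamove create -job_name h2t_sample -hadoop_transfer_method batch_insert -hadoop_transfer_batch_size 50000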

use_foreign_server
In Teradata Database 15.0 or later, you can specify a foreign server that was previously created to move data from Hadoop to Teradata. This foreign server is used to move all tables in a particular job.
The following example shows use of the use_foreign_server command in a job definition for a Hadoop to Teradata job:
<use_foreign_server>
       <name>foreign_server</name>
</use_foreign_server>            
You must specify a foreign server in order for Data Mover to use T2H to move data from Hadoop to Teradata 15.0 or later. If you do not specify a foreign server, Data Mover uses TDCH instead of T2H.

XML File Example

For the create command, type datamove create -f parameters.xml.

The following example shows a parameters file for the create command with a job definition for a Teradata to Hadoop job:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
 <dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.teradata.com/unity/datamover.xsd">
 <!-- source td credentials -->
     <source_tdpid>dm-agent5</source_tdpid>
     <source_user>ak186010_portlet</source_user>
     <source_password>qwerty</source_password>
     <!-- target hadoop credentials -->
     <target_hadoop_system>
         <webHCat_url>http://webhcat:50111</webHCat_url>
         <oozie_url>http://oozie:11000</oozie_url>
         <file_system_url>http://webhdfs:50070</file_system_url>
         <hive_user>hive</hive_user>
         <hive_password>hive</hive_password>
     </target_hadoop_system>
     <overwrite_existing_objects>true</overwrite_existing_objects>
     <freeze_job_steps>false</freeze_job_steps>
     <target_database><name>ak186010_all</name></target_database>
     <compare_ddl>false</compare_ddl>
     <!-- hadoop specific job options -->
         <hadoop_options>
         <file_options>ORC</file_options>
         <file_options_delimiter>#</file_options_delimiter>
         <transfer_method>hash</transfer_method>
         <number_mappers>10</number_mappers>
     </hadoop_options>
 <!-- object selection. in this case, we are moving Table1 from ak186010_portlet
 (td) to ak186010_all in hive (hadoop)-->
     <database selection="unselected">
         <name>ak186010_portlet</name>
         <table selection="included">
             <name>Table1</name>
             <target_name>NewTable1</target_name>
             <table_columns>
                 <column>
                     <name>column1</name>
                     <target_name>targetColumn1</target_name>
                     <type>varchar</type>
                     <target_type>string</target_type>
                 </column>
             </table_columns>
         </table>
     </database>
 </dmCreate>            
The following example shows a parameters file for the create command with a job definition for a Hadoop to Teradata job:
<dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.teradata.com/unity/datamover.xsd">
    <source_hadoop_system>
        <webHCat_url>http://webhcat:50111</webHCat_url>
        <oozie_url>http://oozie:11000</oozie_url>
        <file_system_url>http://webhdfs:50070</file_system_url>
        <hive_user>ssuser</hive_user>
        <hive_password>sspass</hive_password>
    </source_hadoop_system>
    <target_tdpid>dm-agent3</target_tdpid>
    <target_user>yye</target_user>
    <target_password>yye</target_password>
    <overwrite_existing_objects>true</overwrite_existing_objects>
    <force_utility>sqlh</force_utility>
    <freeze_job_steps>false</freeze_job_steps>
    <compare_ddl>false</compare_ddl>   
    <log_level>0</log_level>
    <online_archive>UNSPECIFIED</online_archive>
    <hadoop_options>
        <file_options>text</file_options>
        <file_options_delimiter>|</file_options_delimiter>
        <transfer_method>batch_insert</transfer_method>
        <transfer_batch_insert_size>10000</transfer_batch_insert_size>
        <number_mappers>5</number_mappers>
    </hadoop_options>
    
    <database selection="unselected">
        <name>yye</name>
        <table selection="included">
            <name>test1</name>
            <table_columns>
                <column>
                    <name>c1</name>
                    <target_name>t1</target_name>
                    <type>string</type>
                    <target_type>varchar</target_type>
                    <size>8</size>
                    <charset>UTF8</charset>
                    <allowNull>true</allowNull>
                    <allowDuplicate>false</allowDuplicate>
                    <isPrimaryIndex>true</isPrimaryIndex>
                </column>
            </table_columns>
        </table>
    </database>
</dmCreate>