Hive Connector and Link Properties | QueryGrid - Hive Connector and Link Properties - Teradata QueryGrid

QueryGrid™ Installation and User Guide - 3.06

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
Lake
VMware
Product
Teradata QueryGrid
Release Number
3.06
Published
December 2024
ft:locale
en-US
ft:lastEdition
2024-12-07
dita:mapPath
ndp1726122159943.ditamap
dita:ditavalPath
ft:empty
dita:id
lxg1591800469257
Product Category
Analytical Ecosystem
When you create links and associated properties in the QueryGrid portlet, you are creating Configuration Name Value Pairs (NVP). NVP does the following:
  • Specifies the behavior of the target connector component
  • Configures how data is transformed
  • Configures the underlying link data transportation layer
  • Affects how the initiator connector performs

Links are named configurations that include an initiating connector and a target connector. If the same property is set for a link and a connector, the link setting overrides the connector setting.

Properties may be available for initiating connectors only, target connectors only, or both.

Basic Tab

Hive Connector
Name Default Description Overridable?

Property Name

Connector Type
Server None Used to connect to the target database as part of the JDBC connection string. This is the IP address or DNS name of the target host.   Target
Port 10000 Valid values for Hiveserver 2 are 1026–65535.   Target
Database Name Default Name of the database for the connector, if not provided in the user query.

Maximum name length is 255 characters.

  Target
Conf File Paths /etc/hadoop/conf/,

/etc/hive/conf/

Paths to core-site.xml and hive-site.xml in a comma-separated list.

If both Hiveserver2 hive-site.xml and LLAP hive-site.xml exist, LLAP hive-site.xml has a higher a priority and is used when both files exist but contain different values.

This is a required setting.

  Target

Security Tab

Hive Connector
Name Default Description Overridable?

Property Name

Connector Type
Authentication Mechanism None Authentication mechanism used on the target data source.

Valid values are None, Kerberos, Trusted Kerberos, Kerberos SSO, and HS2Only.

This is a required setting.

For HDInsight clusters when using the Enterprise Security Package (ESP), select Kerberos.

  Target
Username Hive

Name of the user.

Maximum length is 255 characters. A username added for a connector or target connector link must be included in Allowed OS users.

This NVP is saved in the QueryGrid Manager configuration and is required when the initiator does not support a mechanism to provide user credentials. The username is also used for connectivity diagnostic checks.

  Target
Password None User password.

Maximum length is 255 characters.

Used only if Kerberos, Trusted Kerberos, or HS2Only security is used.

  Target
Keytab None Absolute path to the Kerberos keytab file. QueryGrid only uses the keytab file for authentication if the user does not provide a username and password.   Target
Hive Kerberos Principal None The principal for HiveServer2 in the format of x/y@z. Overrides the principal that is read from the hive-site.xml for HiveServer2.

Required when using Kerberos on CDH or CDP.

  Target
Role Support False Enable user role support for the target system.

When set to true, the user role from the initiator or target is applied to the target system based on role mapping, if any. If the cluster is configured to support roles, errors are returned back to the user. If the cluster is not configured to support roles, the user role is not applied.

When set to false, roles are not applied to the target system.

roleSupport

Target
Knox Gateway Host None Knox gateway host. The use of this property indicates that Knox is being used.   Target
Knox Gateway Port 8443 Knox gateway port number.

Valid values are 1024–65535.

Required when only using Knox.

  Target
Knox Context Path gateway/default/hive Knox context path for HS2, for example, gateway/mycluster/hive

Required when only using Knox.

  Target
Knox Trust Store Path None Knox gateway trust store path.

Required when only using Knox.

  Target
Knox Trust Store Password None Knox gateway trust store password.

Required when only using Knox.

  Target
Knox Connection Username None Username for the Knox connection.

Required when only using Knox.

  Target
Knox Connection Password None Password for the Knox connection.

Required when only using Knox.

  Target
SSL TrustStore Path None Specifies the truststore file path when connecting to a hiveserver2 with SSL enabled.
  • If this SSL truststore path is not set and Java does not have a truststore path set, the hive connector attempts to use the keystore path and password from hive-site.xml.
  • SSL on CDH or CDP requires that either this NVP is set or the truststore information is set in Java.
  Target
SSL TrustStore Password None Specifies the SSL truststore password when SSL is enabled for hiveserver2.   Target

Query Engine Tab

Hive Connector
Name Default Description Overridable?

Property Name

Connector Type
Hive Execution Engine mr Hive execution engine to use.

Possible values are mr, tez, or spark. Not all values are supported on all Hive implementations.

hiveEngine

Initiator, Target
Number of Mappers 3 * number of data nodes The number of mappers (equivalent to containers) on the cluster. It defines the maximum parallelism per cluster per query.

This is applicable when exporting data to Hive using a Teradata-to-Hive export, or importing data to Hive using a Hive-to-Teradata import.

The following are intended to be conservative starting values:
  • The number of data nodes in the Hadoop cluster multiplied by 3.
Consider increasing these numbers significantly based on the available resources on the Hadoop cluster as well as the nature of the queries you are performing. For Hive-on-Spark, the system default is used if a value is not specified.

numMappers

Initiator, Target
Queue Name None Name of the queue that submits the MR, Tez, or Spark job.

queueName

Initiator, Target
HiveServer2 HA Enabled False Indicates whether to use Zookeeper HA options as part of the JDBC connect string. Set to true if connecting to a Zookeeper server for load balancing.

isHS2HAEnabled

Target
Hadoop Library Path Default Hadoop library path Required if Hadoop uses a custom installation path instead of the default Hadoop path or if any Hadoop .jar files are saved outside of the default Hadoop library. Enter paths in a comma-separated list. See Configuring the Hive Connector for Use with a Custom Hadoop Library Path or Custom JAR Path.

If no custom information is available, the default Hadoop library path is used.

  Target
Custom JAR Path None Specifies the path or paths to use for .jar files not listed in Hadoop JAR Files. Enter paths in a comma-separated list. See Configuring the Hive Connector for Use with a Custom Hadoop Library Path or Custom JAR Path.   Target
Hadoop Properties None Specifies Hadoop environment properties for a user session. Properties are provided in a list. Use = between each property and its value (name=value, name=value, name=value), and a comma as a separator between properties, with or without a space after the comma.

For example:

mapred.job.queue.name=abcdef,mapreduce.task.timeout=3600000,mapreduce.map.speculative=false

If Hadoop Properties is not selected, the default Hadoop environment properties are used.

hadoopProperties

Target
Compression Codec System Default Compression type to use when exporting to a Hive table. Valid values are System Default, Deflate, BZip2, Gzip, LZ4, and Snappy.

compressionCodec

Target
Number of Cores per Executor None For Hive-on-Spark only. Controls the number of concurrent tasks an Executor can run. If a value is not specified, the system default is used.

numExecutorCores

Initiator, Target

Advanced Tab

Hive Connector
Name Default Description Overridable?

Property Name

Connector Type
Temporary Database Name Default Temporary database name for storing temporary tables and views.

tempDbName

Target
Enable Logging INFO Logging level for the connector or link properties. User level log settings can be explicitly set through the add or edit link page in the QueryGrid portlet.

This setting applies to both the initiating and target connector; however, the logging level for the initiating connector in the link takes precedence if the connectors are set differently.

Valid values are NONE, WARN, INFO, and DEBUG.

  Initiator, Target
Enable Query Logging True

When set to true, QueryGrid logs query text in your local drive. When set to false, query text is not logged. Selecting false prevents sensitive customer data from potentially being saved outside the database for compliant environments, such as Teradata VantageCloud Lake.

  Target
16.20+ LOB Support True On Teradata systems version 16.20 and later, the STRING and BINARY columns on Hive are mapped to CLOB and BLOB by default. Deselect this option to map the STRING and BINARY columns to VARCHAR and VARBYTE, respectively.

Disable this option if there are a large number of STRING/BINARY columns in the Hive table.

lobSupport

Target
Default String Size 32000 characters The VARCHAR truncation size.

This is the size at which data imported from or exported to string columns is truncated. The value represents the maximum number of Unicode characters to import, and defaults to 32000 characters. QueryGrid truncates the string columns at the default value set in defaultStringSize if less than the actual column size.

Valid values are 1–1048544000 characters.

This is for a Teradata-to-Hive link and is used by the target Hive connector and is applicable when the initiating Teradata system does not support CLOB data types with QueryGrid. With CLOB support, the default string size is not used.

defaultStringSize

Target
Default Binary Size 64000 bytes The default truncation size for the VARBINARY types.

Valid values are 1–2097088000 bytes.

This is for a Teradata-to-Hive link and is used by the target Hive connector and is applicable when the initiating Teradata system does not support BLOB data types with QueryGrid. With BLOB support, the default binary size is not used.

defaultBinarySize

Target
Disable Pushdown False When set to true, disables the pushdown of all query conditions to the target system.

Certain system-level, session-level, and column-level query attributes, such as CASESPECIFIC, can affect character string comparison results. These attributes can cause some queries to return incorrect results due to incorrect row filtering on the target system.

To avoid incorrect results caused by condition pushdown in situations where the settings on the initiating system do not match the settings on the target system, you can disable the pushdown of all conditions to the target system.

If designated as Overridable, this property can only be overridden at the session level from false to true (indicating you are disabling pushdown), but cannot be changed from true to false.

disablePushdown

Initiator
Collect Approximate Activity Count False Displays the approximate number of rows exported to the target data source.

When set to false, the activity count displays a 1. When set to true, an approximate activity count is returned. The default is false.

collectActivityCount

Target
Support Hive Task Retries False When set to true, the fabric is enabled to support Hive Task Retries. When enabled, the fabric does not fail on the query when Hive tasks are retried. Instead, the fabric discards the data received from a failed task attempt and continues to process task data from the retried attempt.
This feature is only supported when Hive is the data source, where Hive is importing data as a target or exporting data as an initiator, and the execution engine is mr or tez.
  Initiator, Target
Spool File Path /var/opt/teradata/tdqg/fabric/data/ Local path to use for spool files if the fabric is enabled to support Hive Task Retries.
When a custom path is specified, the specified directory must be present on all Hadoop data nodes configured in a QueryGrid cluster. The directory must have read, write, and execute permissions to create, read, and write files and sub directories for the querygrid OS group and the tdqg OS user.
  Initiator, Target
Link Buffer Count 4 Maximum number of write buffers available on a single channel at one time.
Link Buffer Count overrides the default internal fabric property shmDefaultNumMemoryBuffers.

Valid values are 4–8.

linkBufferCount

Initiator, Target
Link Buffer Size 1048576 Maximum size of the write buffers to allocate for row handling and message exchange.

Valid values are 73728–10485760 bytes.

linkBufferSize

Initiator
Response Timeout 86400000 The number of milliseconds to wait for the target query to complete before timing out and stopping the operation. The fabric stops and releases all resources associated with queries whose duration exceeds the value set on the target link properties or the target connector properties. Connectors timeout when responses from the fabric exceed their response timeout value.

Valid values are 300000–172800000.

responseTimeout

Initiator, Target
Connection Max Idle Time 86400 seconds The maximum idle time for the connection cache object, after which the object is closed and removed from the cache. Use this property when there are multiple concurrent users and queries running on the system that might lead to starvation of connection objects.

Valid values are 1–86400 seconds.

  Target
Connection Pool Size 100 Maximum number of connection objects that can be stored in a connection pool. When acquiring a new connection, the connector checks for an available space in the pool. If no space is available in the connection pool, the connection fails after 5 minutes. Only one connection pool and username per connector configuration is allowed.

Valid values are 1–10000.

  Target
Connection Evict Frequency 30 minutes Frequency of eviction checks. Connection objects from the pool are checked, closed, and removed if the idle time (current time - last time of use) of a connection object is greater than the Connection Max Idle Time setting.

Reduce the time between checks if there are multiple concurrent users running queries to clear the connections more frequently.

Valid values are 1–1440 minutes.

  Target