- Specifies the behavior of the target connector component
- Configures how data is transformed
- Configures the underlying link data transportation layer
- Affects how the initiator connector performs
Links are named configurations that include an initiating connector and a target connector. If the same property is set for a link and a connector, the link setting overrides the connector setting.
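The link-over-connector precedence can be sketched as a simple lookup, where a link-level setting wins whenever it is present (the property names and values here are illustrative, not actual QueryGrid internals):

```python
def resolve_property(name, link_props, connector_props):
    """Return the effective value of a property: a link-level setting,
    when present, overrides the connector-level setting."""
    if name in link_props:
        return link_props[name]
    return connector_props.get(name)

# Hypothetical settings for illustration only:
connector_props = {"Port": 10016, "Database Name": "Default"}
link_props = {"Database Name": "sales_db"}

resolve_property("Database Name", link_props, connector_props)  # "sales_db" (link wins)
resolve_property("Port", link_props, connector_props)           # 10016 (connector default)
```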
Basic Tab
Spark Connector

Name | Default | Description | Overridable? Property Name | Connector Type
---|---|---|---|---
Server | None | Used to connect to the target database as part of the JDBC connection string. This is the IP address or DNS name of the target host. | | Target
Port | 10016 | Valid values for the Spark Connector are 1026–65535. | | Target
Database Name | Default | Name of the database for the connector, if not provided in the user query. Maximum name length is 255 characters. | | Target
Spark Execution Mechanism | Spark Thrift Server | Mechanism used by the target connector to submit queries to Spark. Possible values are Spark Thrift Server and Spark Application. Spark Thrift Server is not supported by CDH or CDP. | | Target
Spark Home Path | /usr/hdp/current/spark2-client/ | Path to the Spark home directory, whose /jars subdirectory contains all the Spark library .jar files. | | Target
Conf File Paths | /etc/hadoop/conf/, /etc/spark2/conf/ | Comma-separated list of paths to core-site.xml, hdfs-site.xml, and hive-site.xml (if available). | | Target
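As a rough illustration of how the Server, Port, and Database Name properties combine into a JDBC connection string: the Spark Thrift Server speaks the HiveServer2 protocol, so clients typically use a `jdbc:hive2://` URL. This is a hedged sketch of the URL shape, not QueryGrid's internal connection logic, and the host name is a made-up example:

```python
def thrift_jdbc_url(server: str, port: int = 10016, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL from the Server, Port, and
    Database Name connector properties (assumed URL shape)."""
    if not 1026 <= port <= 65535:
        raise ValueError("Port must be in the valid range 1026-65535")
    return f"jdbc:hive2://{server}:{port}/{database}"

thrift_jdbc_url("spark-master.example.com")  # uses the default port 10016
```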
Security Tab
Spark Connector

Name | Default | Description | Overridable? Property Name | Connector Type
---|---|---|---|---
Authentication Mechanism | None | Overall security mechanism for the cluster. For HDInsight clusters using the Enterprise Security Package (ESP), select Kerberos. | | Target
Username | Hive | Name of the user. Maximum length is 255 characters. A username added for a connector or target connector link must be included in Allowed OS users. This NVP is saved in the QueryGrid Manager configuration and is required when the initiator does not support a mechanism to provide user credentials. The username is also used for connectivity diagnostic checks. | | Target
Password | None | Password of the user or service account. | | Target
Keytab | None | Absolute path to the Kerberos keytab file. QueryGrid uses the keytab file for authentication only if a username and password are not provided. | | Target
SSL TrustStore Path | None | SSL truststore or keystore path for authentication on the Spark Thrift Server when SSL is enabled. Not needed if the keys are stored in the Java truststore. | | Target
SSL TrustStore Password | None | SSL truststore or keystore password for authentication on the Spark Thrift Server when SSL is enabled. Not needed if the keys are stored in the Java truststore. | | Target
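The credential precedence described in the Keytab row (username/password first, keytab only as a fallback) can be sketched as follows. This is a minimal illustration of the selection rule, not QueryGrid's authentication code, and the path is a made-up example:

```python
def choose_auth(username, password, keytab_path):
    """Pick an authentication source following the rule in the table:
    the keytab is used only when a username and password are not provided."""
    if username and password:
        return ("password", username)
    if keytab_path:
        return ("keytab", keytab_path)
    return ("none", None)

choose_auth("hive", "secret", "/etc/security/qg.keytab")  # password auth wins
choose_auth(None, None, "/etc/security/qg.keytab")        # falls back to the keytab
```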
Query Engine Tab
Spark Connector

Name | Default | Description | Overridable? Property Name | Connector Type
---|---|---|---|---
Number Executors | None | Unit of parallelism when data is exported or imported into Spark SQL. | ● numExecutors | Initiator, Target
Queue Name | None | Name of the queue to which Spark jobs are submitted. Spark application mode only. | ● queueName | Target
Hadoop Properties | None | Specifies Hadoop environment properties for a user session. Properties are provided as a list: use = between each property and its value (name=value), and a comma as a separator between properties, with or without a space after the comma. For example: mapred.job.queue.name=abcdef,mapreduce.task.timeout=3600000,mapreduce.map.speculative=false. If Hadoop Properties is not set, the default Hadoop environment properties are used. | ● hadoopProperties | Target
Compression Codec | System Default | Compression type to use when exporting to a Spark target table. Valid values are System Default, Deflate, BZip2, GZip, LZ4, and Snappy. | ● compressionCodec | Target
Spark Additional JAR Paths | None | Directory path or paths where any required .jar files are present. Recommended only when the Spark home directory does not contain a required Spark library .jar file. | | Target
Spark Custom JARs | None | Path or paths of the Spark library .jar files. Recommended only when a new Spark library .jar is required. | | Target
Advanced Tab
Spark Connector

Name | Default | Description | Overridable? Property Name | Connector Type
---|---|---|---|---
Temporary Database Name | Default | Temporary database name for storing temporary tables and views. | ● tempDbName | Target
Enable Logging | INFO | Runs queries with debugging mode enabled. Valid values are NONE, WARN, INFO, and DEBUG. | | Initiator, Target
Enable Query Logging | True | When set to true, QueryGrid logs query text to your local drive. When set to false, query text is not logged. Setting this to false prevents sensitive customer data from potentially being saved outside the database in compliant environments, such as Teradata VantageCloud Lake. | | Target
Disable Pushdown | False | When set to true, disables the pushdown of all query conditions to the target system. Certain system-level, session-level, and column-level query attributes, such as CASESPECIFIC, can affect character string comparison results and can cause some queries to return incorrect results due to incorrect row filtering on the target system. To avoid incorrect results caused by condition pushdown when the settings on the initiating system do not match the settings on the target system, you can disable the pushdown of all conditions to the target system. If designated as Overridable, this property can only be overridden at the session level from false to true (disabling pushdown); it cannot be changed from true to false. | ● disablePushdown | Initiator
16.20+ LOB Support | True | On Teradata systems version 16.20 and later, the STRING and BINARY columns on Spark SQL are mapped to CLOB and BLOB by default. Deselect this option to map the STRING and BINARY columns to VARCHAR and VARBYTE, respectively. Disable this option if there are a large number of STRING/BINARY columns in the Spark table. | ● lobSupport | Target
Default String Size | 32000 characters | The VARCHAR truncation size: the size at which data imported from or exported to string columns is truncated. The value represents the maximum number of Unicode characters to import and defaults to 32000 characters. QueryGrid truncates string columns at the value set in defaultStringSize. Valid values are 1–1048544000 characters. This property applies to a Teradata-to-Spark link, is used by the target Spark connector, and applies only when the initiating Teradata system does not support CLOB data types with QueryGrid. With CLOB support, the default string size is not used. | ● defaultStringSize | Target
Default Binary Size | 64000 bytes | The default truncation size for VARBINARY types. Valid values are 1–2097088000 bytes. This property applies to a Teradata-to-Spark link, is used by the target Spark connector, and applies only when the initiating Teradata system does not support BLOB data types with QueryGrid. With BLOB support, the default binary size is not used. | ● defaultBinarySize | Target
Collect Approximate Activity Count | False | Displays the approximate number of rows exported to the target data source. When set to false, the activity count displays 1. When set to true, an approximate activity count is returned. | ● collectActivityCount | Target
Link Buffer Count | 4 | Maximum number of write buffers available on a single channel at one time. Link Buffer Count overrides the default internal fabric property shmDefaultNumMemoryBuffers. Valid values are 2–16. | ● linkBufferCount | Initiator, Target
Link Buffer Size | 1048576 | Maximum size of the write buffers allocated for row handling and message exchange. Valid values are 73728–10485760 bytes. | ● linkBufferSize | Initiator, Target
Response Timeout | 86400000 | Number of milliseconds to wait for the target query to complete before timing out and stopping the operation. The fabric stops and releases all resources associated with queries whose duration exceeds the value set on the target link properties or the target connector properties. Connectors time out when responses from the fabric exceed their response timeout value. Valid values are 300000–172800000. | ● responseTimeout | Initiator, Target
Connection Max Idle Time | 86400 seconds | Maximum idle time for a connection cache object, after which the object is closed and removed from the cache. Use this property when there are multiple concurrent users and queries running on the system that might lead to starvation of connection objects. Valid values are 1–86400 seconds. | | Target
Connection Pool Size | 100 | Maximum number of connection objects that can be stored in a connection pool. When acquiring a new connection, the connector checks for available space in the pool. If no space is available in the connection pool, the connection fails after 5 minutes. Only one connection pool and username per connector configuration is allowed. Valid values are 1–10000. | | Target
Connection Evict Frequency | 30 minutes | Frequency of eviction checks. Connection objects in the pool are checked, closed, and removed if the idle time (current time minus last time of use) of a connection object is greater than the Connection Max Idle Time setting. Reduce the time between checks if there are multiple concurrent users running queries, so that connections are cleared more frequently. Valid values are 1–1440 minutes. | | Target
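The interaction between Connection Max Idle Time and Connection Evict Frequency can be sketched as a periodic sweep that removes connections whose idle time exceeds the maximum. This is a minimal illustration of the eviction rule described above, not QueryGrid's actual pool implementation; the connection IDs and timestamps are made up:

```python
class ConnectionPool:
    """Toy pool demonstrating idle-time eviction: on each eviction
    check, close and remove any connection idle longer than the max."""

    def __init__(self, max_idle_seconds=86400):
        self.max_idle_seconds = max_idle_seconds
        self.last_used = {}  # connection id -> last-use timestamp (seconds)

    def touch(self, conn_id, now):
        # Record the most recent use of a connection.
        self.last_used[conn_id] = now

    def evict(self, now):
        # Remove connections whose idle time exceeds the maximum;
        # a real pool would also close each evicted connection.
        expired = [c for c, t in self.last_used.items()
                   if now - t > self.max_idle_seconds]
        for c in expired:
            del self.last_used[c]
        return expired

pool = ConnectionPool(max_idle_seconds=60)
pool.touch("c1", now=0)
pool.touch("c2", now=50)
pool.evict(now=120)  # "c1" is idle 120 s and evicted; "c2" (idle 70 s) is also evicted
```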