Teradata QueryGrid Components

Teradata® QueryGrid™ 2.10 Installation and User Guide

Teradata QueryGrid includes the following components.

  • Teradata QueryGrid Manager: Software installed on a dedicated physical machine (TMS or server) or VM that enables definition, administration, and monitoring of Teradata QueryGrid. After installing Teradata QueryGrid Manager, configure it in Viewpoint, and then use the QueryGrid portlet to install and configure the remaining Teradata QueryGrid components.
  • Data Center: Logical name that represents the physical location of systems (data sources) in Teradata QueryGrid.
  • System (Data Source): One or more data source nodes that share the same software platform, such as Teradata Database nodes, nodes in a Hadoop cluster, or nodes in a Presto cluster.
  • System (Bridge): A subset of data source nodes, or a set of non-data source nodes, used to perform CPU-intensive operations such as compression and encryption and to transfer data.
  • Fabric: One or more data source nodes, in different systems, that run a compatible version of Teradata QueryGrid software over the same port.
  • Connector: Adapter software for a data source that enables data type mapping, conversion, and communication with other connectors in the same Teradata QueryGrid fabric.
  • Link: Named configuration that specifies which connectors can communicate with each other and defines the rules of data transfer.

Teradata QueryGrid Manager

Teradata QueryGrid Manager performs the following functions in Teradata QueryGrid:
  • Administers and monitors Teradata QueryGrid
  • Allows installation, configuration, and upgrade of Teradata QueryGrid
  • Initiates connectivity and bandwidth diagnostics checks
  • Captures and summarizes query performance metrics
  • Manages keys for secure access to Teradata QueryGrid
  • Captures logs generated from Teradata QueryGrid components

The number of Teradata QueryGrid Manager instances required depends on the QueryGrid Manager's hardware specifications, the number of data source nodes in the fabric, and the volume of QueryGrid queries. At least two QueryGrid Manager instances are recommended for high availability.

When more than one Teradata QueryGrid Manager instance is installed, cluster the instances for high availability and scalability. Clustering ensures that all Teradata QueryGrid Manager instances share the same configuration, so each can administer and monitor Teradata QueryGrid. Teradata recommends creating only one QueryGrid Manager cluster in your enterprise to prevent silos in which systems cannot join the same QueryGrid fabric.

When a cluster is present, failover and recovery of a QueryGrid Manager instance are automatic. If one instance goes offline, the other instances in the cluster take over the workload of the failed instance. Once the failed QueryGrid Manager comes back online, it automatically picks up any configuration updates made while it was offline and resumes its workload as it was before the failure.

Data Center

A Data Center performs the following functions in Teradata QueryGrid:

  • Allows Teradata QueryGrid Manager to determine whether communication between two data sources (systems) is across a LAN or WAN
  • Makes sure communication with Teradata QueryGrid Manager remains LAN-local if a Teradata QueryGrid Manager is available locally

When you install the Teradata QueryGrid Manager software, a default Data Center is created using the hostname of the TMS, VM, or server. The default Data Center name is displayed in the QueryGrid portlet.

The default Data Center cannot be deleted. It can be renamed, or you can do one of the following:
  • Create a new Data Center in the QueryGrid portlet
  • Create a new Data Center during clustering if you have two or more Teradata QueryGrid Manager instances.
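
To make the role of a Data Center concrete, the following is a minimal sketch of the decision it enables: when two endpoints share a Data Center, traffic between them is treated as LAN-local; otherwise it is treated as WAN. All names here are hypothetical, and the actual determination is made internally by Teradata QueryGrid Manager, not by user code.

```python
# Illustrative sketch only: models how Data Center membership implies LAN vs. WAN
# treatment. Names and structures are hypothetical; the real decision is made
# internally by Teradata QueryGrid Manager.
data_center_of = {
    "td-prod": "dc-sandiego",       # Teradata Database system
    "hadoop-lake": "dc-sandiego",   # Hive/Presto system at the same site
    "presto-remote": "dc-london",   # system at a remote site
    "qgm-1": "dc-sandiego",         # QueryGrid Manager instance
}

def path_type(a: str, b: str) -> str:
    """Return 'LAN' when both endpoints are in the same Data Center, else 'WAN'."""
    return "LAN" if data_center_of[a] == data_center_of[b] else "WAN"

print(path_type("td-prod", "hadoop-lake"))    # LAN
print(path_type("td-prod", "presto-remote"))  # WAN
print(path_type("hadoop-lake", "qgm-1"))      # LAN: manager traffic stays local
```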

System (Data Source)

During Teradata QueryGrid configuration, each data source system is added to Teradata QueryGrid as a monitored system.

The following data source systems can be added to Teradata QueryGrid. Each system must have its own connector.
  • Teradata Database
  • Hive
  • Presto
  • Spark SQL
  • Oracle

Data source nodes added to systems in Teradata QueryGrid can be associated with only one system. During Teradata QueryGrid configuration, the node and fabric software is installed on every node in a system. The node software manages the fabrics and connectors on the node.

System (Bridge)

Bridge systems represent a subset of data source nodes or a separate set of non-data source nodes that allow:
  • All network traffic to flow through the nodes in the bridge system instead of through all the nodes in data source systems
  • Offloading CPU-intensive operations such as data encryption or compression to the nodes in the bridge system instead of using all the nodes in data source systems
  • Using nodes connected to a public network for data transfer
One or two bridge systems can be included in the data transfer path between data source nodes in initiator and target systems. The nodes in a bridge system can be:
  • A subset of data source nodes running QueryGrid node software. The data source nodes can be in the initiator system, the target system, or both.
  • A set of non-data source nodes running QueryGrid node software. The non-data source nodes do not need connector software because they do not have local data to process. An example of a non-data source node is an edge node in a Hadoop system.

A link is used to define the data transfer path, which can include one or two bridges. A link can contain one or more hops; the number of hops is one more than the number of bridges:

  • If there are no bridges, there is one hop.
  • If there is one bridge, there are two hops.
  • If there are two bridges, there are three hops.

The hop defines the network and communication policy to be used for data movement between connection points in the data transfer path.
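
The hop arithmetic above is simply the number of bridges plus one. The following sketch, with hypothetical system names, shows the hop count and the endpoints of each hop for a given bridge configuration; it is illustrative only.

```python
# Illustrative: the number of hops in a link's data transfer path is always the
# number of bridges plus one (0 bridges -> 1 hop, up to 2 bridges -> 3 hops).
def hop_count(bridges: int) -> int:
    if not 0 <= bridges <= 2:
        raise ValueError("QueryGrid supports at most two bridges in a path")
    return bridges + 1

# Endpoints of each hop for a path with the given bridge systems (hypothetical names).
def hops(initiator: str, target: str, bridges: list[str]) -> list[tuple[str, str]]:
    points = [initiator, *bridges, target]
    return list(zip(points, points[1:]))

print(hop_count(1))                              # 2
print(hops("td-prod", "hive-lake", ["edge-bridge"]))
# [('td-prod', 'edge-bridge'), ('edge-bridge', 'hive-lake')] -> two hops
```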

Fabric

A fabric performs the following functions in Teradata QueryGrid:

  • Enables communication between paired data source nodes of the same type, such as Teradata Database and Teradata Database, or a different type, such as Teradata Database and Presto
  • Allows a user to initiate a single SQL query that joins data from two or more systems (data sources) within the fabric

There is no restriction on the size of the data that is transported between data source nodes in a fabric.
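
For example, a user connected to the initiating Teradata system can submit one SQL request that joins a local table with a remote table reached through the fabric. The following sketch uses the teradatasql Python driver; the host, credentials, table names, and the foreign server name (presto_fs) are hypothetical and assume a foreign server has already been created for an existing link, so consult your own foreign server definitions for the exact object reference syntax.

```python
# Sketch only: submits one cross-system query through the QueryGrid fabric using
# the teradatasql driver. Host, credentials, table names, and the foreign server
# name (presto_fs) are hypothetical; a foreign server must already exist for the link.
import teradatasql

with teradatasql.connect(host="td-prod.example.com", user="dbc", password="*****") as con:
    cur = con.cursor()
    # One SQL request joining a local Teradata table with a table on the remote
    # Presto system reached through the presto_fs foreign server.
    cur.execute(
        "SELECT o.order_id, o.amount, c.segment "
        "FROM orders_local o "
        "JOIN retail.customers@presto_fs c ON o.customer_id = c.customer_id"
    )
    for row in cur.fetchall():
        print(row)
```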

Fabric software is installed on data source nodes and does the following:
  • Allows the Teradata QueryGrid connectors to:
    • Communicate with each other in parallel
    • Run, process, and transport data efficiently
  • Monitors fabric usage per query and reports metrics to the Teradata QueryGrid Manager

Connectors and links are associated with a fabric.

Connectors

A connector performs the following functions in Teradata QueryGrid:
  • Provides for query processing across data sources (systems)
  • Translates query request SQL from one source query format to another
  • Transforms data, converting it from one data type or format to another so that it can be exchanged between different systems (data sources); see the sketch after this list
  • Enables data sources to participate in queries; any connector that joins the fabric can participate in queries
  • Enables data sources to initiate queries
  • Enables sending and receiving data to and from the fabric
  • Communicates with other connectors in the fabric
  • Controls the running of queries on target systems; all connectors can act as a target of queries and control the queries that run on a target system (data source) on behalf of the initiator system (data source)
  • Returns query results to initiating systems
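
The following is a minimal sketch of the kind of data type mapping a connector performs when moving rows between systems. The mappings shown are examples only, not the complete or authoritative set; see the connector-specific topics for the actual mappings each connector supports.

```python
# Illustrative only: the kind of type mapping a connector performs. The pairs
# below are examples, not the authoritative QueryGrid mapping tables.
HIVE_TO_TERADATA = {
    "STRING":  "VARCHAR",
    "INT":     "INTEGER",
    "BIGINT":  "BIGINT",
    "DOUBLE":  "FLOAT",
    "BOOLEAN": "BYTEINT",
}

def map_column(hive_type: str) -> str:
    """Return the Teradata type used for a Hive column type in this sketch."""
    try:
        return HIVE_TO_TERADATA[hive_type.upper()]
    except KeyError:
        raise ValueError(f"no mapping defined for Hive type {hive_type!r}") from None

print(map_column("string"))  # VARCHAR
```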

Connectors are specific to the system type (Teradata Database, Hive, or Presto-configured Hadoop), and a system can host only one connector of each type. A Teradata Database system hosts a Teradata connector, but a single Hadoop system can host multiple connectors (for example, Hive and Presto).

Optional connector properties allow you to refine a connector type configuration, or override connector properties set during configuration.

Connector software connects data sources with the Teradata QueryGrid fabric and is installed on all data source nodes in a system, alongside the fabric software. The fabric software includes a driver, which runs on one or more data source nodes called driver nodes. As part of query processing, a driver node receives requests from the initiator connector and submits them to the target system. The driver loads the connector, reads messages, invokes methods to process requests, and sends responses.

When you configure a connector, you must specify which nodes in a system (data source) are driver nodes. Only a subset of the nodes on a large system need to be driver nodes. Use multiple driver nodes for redundancy and to share the workload required for query initiation.

After a target driver node submits a query, connection caching forms a connection pool that does the following:
  • Reuses physical connections
  • Reduces overhead for QueryGrid queries
  • Minimizes creating and closing sessions

The connection pool uses the same JDBC connection for all phases of a single query or for subsequent queries with the same session and user credentials. For more information on configuring tunable connector pool properties, see the connector and link properties information in the relevant connector topics.
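
The following is a minimal sketch of that caching behavior: one connection is kept per session and user-credential key and reused across query phases instead of being created and closed repeatedly. The classes and names are hypothetical stand-ins, not QueryGrid code.

```python
# Illustrative sketch of connection caching: reuse one connection per (user, session)
# key instead of opening and closing a connection for each query phase.
class Connection:
    def __init__(self, user: str):
        self.user = user  # stand-in for a wrapped physical JDBC connection

_pool: dict[tuple[str, str], Connection] = {}

def get_connection(user: str, session_id: str) -> Connection:
    key = (user, session_id)
    if key not in _pool:          # create a physical connection only on first use
        _pool[key] = Connection(user)
    return _pool[key]             # later phases and queries reuse the same one

c1 = get_connection("joe", "sess-42")
c2 = get_connection("joe", "sess-42")
assert c1 is c2                   # same connection reused across phases
```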

Links

A connector can be either the initiator or the target of a query. A link is a named configuration that defines the initiator connector and target connector pair.
  • Initiating connector: Point from which a QueryGrid query originates. For example, in a Teradata-to-Presto query, the Teradata connector initiates the query.
  • Target connector: Destination point of a QueryGrid query. For example, in a Teradata-to-Presto query, a Presto connector is the target connector that the initiating connector accesses to either import or export data.
You can create initiating connector and target connector link pairings only for connector types you have defined.
  • If you have defined only Teradata Database connectors, you can create links only between Teradata Database systems.
  • If you have defined both Teradata and Presto connectors, you can create links between Teradata Database systems and Presto systems.
  • If you have defined Teradata, Presto, and Hive connectors, you can create links for all combinations of these connector types.

Links simplify configuration of foreign server definitions in preparation for running QueryGrid queries.

In a Teradata QueryGrid fabric, each link:
  • Specifies an initiating connector and a target connector link pairing to use for querying
  • Specifies whether bridges are used:
    • If no bridges are used, only one hop is defined (the hop between the initiator and target)
    • If one bridge is used, two hops are defined (the hop between the initiator and bridge, and the hop between the bridge and target)
    • If two bridges are used, three hops are defined (the hop between the initiator and the first bridge, the hop between the first bridge and second bridge, and the hop between the second bridge and target)
  • Defines the properties of initiating and target connectors
    When you create links and associated properties, you are creating configuration name-value pairs (NVPs), illustrated in the sketch at the end of this topic. NVPs do the following:
    • Specifies the behavior of the target connector component
    • Configures how data is transformed
    • Configures the underlying link data transportation layer
    • Affects how the initiator connector performs

    Optional properties allow for configuration refinement of the initiating or target connector. These link properties override connector properties.

  • Defines initiating and target networks for hops

    Links refer to networks to determine which interfaces to use for data transfer. A query originates in a certain server or network and is sent to a target server or network. If no network is specified, the link uses any working route.

    Networks are defined by rules that map physical network interfaces to logical network definitions, either by interface name or by CIDR notation. The rules are checked in the order in which they appear in a matching rules list.

  • Specifies a communications policy that defines rules for transferring data between target and initiating connectors and bridges (if used)

    Communication policies define how systems communicate with one another and allow you to configure the transfer concurrency and security options, and to enable or disable ZStandard data compression on row data blocks during data transfer.

    Teradata QueryGrid is preconfigured with a policy appropriate to use over LANs (LAN Policy) and a policy appropriate for use with WANs (WAN Policy).

    Communication policies are not necessarily specific to the systems involved, so you can reuse them.

  • Specifies user mappings (optional)

    User mappings allow a user logged on to the initiating system to submit queries as a specific user on the target system. You can map multiple users on the initiating system to a single user on the target system, if applicable, but cannot map multiple users on the target system to a single user on the initiating system.

    In the QueryGrid portlet, you can construct a table mapping a username on one data source to a username on another data source.

    When using the Hive, Presto, or Spark target connector without security, user mapping is typically required. For example, if a query is initiated using a Teradata-to-Hadoop link by user Joe, Teradata automatically changes the username to all uppercase and sends the query as user JOE by default to the target system. The following are possible outcomes for this use-case when no security is used with a Hive target connector:
    • The user Joe does not exist on the target system. In this scenario, the query must be run with an existing user on the target system such as hive. In this scenario, user mapping is required where JOE on the Teradata system is mapped to the user hive on the target Hive system.
    • The user Joe does exist on the target system, but as joe (all lowercase). In this scenario, user mapping is required where JOE on the Teradata system is mapped to joe on the target Hive system.
    • The user Joe does exist on the target as JOE (all uppercase). This scenario does not require user mapping.

    When a security platform such as Kerberos is used, user mapping is not required when using a Hive, Presto, or Spark target connector.
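
The following sketch pulls the pieces of a link together: NVPs, a communication policy, network rules checked in order (by interface name or CIDR), and user mappings. All names, property keys, and values below are hypothetical; real links are defined in the QueryGrid portlet, and the preconfigured LAN and WAN policies are the only elements taken from this topic.

```python
# Illustrative sketch only: how the pieces of a link fit together. Names, property
# keys, and values are hypothetical; real links, NVPs, communication policies, and
# user mappings are defined in the QueryGrid portlet.
import ipaddress

link = {
    "name": "td_to_hive",
    "initiator_connector": "td-prod-connector",
    "target_connector": "hive-lake-connector",
    # Name-value pairs refining connector behavior for this link
    # (link properties override the same properties set on the connectors).
    "nvp": {"rowsPerMessage": 5000, "stringTruncate": "true"},
    "communication_policy": "LAN Policy",      # preconfigured LAN/WAN policies
    # Users on the initiating system mapped to users on the target system.
    "user_mapping": {"JOE": "hive", "MARY": "mary"},
}

# Network rules map physical interfaces to a logical network, by interface name
# or by CIDR; the first matching rule in the list wins.
network_rules = [
    {"interface": "eth1"},
    {"cidr": "10.20.0.0/16"},
]

def interface_matches(name: str, address: str) -> bool:
    for rule in network_rules:                 # rules are checked in order
        if rule.get("interface") == name:
            return True
        if "cidr" in rule and ipaddress.ip_address(address) in ipaddress.ip_network(rule["cidr"]):
            return True
    return False

print(interface_matches("eth0", "10.20.1.7"))  # True: matches the CIDR rule
print(link["user_mapping"].get("JOE"))         # hive
```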