Teradata QueryGrid includes the following components.
| Component | Description |
| --- | --- |
| Teradata QueryGrid Manager | Software installed on a dedicated physical machine (TMS or server) or VM that enables definition, administration, and monitoring of Teradata QueryGrid. After installing Teradata QueryGrid Manager, configure it in Viewpoint, and then use the QueryGrid portlet to install and configure the remaining Teradata QueryGrid components. |
| Data Center | Logical name that represents the physical location of systems (data sources) in Teradata QueryGrid. |
| System (Data Source) | One or more data source nodes that share the same software platform, such as Teradata Database nodes, nodes in a Hadoop cluster (CDH or HDP), or nodes in a Presto cluster. |
| System (Bridge) | A subset of data source nodes or non-data source nodes used to perform CPU-intensive operations such as compression and encryption, and to transfer data. |
| Fabric | One or more data source nodes, in different systems, that run a compatible version of Teradata QueryGrid software over the same port. |
| Connector | Adapter software for a data source that enables data type mapping, conversion, and communication with other connectors in the same Teradata QueryGrid fabric. |
| Link | Named configuration that specifies which connectors can communicate with each other and defines rules of data transfer. |
Teradata QueryGrid Manager
Teradata QueryGrid Manager performs the following functions:
- Administers and monitors Teradata QueryGrid
- Allows installation, configuration, and upgrade of Teradata QueryGrid
- Initiates connectivity and bandwidth diagnostics checks
- Captures and summarizes query performance metrics
- Manages keys for secure access to Teradata QueryGrid
- Captures logs generated from Teradata QueryGrid components
The number of Teradata QueryGrid Manager instances required depends on the QueryGrid Manager's hardware specifications, the number of data source nodes in the fabric, and the volume of QueryGrid queries. At least two QueryGrid Manager instances are recommended for high availability.
When more than one Teradata QueryGrid Manager instance is installed, cluster the Teradata QueryGrid Manager instances for high availability and scalability. Clustering makes sure that all Teradata QueryGrid Manager instances have the same configuration so they can each administer and monitor Teradata QueryGrid. Teradata recommends creating only one QueryGrid Manager cluster in your enterprise system to prevent silos where systems are not able to join the same QueryGrid fabric.
When a cluster is present, failover and recovery of a QueryGrid Manager instance is automatic. If one instance goes offline, the other instances in the cluster take over the workload of the failed instance. Once the failed QueryGrid Manager comes back online, it automatically applies any configuration updates made while it was offline and resumes its workload as it was before the failure.
A Data Center performs the following functions in Teradata QueryGrid:
- Allows Teradata QueryGrid Manager to determine whether communication between two data sources (systems) is across a LAN or WAN
- Makes sure communication with Teradata QueryGrid Manager remains LAN-local if a Teradata QueryGrid Manager is available locally
When you install Teradata QueryGrid Manager software on a physical machine, a default Data Center is created using the hostname of the TMS, VM, or server. The default Data Center name is displayed in the QueryGrid portlet.
To use a different Data Center, do one of the following:
- Create a new Data Center in the QueryGrid portlet
- Create a new Data Center during clustering if you have two or more Teradata QueryGrid Manager instances.
System (Data Source)
During Teradata QueryGrid configuration, each data source system is added as a monitored system. Teradata QueryGrid supports the following data source types:
- Teradata Database
- Hive (CDH and HDP)
- Spark SQL
Data source nodes added to systems in Teradata QueryGrid can be associated with only one system. During Teradata QueryGrid configuration, the node and fabric software is installed on every node in a system. The node software manages the fabrics and connectors on the node.
System (Bridge)
A bridge system allows:
- All network traffic to flow through the nodes in the bridge system instead of through all the nodes in data source systems
- Offloading CPU-intensive operations such as data encryption or compression to the nodes in the bridge system instead of using all the nodes in data source systems
- Using nodes connected to a public network for data transfer
A bridge can consist of one of the following:
- A subset of data source nodes running QueryGrid node software. The data source nodes can be in the initiator system, or target system, or both.
- A set of non-data source nodes running QueryGrid node software. The non-data source nodes do not need connector software because they do not have local data to process. An example of a non-data source node is an edge node in a Hadoop system.
A link defines the data transfer path, which can include one or two bridges. A link contains one or more hops; the number of hops depends on the number of bridges:
- If there are no bridges, there is one hop.
- If there is one bridge, there are two hops.
- If there are two bridges, there are three hops.
The hop defines the network and communication policy to be used for data movement between connection points in the data transfer path.
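The bridge-to-hop relationship above can be sketched as a small function. The names here are hypothetical and purely illustrative, not part of any QueryGrid API:

```python
# Sketch only: hops = bridges + 1, where each hop is a pair of
# connection points along the data transfer path.

def link_hops(initiator: str, target: str, bridges: list) -> list:
    """Return the ordered hops (connection-point pairs) in a data transfer path."""
    points = [initiator, *bridges, target]
    return list(zip(points, points[1:]))

# No bridges: one hop, initiator directly to target.
print(link_hops("td1", "presto1", []))            # [('td1', 'presto1')]
# Two bridges: three hops.
print(link_hops("td1", "presto1", ["b1", "b2"]))
# [('td1', 'b1'), ('b1', 'b2'), ('b2', 'presto1')]
```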
A fabric performs the following functions in Teradata QueryGrid:
- Enables communication between paired data source nodes of the same type, such as Teradata Database and Teradata Database, or a different type, such as Teradata Database and Presto
- Allows a user to initiate a single SQL query that joins data from two or more systems (data sources) within the fabric
There is no restriction on the size of the data that is transported between data source nodes in a fabric.
- Allows the Teradata QueryGrid connectors to:
- Communicate with each other in parallel
- Run, process, and transport data efficiently
- Monitors fabric usage per query and reports metrics to the Teradata QueryGrid Manager
Connectors and links are associated with a fabric.
Connector
A connector performs the following functions:
- Provides for query processing across data sources (systems)
- Translates query request SQL from one source query format to another
- Transforms data, converting the data from one data type or format to another so that the data can be exchanged between different systems (data sources).
- Enables data sources to participate in queries. Any connector that joins the fabric can participate in the queries.
- Enables data sources to initiate queries. All connectors can initiate queries.
- Enables sending and receiving data to and from the fabric
- Communicates with other connectors in the fabric
- Controls the running of queries on target systems. All connectors can act as a target of queries and control the queries that are run on a target system (data source) on behalf of the initiator system (data source)
- Returns query results to initiating systems
Connectors are specific to the system type (Teradata Database, Hive, or Presto-configured Hadoop), and a system can host only one connector of each type. A Teradata Database system hosts a Teradata connector, but a single Hadoop system can host multiple connectors (Hive and Presto).
Optional connector properties allow you to refine a connector type configuration or override connector properties set during configuration.
Connector software connects data sources with the Teradata QueryGrid fabric. The software is installed on all data source nodes in a system. Fabric software is also installed on all data source nodes in a system. Fabric software includes a driver. A driver runs on one or more data source nodes called driver nodes. As part of query processing, a driver node receives requests from the initiator connector and submits the requests to the target system. The driver loads the connector, reads messages, invokes methods to process requests, and sends responses.
When you configure a connector, you must specify which nodes in a system (data source) are driver nodes. Only a subset of the nodes on a large system need to be driver nodes. Use multiple driver nodes for redundancy and to share the workload required for query initiation.
Each connector uses a connection pool that does the following:
- Reuses physical connections
- Reduces overhead for QueryGrid queries
- Minimizes creating and closing sessions
The connection pool uses the same JDBC connection for all phases of a single query or for subsequent queries with the same session and user credentials. For more information on configuring tunable connector pool properties, see the connector and link properties information in the relevant connector topics.
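A minimal sketch of the pooling idea, assuming a pool keyed by session and user credentials. This is illustrative only, not the QueryGrid implementation:

```python
# Sketch: a pool keyed by (session, user) lets all phases of a query,
# and subsequent queries with the same credentials, reuse one connection.

class ConnectionPool:
    def __init__(self):
        self._pool = {}      # (session_id, user) -> connection
        self.opened = 0      # number of physical connections created

    def get(self, session_id, user):
        key = (session_id, user)
        if key not in self._pool:
            self.opened += 1                 # open only on a pool miss
            self._pool[key] = f"conn-{self.opened}"
        return self._pool[key]

pool = ConnectionPool()
a = pool.get("s1", "alice")   # opens a new physical connection
b = pool.get("s1", "alice")   # same session and credentials: reused
c = pool.get("s2", "bob")     # different session/user: new connection
print(a is b, pool.opened)    # True 2
```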
A connector acts in one of two roles in a QueryGrid query:
- Initiating connector: Point from which a QueryGrid query originates. For example, in a Teradata-to-Presto query, the Teradata connector initiates the query.
- Target connector: Destination point of a QueryGrid query. For example, in a Teradata-to-Presto query, a Presto connector is the target connector that the initiating connector accesses to either import or export data.
The links you can create depend on which connectors you have defined:
- If you have defined only Teradata Database connectors, you can create links only between Teradata Database systems.
- If you have defined both Teradata and Presto connectors, you can create links between Teradata Database systems and Presto systems.
- If you have defined Teradata, Presto, and Hive connectors, you can create links for all combinations of these connector types.
Links simplify configuration of foreign server definitions in preparation for running QueryGrid queries.
A link does the following:
- Specifies an initiating connector and a target connector pairing to use for querying
- Specifies whether bridges are used:
- If no bridges are used, only one hop is defined (the hop between the initiator and target)
- If one bridge is used, two hops are defined (the hop between the initiator and bridge, and the hop between the bridge and target)
- If two bridges are used, three hops are defined (the hop between the initiator and the first bridge, the hop between the first bridge and second bridge, and the hop between the second bridge and target)
- Defines the properties of initiating and target connectors

When you create links and associated properties, you are creating configuration name-value pairs (NVPs). NVPs do the following:
- Specifies the behavior of the target connector component
- Configures how data is transformed
- Configures the underlying link data transportation layer
- Affects how the initiator connector performs
Optional properties allow for configuration refinement of the initiating or target connector. These link properties override connector properties.
- Defines initiating and target networks for hops
Links refer to networks to determine which interfaces to use for data transfer. A query originates in a certain server or network and is sent to a target server or network. If no network is specified, the link uses any working route.
Networks are defined by rules that map physical network interfaces to logical network definitions, either by interface name or by CIDR notation. The rules are checked in the order in which they appear in a matching rules list.
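The first-match rule ordering can be illustrated with a short sketch; the rule shapes (`interface`, `cidr`, and `network` keys) are assumptions for illustration only:

```python
# Sketch: rules map a physical interface to a logical network, either by
# interface name or by CIDR; rules are checked in list order, first match wins.
import ipaddress

def match_network(interface_name, interface_ip, rules):
    """Return the logical network of the first matching rule, else None."""
    ip = ipaddress.ip_address(interface_ip)
    for rule in rules:                                   # checked in order
        if rule.get("interface") == interface_name:
            return rule["network"]
        if "cidr" in rule and ip in ipaddress.ip_network(rule["cidr"]):
            return rule["network"]
    return None

rules = [
    {"interface": "eth1", "network": "transfer-lan"},
    {"cidr": "10.0.0.0/8", "network": "private-wan"},
]
print(match_network("eth0", "10.1.2.3", rules))     # private-wan (CIDR match)
print(match_network("eth1", "192.168.0.5", rules))  # transfer-lan (name match)
```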
- Specifies a communications policy that defines rules for transferring data between target and initiating connectors and bridges (if used)
Communication policies define how systems communicate with one another and allow you to configure the transfer concurrency and security options, and to enable or disable ZStandard data compression on row data blocks during data transfer.
Teradata QueryGrid is preconfigured with a policy appropriate to use over LANs (LAN Policy) and a policy appropriate for use with WANs (WAN Policy).
Communication policies are not necessarily specific to the systems involved, so you can reuse them.
- Specifies user mappings (optional)
User mappings allow a user logged on to the initiating system to submit queries as a specific user on the target system. You can map multiple users on the initiating system to a single user on the target system, if applicable, but cannot map multiple users on the target system to a single user on the initiating system.
In the QueryGrid portlet, you can construct a table mapping a username on one data source to a username on another data source. When using the Hive, Presto, or Spark target connector without security, user mapping is typically required. For example, if a query is initiated over a Teradata-to-Hadoop link by user Joe, Teradata automatically changes the username to all uppercase and, by default, sends the query as user JOE to the target system. The following are possible outcomes for this use case when no security is used with a Hive target connector:
- The user Joe does not exist on the target system. The query must then be run as an existing user on the target system, such as hive, so user mapping is required: JOE on the Teradata system is mapped to the user hive on the target Hive system.
- The user Joe does exist on the target system, but as joe (all lowercase). In this scenario, user mapping is required where JOE on the Teradata system is mapped to joe on the target Hive system.
- The user Joe does exist on the target as JOE (all uppercase). This scenario does not require user mapping.
When a security platform such as Kerberos is used, user mapping is not required when using a Hive, Presto, or Spark target connector.
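The default-uppercase and mapping behavior described above can be sketched as follows; the shape of the mapping table is assumed for illustration:

```python
# Sketch: with no security configured, the initiating username is sent in
# all uppercase by default, so a mapping is needed when the target user differs.

def target_user(initiating_user: str, user_map: dict) -> str:
    """Resolve the username sent to the target system."""
    sent = initiating_user.upper()      # default: username is uppercased
    return user_map.get(sent, sent)     # apply a mapping if one exists

user_map = {"JOE": "joe"}               # JOE on Teradata -> joe on Hive
print(target_user("Joe", user_map))     # joe  (mapped to the lowercase user)
print(target_user("Ann", {}))           # ANN  (no mapping: sent as-is)
```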