QueryGrid includes the following components:
Component | Description |
---|---|
QueryGrid™ Manager | Software installed on a dedicated physical machine (TMS or server) or VM that enables definition, administration, and monitoring of QueryGrid. After installing QueryGrid Manager, configure it in Viewpoint, and then use the QueryGrid™ portlet to install and configure the remaining QueryGrid components. |
Data Center | Logical name that represents the physical location or region of data sources (systems) and QueryGrid Manager instances. When available, data source nodes communicate with QueryGrid Manager instances that share the same Data Center. |
Data Source | System containing one or more data source nodes that share the same software platform, such as Teradata system nodes, nodes in a Hadoop cluster, or nodes in a Presto cluster. |
Bridge | A system containing a subset of data source nodes or non-data source nodes used to provide connectivity between data source systems that do not have direct network connectivity. |
Fabric | One or more data source nodes, spanning multiple systems, that run a compatible version of the QueryGrid tdqg-fabric software listening on the same port. Only links that involve the Teradata connector are supported; links where neither the initiator nor the target connector is a Teradata connector, such as Hive-to-Oracle, are not supported. |
Connector | Adapter software for a data source that enables data type mapping, conversion, and communication with other connectors in the same QueryGrid fabric. The initiating connector is the connector you interact with to start a query; the target connector is the connector triggered on the remote side to do most of the query processing. The following connectors are supported: Teradata, Hive, Spark SQL, Presto, Oracle, BigQuery, and Generic JDBC. |
Link | Named configuration that specifies which connectors can communicate with each other and defines rules of data transfer. |
QueryGrid Manager
Use QueryGrid Manager to do the following:
- Administer and monitor QueryGrid
- Install, configure, and upgrade QueryGrid
- Initiate link, connector, or bandwidth diagnostic checks
- Summarize query performance metrics
- Manage keys for secure access to QueryGrid
- Capture logs generated from QueryGrid components
The number of QueryGrid Manager instances required depends on the following:
- QueryGrid Manager hardware specifications
- Number of data source nodes in the fabric
- Volume of QueryGrid queries
- High availability (at least two QueryGrid Manager instances are required; see QueryGrid Manager Sizing)
Through the QueryGrid portlet in Viewpoint, you can:
- Monitor QueryGrid queries
- Receive problem condition alerts
- Make configuration changes
- Create support bundles for failed queries
- Run diagnostic checks
When all QueryGrid Manager instances are offline, QueryGrid queries can continue to run, provided that none of the QueryGrid fabric services in the query path are restarted.
QueryGrid Manager Cluster
When more than one QueryGrid Manager instance is installed, cluster the instances for high availability and scalability. Clustering ensures that all QueryGrid Manager instances share the same configuration and that each instance can administer and monitor QueryGrid. Create multiple QueryGrid Manager clusters to maintain separate production and test environments.
When a cluster is present, failover and recovery of a QueryGrid Manager instance are automatic. If one instance goes offline, the other instances in the cluster take over its workload. When the failed QueryGrid Manager comes back online, it automatically receives any configuration updates made while it was offline, and the workload resumes as it was before the failure.
Data Center
A Data Center is a location or region where QueryGrid-connected systems reside. By associating QueryGrid Managers and QueryGrid-connected systems with data centers, QueryGrid ensures that heartbeats, configuration updates, query metrics, and logs transferred between the nodes and QueryGrid Manager remain local to that data center or region.
When you install QueryGrid Manager software on a physical machine, a default Data Center is created using the hostname of the TMS, VM, or server. The default Data Center name is displayed in the QueryGrid™ portlet.
You can also create additional Data Centers in the following ways:
- Create a new Data Center in the QueryGrid portlet
- Create a new Data Center during clustering if you have two or more QueryGrid Manager instances
Data Source
The following data source systems can be added to QueryGrid. Each data source system must have its own connector.
Data Source | Connector |
---|---|
Analytics Database | Teradata |
Hadoop | Hive, Spark SQL, Presto |
Standalone Presto | Presto |
Oracle Driver Nodes | Oracle |
BigQuery Driver Nodes | BigQuery |
Generic JDBC Driver Nodes | Generic JDBC |
During QueryGrid configuration, the node and fabric software is installed on every node in a system, with the exception of Oracle, BigQuery, and generic JDBC data sources, where the software is installed only on the Oracle, BigQuery, or generic driver nodes. The node software manages the fabrics and connectors on the node.
Bridge
A bridge is one of the following:
- A subset of data source nodes running QueryGrid node software. The data source nodes can belong to an initiator system, a target system, or both.
- A set of non-data source nodes running QueryGrid node software. The non-data source nodes do not need connector software because there is no local data to process. An example of a non-data source node is an edge node in a Hadoop system.
For each bridge system added to a link, a new hop configuration is required that specifies the initiating-side network, the target-side network, and the communication policy to use for the data transferred over that hop. A hop is the transfer of data between two directly connected data source or bridge systems. The number of hops is the number of bridges plus one, as shown in the following table.
Number of Bridges | Number of Hops Defined |
---|---|
No bridge | One hop (the hop between the initiator and target) |
One bridge | Two hops (the hop between the initiator and bridge, and the hop between the bridge and target) |
Two bridges | Three hops (the hop between the initiator and the first bridge, the hop between the first bridge and second bridge, and the hop between the second bridge and target) |
Three bridges | Four hops (the hop between the initiator and the first bridge, the hop between the first bridge and second bridge, the hop between the second bridge and third bridge, and the hop between the third bridge and target) |
Four bridges | Five hops (the hop between the initiator and the first bridge, the hop between the first bridge and second bridge, the hop between the second bridge and third bridge, the hop between the third bridge and fourth bridge, and the hop between the fourth bridge and target) |
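The rule is simply hops = bridges + 1. As a minimal illustration (the system names here are hypothetical, and this is not a QueryGrid API), the following sketch enumerates the hops for a given query path:

```python
def enumerate_hops(initiator, target, bridges=None):
    """Return the ordered list of hops for a link path.

    A hop is the transfer of data between two directly connected
    data source or bridge systems, so len(hops) == len(bridges) + 1.
    """
    path = [initiator] + list(bridges or []) + [target]
    return list(zip(path, path[1:]))

# Two bridges yield three hops, matching the table above.
for src, dst in enumerate_hops("teradata-prod", "hive-cluster",
                               bridges=["bridge-1", "bridge-2"]):
    print(f"{src} -> {dst}")
```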
Fabric
A fabric is used to:
- Enable communication between paired data source nodes of the same type, such as Teradata and Teradata, or of different types, such as Teradata and Presto
- Allow a user to initiate a single SQL query that joins data from two or more data sources within the fabric
There is no restriction on the size of the data that is transported between data source nodes in a fabric.
- Allow QueryGrid connectors to:
- Communicate with each other in parallel
- Run, process, and transport data efficiently
- Monitor fabric usage per query and report metrics to the QueryGrid Manager
Connectors and links are associated with a fabric.
Connectors
Connectors do the following:
- Provide query processing across data sources (systems).
- Translate query request SQL from one source query format to another.
- Transform data, converting the data from one data type or format to another so that the data can be exchanged between different data source systems.
- Enable data sources to participate in queries. Any connector that joins the fabric can participate in the queries.
- Enable data sources to initiate queries (except for the Oracle, BigQuery, and Generic JDBC connectors).
- Enable sending and receiving data to and from the fabric.
- Communicate with other connectors in the fabric.
- Control the running of queries on target systems. All connectors can act as a target of queries and control the queries that run on a target data source system on behalf of the initiator data source system.
- Return query results to initiating systems.
Connectors are specific to the system type (for example, Teradata systems, Hive, or Presto-configured Hadoop). For instance, a Teradata system hosts a Teradata connector, but a single Hadoop system can host multiple connectors (for example, Hive, Spark, and Presto).
Optional connector properties allow you to refine a connector type configuration or override connector properties set during configuration.
Connector software connects data sources to the QueryGrid fabric and, like the fabric software, is installed on all data source nodes in a system. The fabric software includes a driver, which runs on one or more data source nodes called driver nodes. As part of query processing, a driver node receives requests from the initiator connector and submits them to the target system. The driver loads the connector, reads messages, invokes methods to process requests, and sends responses.
When you configure a connector, you can do the following:
- Specify which nodes in a data source are driver nodes.
Only a subset of the nodes on a large system need to be driver nodes.
- Select multiple driver nodes for redundancy and to share the workload required for query initiation.
- Limit the number of driver nodes to enable better session reuse. Session reuse does the following:
- Reuses physical connections
- Reduces overhead for QueryGrid queries
- Minimizes creating and closing sessions
The connection pool uses the same JDBC connection for all phases of a single query or for subsequent queries with the same session and user credentials. For information on configuring tunable connection pool properties, see the connector and link properties information for the relevant connector.
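As a conceptual sketch only (the class and parameter names are hypothetical, not the QueryGrid implementation), a pool keyed by session and user credentials can reuse connections like this:

```python
import threading

class ConnectionPool:
    """Sketch of credential-keyed connection reuse (illustration only)."""

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn  # opens a physical connection
        self._pool = {}
        self._lock = threading.Lock()

    def acquire(self, user, session_id):
        # All phases of a query, and later queries with the same
        # session and user credentials, get the same connection.
        key = (user, session_id)
        with self._lock:
            conn = self._pool.get(key)
            if conn is None:  # first use: open and cache
                conn = self._connect_fn(user)
                self._pool[key] = conn
            return conn
```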
Links
Connector Type | Description |
---|---|
Initiating | Point from which a QueryGrid query originates. For example, in a Teradata-to-Presto query, the Teradata connector initiates the query. |
Target | Destination point of a QueryGrid query. For example, in a Teradata-to-Presto query, a Presto connector is the target connector that the initiating connector accesses to either import or export data. |
Connectors Defined | Result |
---|---|
Teradata systems only | You can create links only between Teradata systems. |
Teradata, Presto, Hive, Spark, Oracle, BigQuery, and Generic JDBC | You can create links between Teradata systems and systems using any of these connector types. |
Links simplify configuration of foreign server definitions in preparation for running QueryGrid queries.
A link defines the following:
- An initiating connector and a target connector pairing to use for queries
- The number of bridges used, if any:
For the hops defined for each number of bridges, see the table in the Bridge section.
- The properties of the initiating and target connectors

When you create links and associated properties, you are creating Configuration Name Value Pairs (NVPs). An NVP does the following:
- Specifies the behavior of the target connector component
- Configures how data is transformed
- Configures the underlying link data transportation layer
- Affects how the initiator connector performs
Optional properties allow you to refine the configuration of the initiating or target connector. These link properties override connector properties, as sketched below.
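A minimal sketch of this precedence (the property names here are hypothetical, not actual QueryGrid NVPs):

```python
# Hypothetical name-value pairs, for illustration only.
connector_props = {"connectTimeout": 30, "compression": "none"}
link_props = {"compression": "zstd"}  # set on the link

# Link properties override connector properties of the same name.
effective = {**connector_props, **link_props}
print(effective)  # {'connectTimeout': 30, 'compression': 'zstd'}
```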
- Define initiating and target networks for hops
- Links refer to networks to determine which interfaces to use for data transfer. A query originates on a certain server or network and is sent to a target server or network. If no network is specified, the fabric tries all known target node network addresses in parallel and uses the first address that successfully establishes a connection.
- Networks are defined by rules that map physical network interfaces to logical network definitions, either by interface name or by Classless Inter-Domain Routing (CIDR) notation. The rules are checked in the order they appear in the matching rules list, as sketched below.
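As an illustration of first-match evaluation (the rule format here is invented for the example and is not QueryGrid's configuration syntax):

```python
import ipaddress

# Hypothetical ordered rules: each maps an interface name or a CIDR
# block to a logical network definition; the first match wins.
rules = [
    {"interface": "eth1", "network": "replication-net"},
    {"cidr": "10.20.0.0/16", "network": "data-center-east"},
    {"cidr": "0.0.0.0/0", "network": "default"},
]

def match_network(iface_name, iface_addr):
    addr = ipaddress.ip_address(iface_addr)
    for rule in rules:  # checked in the order they appear
        if rule.get("interface") == iface_name:
            return rule["network"]
        if "cidr" in rule and addr in ipaddress.ip_network(rule["cidr"]):
            return rule["network"]
    return None

print(match_network("eth0", "10.20.5.7"))  # data-center-east
```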
- A communications policy that defines rules for transferring data between target and initiating connectors and bridges (if used)
Communication policies define how systems communicate with one another. They allow you to configure the transfer concurrency and to enable or disable Zstandard data compression on row data blocks during data transfer.
QueryGrid is preconfigured with one communication policy that has compression enabled (Compression) and one that does not (No Compression).
Communication policies are not necessarily specific to the systems involved, so you can reuse them.
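To illustrate the trade-off only (this uses the open source zstandard Python bindings, not QueryGrid code), compressing a block of repetitive row data looks like this:

```python
import zstandard as zstd  # pip install zstandard

# A stand-in "row data block"; repetitive row data compresses well.
row_block = b"2024-01-01|widget|9.99|east\n" * 1000

compressed = zstd.ZstdCompressor().compress(row_block)
restored = zstd.ZstdDecompressor().decompress(compressed)

assert restored == row_block
print(f"{len(row_block)} bytes -> {len(compressed)} bytes")
```

Compression trades CPU time for less data on the wire, which is why both a compressing and a non-compressing policy are provided.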
- User mappings (optional)
User mappings allow a user logged on to the initiating system to submit queries as a specific user on the target system. You can map multiple users on the initiating system to a single user on the target system, if applicable, but you cannot map multiple users on the target system to a single user on the initiating system.
In the QueryGrid portlet, you can map usernames on one data source to usernames on another data source.
When using the Hive, Presto, or Spark target connector without security, user mapping is typically required. For example, if a query is initiated over a Teradata-to-Hive link by user Joe, Teradata automatically changes the username to all uppercase and, by default, sends the query as user "JOE" to the target system. The following are possible outcomes for this use case when no security is used with a Hive target connector:

Scenario | Requirement |
---|---|
User "JOE" does not exist on the target system | Queries must be run with an existing user on the target system, such as "hive". In this scenario, user mapping is required where "JOE" on the Teradata system is mapped to the user "hive" on the target Hive system. |
User exists on the target system as "joe" | User mapping is required where "JOE" on the Teradata system is mapped to "joe" on the target Hive system. |
User exists on the target system as "JOE" | This scenario does not require user mapping. |

When a security platform such as Kerberos is used, user mapping is not required with a Hive, Presto, or Spark target connector.
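A conceptual sketch of the mapping direction (usernames are hypothetical; this is not a QueryGrid API): many initiator users may share one target user, but each initiator user resolves to exactly one target user.

```python
# Initiator username -> target username. A plain dict enforces the
# allowed direction: several initiator users can map to one target
# user, but each initiator user has exactly one target mapping.
user_map = {
    "JOE": "hive",
    "ANN": "hive",  # many-to-one onto the target is allowed
    "BOB": "bob_etl",
}

def target_user(initiator_user):
    # Fall back to the initiator name when no mapping is defined.
    return user_map.get(initiator_user, initiator_user)

print(target_user("JOE"))  # hive
```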
- Enable or disable acknowledgments. Acknowledgments allow QueryGrid to reestablish connections that break due to transient network errors. If queries fail due to network errors, you can enable this option at the cost of slightly more message overhead.
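As a simplified sketch of why acknowledgments make recovery possible (not the actual QueryGrid protocol), the sender retains each data block until it is acknowledged, so unacknowledged blocks can be resent over a new connection after a transient failure:

```python
from collections import OrderedDict

class AckingSender:
    """Simplified acknowledged-send loop, for illustration only."""

    def __init__(self, send_fn):
        self._send_fn = send_fn         # transmits (seq, block)
        self._unacked = OrderedDict()   # seq -> block kept until acked
        self._next_seq = 0

    def send(self, block):
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = block      # retain for possible resend
        self._send_fn(seq, block)
        return seq

    def on_ack(self, seq):
        self._unacked.pop(seq, None)    # receiver has it; forget it

    def on_reconnect(self):
        # After a transient network error, resend every block that was
        # never acknowledged; without acks this state would be unknown.
        for seq, block in self._unacked.items():
            self._send_fn(seq, block)
```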