A canary query a common SQL statement that is run at specific intervals and monitored for data such as response time to characterize the current system workload. Each canary query is fixed to do a consistent amount of work per execution.
You can use Teradata Viewpoint to send canary queries to:
- Measure response time as an indicator of system demand or system or database hangs.
- Measure response time for various TASM workloads and performance tiers.
- Initiate an alert system if response time degrades so that you can take appropriate action.
- Establish response time Service Level Agreements (SLAs) based on canary response times.
System Canary Queries
Use system canary queries to check for overall system or database hangs, to take some kind of action when response times reach certain thresholds, or when stalled, such as send alert and/or capture system level information.
More than just a check, a system canary query should execute diagnostics that capture the state of the system if performance stalls.
System canary queries are intended specifically to focus on the core system. They should be short-running (one second), low impact queries on tables that are normally not write locked.
System canary queries are most useful when run frequently. For example, some sites run them every 3 to 5 minutes; other sites find every 5 to 10 minutes adequate.
They should be run on a system node. This will eliminate other factors, such as middle tiers, network connections.
Depending on their makeup, canary queries can add to contention for resources. Use them selectively, where needed, with shorter queries preferable.
Sample System Canary Query
The simplest canary monitor query is the following:
SELECT * from DBC.DBCInfoV;
As the query runs, Teradata Viewpoint can monitor the query, logging start and end times. If the query runs longer than the indicated threshold, an alert and perhaps diagnostic scripts are automatically executed.
Production Canary Queries
Production canary queries may be used to:
- Take response time samplings, storing them for tracking purposes, or
- Monitor the expected response times of specific groups of queries, such as short-running tactical queries running in high priority.
Response times are an indicator of system demand. When system demand is high, canary response is high.
From a user perspective, a sudden deviation in response times would have an immediate impact, since users of consistently short running queries would be the first to notice performance degradation.
Production canary queries have wider uses than system canary queries and can be used in a variety of ways. For example, they:
- Can be run on production user tables.
- Could be run from other endpoints in the system architecture, such as a network client PC or z/OS client to expand scope of monitoring.
- Monitor overall response.
- Monitor specific area of the job mix.
- Are run less frequently than system canary queries, usually once every 20 to 60 minutes.
In using a production query from a non-TPA node location, other things, such as network and middle-tier monitoring, are also covered, but when it stalls, you need to investigate further to determine where the bottleneck is located.
Once the response time for a canary query is stored in a table, it can be summarized for use in tracking trends.