Basic Maintenance Concepts - Teradata Business Continuity Manager

Teradata® Business Continuity Manager Monitoring and Management Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Business Continuity Manager
Release Number: 1.xx, 2.xx
Published: January 2024
Language: English (United States)
Last Update: 2024-02-06
Product Category: Analytical Ecosystem

Business Continuity Manager routes queries to one or more managed systems, each an independent database system, using one of the following methods:
  • Passive routing: Business Continuity Manager sends each query to a single system, as defined by routing rules.
  • Managed routing: Business Continuity Manager sends read queries to a single system, as configured in the passive routing rules. Business Continuity Manager sends write queries to the active system first and, if the write succeeds, then sends the request to the standby system for replication.

With managed routing, if an error occurs or a system falls behind, Business Continuity Manager replays the missed transactions until they succeed or until someone manually intervenes to skip a transaction during recovery. To maintain a Business Continuity Manager cluster and make sure all components are working, you must monitor all managed systems as well as the Business Continuity Manager system itself.
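
The managed routing behavior described above can be illustrated with a small simulation. This is a minimal sketch in Python, not Business Continuity Manager code: the `ManagedSystem` and `ManagedRouter` classes, the `fail_next` flag, and the `skip` method are hypothetical names invented for illustration. It assumes that a write is applied to the active system first, forwarded to the standby system only on success, and queued for replay when the standby system misses it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ManagedSystem:
    """Hypothetical stand-in for one managed database system."""
    name: str
    applied: list = field(default_factory=list)
    fail_next: bool = False  # illustration hook: make the next write fail

    def execute(self, query: str) -> bool:
        if self.fail_next:
            self.fail_next = False
            return False
        self.applied.append(query)
        return True


class ManagedRouter:
    """Illustrative model of managed routing with replay (not the product API)."""

    def __init__(self, active: ManagedSystem, standby: ManagedSystem):
        self.active = active
        self.standby = standby
        self.missed = deque()  # writes the standby system has not applied yet

    def write(self, query: str) -> bool:
        # The active system goes first; nothing is replicated if it fails.
        if not self.active.execute(query):
            return False
        # Preserve ordering: once the standby is behind, queue behind the backlog.
        if self.missed or not self.standby.execute(query):
            self.missed.append(query)
        return True

    def replay(self) -> int:
        """Replay missed writes in order; stop at the first failure."""
        replayed = 0
        while self.missed:
            if not self.standby.execute(self.missed[0]):
                break
            self.missed.popleft()
            replayed += 1
        return replayed

    def skip(self) -> str:
        """Manual intervention: drop the oldest missed write from recovery."""
        return self.missed.popleft()
```

In this model, a write that the standby system misses stays at the head of the queue, and later writes wait behind it, until `replay` succeeds or an operator calls `skip`.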

To maintain overall system health, review and apply the following maintenance concepts:
Concept or Condition Description Maintenance Action
Concept or Condition: Business Continuity Manager User
Description: Business Continuity Manager uses a special database user, called the Business Continuity Manager User (tdbcmgmt user), on all managed systems. The Business Continuity Manager User is essential to all Business Continuity Manager operations.
Maintenance Action: Make sure database throttles do not obstruct the Business Continuity Manager User.

Concept or Condition: Alert monitoring setup to push critical events
Description: Business Continuity Manager needs emergency notifications. Several Business Continuity Manager alerts indicate a critical condition that needs to be resolved.

Concept or Condition: Raised alerts
Description: Business Continuity Manager raises alerts for system conditions that need to be addressed.
Maintenance Action:
  • Check alerts regularly.
  • Fix the condition behind each valid alert, then close any alert that is not closed automatically.
  • Disable alerts that are raised as a result of normal system operation. For example, if using the default routing rule is expected and acceptable, disable the "Default Routing Rule used" alert.

Concept or Condition: Interrupted objects and systems
Description: When an error or other condition on a system prevents a query from running, objects and systems may become interrupted. Some interrupts resolve automatically; others require intervention. Interrupted objects can cause other objects to become interrupted. See Interrupted Objects.
Maintenance Action: Repair interrupted objects and systems as soon as possible. Fix the underlying error to make the object active again. See Manual Monitoring.

Concept or Condition: Unrecoverable objects and systems
Description: When a data mismatch occurs, Business Continuity Manager marks the affected objects as unrecoverable, which may cause additional objects to become unrecoverable.
Maintenance Action: Use logs to determine the cause of the data mismatch, then fix the underlying issue. Resynchronize unrecoverable objects and systems as soon as possible to restore active-standby service.

Concept or Condition: Data resynchronization plan
Description: You must synchronize data after a data mismatch or after objects and workloads are added.
Maintenance Action: Use a utility such as QueryGrid or Data Mover, or another duplication strategy, to synchronize individual objects or entire systems. The self-healing feature can automatically validate, resynchronize, and reactivate unrecoverable tables.
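
The validate, resynchronize, and reactivate sequence can be pictured with a toy example. The sketch below is a minimal Python illustration and makes no claim about how the self-healing feature or any Teradata utility is implemented: `table_checksum` and `self_heal` are hypothetical helpers, tables are modeled as in-memory lists of rows, and the copy step stands in for whatever duplication utility you actually use.

```python
import hashlib


def table_checksum(rows) -> str:
    """Order-independent checksum of a table's rows (illustrative only)."""
    digest = hashlib.sha256()
    for text in sorted(repr(row) for row in rows):
        digest.update(text.encode("utf-8") + b"\n")
    return digest.hexdigest()


def self_heal(active_rows, standby_rows):
    """Validate, resynchronize if needed, then decide the resulting state."""
    # Validate: do the two copies already match?
    if table_checksum(active_rows) != table_checksum(standby_rows):
        # Resynchronize: copy the active rows (stand-in for a real utility).
        standby_rows = list(active_rows)
    # Reactivate only if validation passes after the copy.
    matches = table_checksum(active_rows) == table_checksum(standby_rows)
    return standby_rows, ("active" if matches else "unrecoverable")
```

The point of the second validation is that an object returns to the active state only after the copies are confirmed to match, not merely after a copy was attempted.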

Concept or Condition: Non-Business Continuity Manager connections
Description: For Business Continuity Manager to replicate data, run SQL through Business Continuity Manager instead of connecting directly to the managed system. Connecting directly to managed systems may cause mismatched data, unexpected deadlocks, and object or system interrupts due to resource contention.
Maintenance Action: If there are unexpected data mismatches or deadlocks, check DBQL to make sure objects are not being accessed from outside Business Continuity Manager. Resynchronize unrecoverable objects and systems as soon as possible to restore active-standby service.
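
One way to act on the DBQL check is to export recent query log rows and flag any that touched a managed object from a client other than Business Continuity Manager. The sketch below is a minimal Python illustration under stated assumptions: the rows are plain dictionaries, the keys `client_addr`, `user_name`, and `object_name` are hypothetical field names for whatever your DBQL export provides, the addresses in `BCM_ADDRESSES` are placeholders, and identifying Business Continuity Manager traffic by client address is itself an assumption that may not fit every deployment.

```python
from typing import Iterable

# Assumption: addresses of the Business Continuity Manager servers (placeholders).
BCM_ADDRESSES = {"10.0.0.10", "10.0.0.11"}


def find_outside_access(rows: Iterable[dict], managed_objects: set,
                        bcm_addresses: set = BCM_ADDRESSES) -> list:
    """Return log rows that touched a managed object from a non-BCM client."""
    return [
        row for row in rows
        if row["object_name"] in managed_objects
        and row["client_addr"] not in bcm_addresses
    ]
```

Any row this returns is a candidate for follow-up: the user or job behind it may need to be redirected to connect through Business Continuity Manager.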

Concept or Condition: Sessions managed from outside Business Continuity Manager
Description: Business Continuity Manager manages multiple sessions to multiple managed systems. If a session is lost, Business Continuity Manager attempts to reconnect it.
Maintenance Action: Use either bcmadmin or the Business Continuity Manager UI to stop a session associated with Business Continuity Manager, so that Business Continuity Manager knows not to reconnect that session.

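The reason for stopping sessions through Business Continuity Manager can be shown with a toy model. This is a minimal Python sketch with hypothetical names (`SessionManager`, `stop`, `on_session_lost`); it does not reflect the bcmadmin command set or the product's internals, and it assumes a reconnect attempt always succeeds.

```python
class SessionManager:
    """Illustrative model only; names are hypothetical, not the bcmadmin API."""

    def __init__(self):
        self.connected = set()  # sessions currently connected
        self.stopped = set()    # sessions stopped on purpose through the manager

    def open(self, session_id: int) -> None:
        self.connected.add(session_id)
        self.stopped.discard(session_id)

    def stop(self, session_id: int) -> None:
        """Stop through the manager: the session is recorded as intentionally closed."""
        self.connected.discard(session_id)
        self.stopped.add(session_id)

    def on_session_lost(self, session_id: int) -> bool:
        """Called when a session drops; returns True if it was reconnected."""
        self.connected.discard(session_id)
        if session_id in self.stopped:
            return False
        self.connected.add(session_id)  # reconnect attempt (assumed to succeed)
        return True
```

In this model, a session ended from outside the manager looks like a lost session and is reconnected, while a session ended with `stop` stays closed.
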
Concept or Condition: Mismatched system capabilities
Description: When the active system has significantly greater processing power than the standby system, objects may become interrupted if the standby system cannot handle the query load.
Maintenance Action: Make sure all managed systems can handle the workload in both space and processing capacity. If users run queries beyond the capability of a system, consider passive routing for those users, or set the job objects to be managed only on the capable system.

Concept or Condition: TASM restrictions
Description: If TASM rules are mismatched, objects may become interrupted when queries are sent to and stopped on the standby system. If load slots are mismatched, loads to all systems are constrained by the system with fewer load slots.
Maintenance Action: Make sure all managed systems have matching TASM rules for all users and jobs managed on all systems.
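
The load slot constraint and the rule comparison can be expressed as two small checks. The following is a minimal Python sketch under simplifying assumptions: each system's load slot count and rule set are supplied by hand, rules are compared by name only, and `effective_load_slots` and `mismatched_rules` are hypothetical helpers rather than part of any Teradata tool.

```python
def effective_load_slots(load_slots: dict) -> int:
    """Loads to all systems are limited by the system with the fewest slots."""
    return min(load_slots.values())


def mismatched_rules(rules: dict) -> dict:
    """For each system, return the rules defined elsewhere but missing locally."""
    all_rules = set().union(*rules.values())
    return {
        name: all_rules - own
        for name, own in rules.items()
        if all_rules - own
    }
```

For example, with 30 load slots on the active system and 15 on the standby system, `effective_load_slots` returns 15, and an empty result from `mismatched_rules` means the rule names match on every system.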