Routing Method | Description |
---|---|
Passive routing | Business Continuity Manager sends each query to a single system as defined by routing rules. |
Managed routing | Business Continuity Manager sends read queries to a single system as configured in passive routing rules. Business Continuity Manager sends write queries to the active system first and, if the write succeeds, sends the request to the standby system for replication. |

With managed routing, if an error occurs or a system falls behind, Business Continuity Manager replays the missed transactions until they succeed or until someone manually skips a transaction from recovery. To maintain a Business Continuity Manager cluster and make sure all components are working, you must monitor all managed systems and the Business Continuity Manager system itself.

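
The managed routing behavior described above can be illustrated with a minimal sketch. It is not the product's implementation: the class `ManagedRouteSketch`, its methods, and the `active` and `standby` callables are hypothetical names invented for this example, and it assumes that missed writes are replayed in their original order.

```python
from collections import deque
from typing import Callable, Deque

# Hypothetical stand-in for a managed system: a callable that runs a
# statement and raises an exception on failure.
Execute = Callable[[str], None]


class ManagedRouteSketch:
    """Illustrative model of managed routing (all names are assumptions)."""

    def __init__(self, active: Execute, standby: Execute) -> None:
        self.active = active
        self.standby = standby
        self.pending: Deque[str] = deque()  # writes the standby has not applied

    def write(self, sql: str) -> None:
        # The write goes to the active system first. A failure here
        # propagates, and nothing is sent to the standby.
        self.active(sql)
        # Queue the write behind any earlier missed writes so the standby
        # receives them in order, then try to catch up.
        self.pending.append(sql)
        self.replay()

    def replay(self) -> int:
        """Replay missed writes in order; stop at the first failure."""
        applied = 0
        while self.pending:
            try:
                self.standby(self.pending[0])
            except Exception:
                break  # standby is still behind; retry on a later call
            self.pending.popleft()
            applied += 1
        return applied

    def skip_next(self) -> str:
        """Manual intervention: drop the oldest missed write from recovery."""
        return self.pending.popleft()
```

In this sketch, a write that fails on the standby stays at the head of the queue and blocks later writes until it either succeeds on a later `replay()` call or is removed with `skip_next()`, which mirrors the two outcomes named in the text.
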
Concept or Condition | Description | Maintenance Action |
---|---|---|
Business Continuity Manager User | Business Continuity Manager uses a special database user called the Business Continuity Manager User (tdbcmgmt user) on all managed systems. The Business Continuity Manager User is essential to all Business Continuity Manager operations. | Make sure database throttles do not obstruct the Business Continuity Manager User. |
Alert monitoring setup to push critical events | Business Continuity Manager needs emergency notifications. Several Business Continuity Manager alerts indicate a critical condition that needs to be resolved. | |
Raised alerts | Business Continuity Manager raises alerts for system conditions that need to be addressed. | |
Interrupted objects and systems | When an error or other condition on a system prevents a query from running, objects and systems may become interrupted. Some interrupts resolve automatically, but others require intervention. Interrupted objects may cause other objects to become interrupted. See Interrupted Objects. | Repair interrupted objects and systems as soon as possible. Fix the underlying error to make the object active again. See Manual Monitoring. |
Unrecoverable objects and systems | When a data mismatch occurs, Business Continuity Manager makes objects unrecoverable. That may cause additional objects to become unrecoverable. | Use logs to determine the cause of the data mismatch, then fix the underlying issue. Resynchronize unrecoverable objects and systems as soon as possible to restore active-standby service. |
Data resynchronization plan | You must synchronize data after a data mismatch or after objects and workloads are added. | Use a utility such as QueryGrid, Data Mover, or another duplication strategy to synchronize individual objects or entire systems. The self-healing feature can automatically validate, resynchronize, and reactivate unrecoverable tables. |
Non-Business Continuity Manager connections | For Business Continuity Manager to replicate data, run SQL through Business Continuity Manager instead of connecting directly to the managed system. Connecting directly to managed systems may cause mismatched data, unexpected deadlocks, and object or system interrupts due to resource contention. | If there are unexpected data mismatches or deadlocks, check DBQL to make sure objects are not being accessed from outside Business Continuity Manager. Resynchronize unrecoverable objects and systems as soon as possible to restore active-standby service. |
Sessions managed from outside Business Continuity Manager | Business Continuity Manager manages multiple sessions to multiple managed systems. If a session is lost, Business Continuity Manager attempts to reconnect the session. | Use either bcmadmin or the Business Continuity Manager UI to stop a session associated with Business Continuity Manager. That way, Business Continuity Manager knows not to reconnect the session. |
Mismatched system capabilities | When the active system has significantly greater processing power than the standby system, objects may become interrupted if the standby system cannot handle the load of the queries. | Make sure all managed systems have the space and processing capacity to handle the workload. If users run queries beyond the capability of a system, consider passive routing for those users or set the job objects to be managed only on the capable system. |
TASM restrictions | If TASM rules are mismatched, objects may become interrupted when queries are sent to and stopped on the standby system. If load slots are mismatched, loads to all systems are constrained by the system with fewer load slots. | Make sure all managed systems have matching TASM rules for all users and jobs managed on all systems. |
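
The load slot constraint in the TASM restrictions row comes down to taking a minimum across systems. The following is a minimal sketch of that arithmetic, assuming you have already collected the load slot count for each managed system. The function names and the slot counts in the usage example are hypothetical and do not come from any Business Continuity Manager interface.

```python
from typing import Dict


def effective_load_slots(slots_by_system: Dict[str, int]) -> int:
    """Return the load slot limit that applies to every managed system.

    Loads to all systems are constrained by the system with the fewest slots.
    """
    if not slots_by_system:
        raise ValueError("at least one managed system is required")
    return min(slots_by_system.values())


def unused_slots(slots_by_system: Dict[str, int]) -> Dict[str, int]:
    """Return, per system, how many load slots sit idle due to the mismatch."""
    limit = effective_load_slots(slots_by_system)
    return {
        name: count - limit
        for name, count in slots_by_system.items()
        if count > limit
    }


# Hypothetical slot counts, for illustration only.
example = {"active": 30, "standby": 15}
print(effective_load_slots(example))  # 15
print(unused_slots(example))          # {'active': 15}
```

A non-empty result from `unused_slots` indicates a mismatch that is worth correcting so that the system with more slots is not held back by the other.
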