Basic Maintenance Concepts

Teradata® Unity™ Monitoring and Management Guide

Product: Continuous Availability, Teradata Unity
Release Number: 17.00
Published: September 2020
Language: English (United States)
Last Update: 2020-09-15
Product Category: Analytical Ecosystem
Unity routes queries to one or more managed systems, each of which is an independent Teradata database system, using one of the following routing methods:
  • Passive routing: Unity sends each query to a single system, as defined by routing rules.
  • Managed routing: Unity sends read queries to a single system and sends write queries to all Teradata systems that manage the affected object.
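
The difference between the two routing methods can be illustrated with a small toy model. This is only a sketch of the behavior described above, not Unity's implementation: the `RoutingModel` class, its fields, and the choice of the first managing system for reads are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutingModel:
    """Hypothetical toy model of passive and managed routing."""
    # Object name -> systems that manage that object (assumed structure).
    managing_systems: Dict[str, List[str]]
    # User -> single system chosen by a passive routing rule (assumed structure).
    passive_rules: Dict[str, str] = field(default_factory=dict)

    def route(self, user: str, obj: str, is_write: bool) -> List[str]:
        # Passive routing: the routing rule alone selects one system.
        if user in self.passive_rules:
            return [self.passive_rules[user]]
        systems = self.managing_systems[obj]
        # Managed routing: writes go to every managing system,
        # reads go to one system (here simply the first, as a placeholder).
        return list(systems) if is_write else [systems[0]]
```

For example, with `RoutingModel({"sales.orders": ["td1", "td2"]})`, a read of `sales.orders` is routed to one system and a write is routed to both.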

With managed routing, if an error occurs or a system falls behind, Unity replays missed transactions to maintain synchronized data. To maintain a Unity cluster and make sure all components are working, you must monitor all managed systems and the Unity system.
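
The replay idea can be pictured as an ordered log of write transactions plus a per-system position in that log. The sketch below is a simplified illustration under that assumption; `ReplayModel` and its methods are hypothetical names and do not reflect how Unity stores or replays transactions internally.

```python
class ReplayModel:
    """Hypothetical toy model of replaying missed write transactions."""

    def __init__(self, systems):
        self.log = []                              # ordered write transactions
        self.applied = {s: 0 for s in systems}     # log entries applied per system
        self.data = {s: [] for s in systems}       # transactions each system holds

    def write(self, txn, available):
        """Record a write and apply it on systems that are reachable and caught up."""
        self.log.append(txn)
        for system in available:
            # A system that is behind must be caught up by replay first,
            # so that transactions are applied in log order.
            if self.applied[system] == len(self.log) - 1:
                self.data[system].append(txn)
                self.applied[system] += 1

    def replay(self, system):
        """Apply every transaction the system missed; return how many were replayed."""
        missed = self.log[self.applied[system]:]
        self.data[system].extend(missed)
        self.applied[system] = len(self.log)
        return len(missed)
```

In this model, a system that misses two writes holds stale data until `replay` is called, after which its data matches the other systems again.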

To ensure overall system health, review and apply the following maintenance concepts:
Each entry lists a concept or condition, its description, and the maintenance action.
Unity Management User
  Description: Unity uses a special database user called the Unity Management User on all managed systems. The Unity Management User is essential to all Unity operations.
  Maintenance action: Make sure the Unity Management User is not obstructed by database throttles.
Alert monitoring setup to push critical events
  Unity needs emergency notifications. Several Unity alerts indicate a critical condition that needs to be resolved.
Raised alerts
  Description: Unity raises alerts for system conditions that need to be addressed.
  Maintenance action:
    • Check alerts regularly.
    • Fix the cause of each valid alert, then close any alert that is not closed automatically.
    • Disable alerts that are raised as a result of normal system operation. For example, if using the default routing rule is expected and acceptable, disable the "Default Routing Rule used" alert.
Interrupted objects and systems
  Description: When an error or other condition on a system prevents a query from running, objects and systems may become interrupted. Some interrupts resolve automatically; others require intervention. Interrupted objects may cause other objects to become interrupted.
  Maintenance action: Repair interrupted objects and systems as soon as possible. Fix the underlying error to make the object active again. See Manual Monitoring.
Unrecoverable objects and systems
  Description: When a data mismatch occurs, Unity makes the affected objects unrecoverable, which may cause additional objects to become unrecoverable.
  Maintenance action: Use logs to determine the cause of the data mismatch, then fix the underlying issue. Resynchronize unrecoverable objects and systems as soon as possible to restore dual-active service.
Data resynchronization plan
  Description: You must synchronize data after a data mismatch or after objects and workloads are added.
  Maintenance action: Use a utility such as Data Mover or another duplication strategy to synchronize individual objects or entire systems.
Non-Unity connections
  Description: For Unity to replicate data, run SQL through Unity instead of connecting directly to the managed system. Connecting directly to managed systems may cause mismatched data, unexpected deadlocks, and object or system interrupts due to resource contention.
  Maintenance action: If there are unexpected data mismatches or deadlocks, check DBQL to make sure objects are not being accessed from outside Unity. Resynchronize unrecoverable objects and systems as soon as possible to restore dual-active service.
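
One way to approach the DBQL check is to export the relevant query log rows and filter them by client address. The sketch below assumes the rows have already been exported into dictionaries; the keys `client_addr` and `query_text`, the function name, and the simple substring match on object names are all hypothetical choices for illustration, not DBQL column names or a Unity feature.

```python
def find_non_unity_access(dbql_rows, unity_addresses, managed_objects):
    """Return exported query-log rows that touch a managed object
    from a client address that is not a Unity server.

    dbql_rows:       iterable of dicts with hypothetical keys
                     "client_addr" and "query_text"
    unity_addresses: set of addresses the Unity servers connect from
    managed_objects: names of objects managed by Unity
    """
    suspects = []
    for row in dbql_rows:
        text = row["query_text"].lower()
        # Naive substring match; a real check would need proper SQL parsing.
        touches_managed = any(obj.lower() in text for obj in managed_objects)
        from_unity = row["client_addr"] in unity_addresses
        if touches_managed and not from_unity:
            suspects.append(row)
    return suspects
```

Any row returned by this filter points to a session that reached a managed object without going through Unity and is worth investigating.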

Sessions managed from outside Unity
  Description: Unity manages multiple sessions to multiple managed systems. If a session is lost, Unity attempts to reconnect the session.
  Maintenance action: If you need to kill a session associated with Unity, kill it from either unityadmin or the Unity UI so that Unity knows not to reconnect that session.
Mismatched system capabilities
  Description: When one managed system has significantly greater processing power than the other managed systems, objects may become interrupted if a system with less processing power is unable to handle the query load.
  Maintenance action: Make sure all managed systems have the space and processing capacity to handle the workload. If users run queries beyond the capability of a system, consider passive routing for the user or set the job objects to be managed only on the capable system.
TASM restrictions
  Description: If TASM rules are mismatched, objects may become interrupted when queries are sent to all systems and aborted on only one system. If load slots are mismatched, loads to all systems are constrained by the system with fewer load slots.
  Maintenance action: Make sure all managed systems have matching TASM rules for all users and jobs that are managed on all systems.
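
A comparison of TASM settings across systems can be sketched as a dictionary diff. The example below assumes the rules and load slot counts have been collected by hand into plain dictionaries; the function name, the data layout, and the rule names in the usage note are hypothetical and do not correspond to a TASM export format.

```python
def compare_tasm(systems):
    """Compare rule settings of every system against the first system.

    systems: name -> {"rules": {rule_name: setting}, "load_slots": int}
             (hypothetical layout, collected manually)
    Returns (mismatches, effective_load_slots), where mismatches maps a
    system name to the sorted rule names that differ from the first system.
    """
    names = list(systems)
    baseline = systems[names[0]]["rules"]
    mismatches = {}
    for name in names[1:]:
        rules = systems[name]["rules"]
        # A rule differs if it is missing on one side or has another setting.
        differing = {
            rule for rule in baseline.keys() | rules.keys()
            if baseline.get(rule) != rules.get(rule)
        }
        if differing:
            mismatches[name] = sorted(differing)
    # Loads to all systems are limited by the system with the fewest load slots.
    effective_load_slots = min(s["load_slots"] for s in systems.values())
    return mismatches, effective_load_slots
```

For instance, if `td1` allows 30 load slots and `td2` allows 15, the function reports 15 as the effective limit, along with any rule names whose settings differ between the two systems.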