SystemFE macros provide a quick way to produce reports on the system messages stored in DBC.SW_Event_Log. Informational, Warning, and Critical system messages remain in DBC.SW_Event_Log until they are manually deleted. Although it is not the only or definitive resource, this repository is useful for assessing the relative health of the database platform.
Because system messages remain in DBC.SW_Event_Log until they are deleted manually, the System Administrator should develop a plan for deleting old messages from the Event_Log so that it does not grow unusually large. Event_Log messages older than 90 days are often not useful. Proactive health checks should be performed every 30 days (not to exceed 90 days). On the reactive problem discovery side, 90-day-old messages are rarely helpful in determining current problems.
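As a minimal sketch of such a purge plan, the following Python snippet computes a cutoff date and builds the corresponding DELETE statement. The 90-day window comes from the guidance above. The function name build_purge_statement is hypothetical, and the TheDate column name is an assumption about the DBC.SW_Event_Log layout that should be verified on your release before use. The snippet only generates the SQL text; it does not connect to the database.

```python
from datetime import date, timedelta


def build_purge_statement(today: date, retention_days: int = 90) -> str:
    """Return a DELETE statement for event log rows older than the retention window."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    cutoff = today - timedelta(days=retention_days)
    # Assumption: the log table has a DATE column named TheDate.
    # The cutoff is written as an ANSI date literal, DATE 'YYYY-MM-DD'.
    return (
        "DELETE FROM DBC.SW_Event_Log "
        f"WHERE TheDate < DATE '{cutoff.isoformat()}';"
    )


print(build_purge_statement(date(2024, 6, 30)))
```

Generating the statement separately from executing it lets the administrator review the cutoff date before any rows are removed.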
When the Preventive Maintenance Report is generated, the data must be analyzed. The order in which the macros were run is important. The analysis can be divided into three steps, each step encompassing more detail than the last. The sections below describe the steps.
Step 1 - Identifying High-Frequency Events
The ListErrorCodes macro lists all event codes in the DBC.ErrorMsgs table. The EventCount macro lists the codes and how often each occurred during the specified time period. This helps identify application or transient hardware problems that do not cause restarts. For example:
3610: Internal Error, Please do not resubmit last request
means that a request was aborted and a snapshot dump was taken without causing a restart. You should report a high frequency of this event to the Teradata Support Center so that they can determine whether the site should migrate to a later software release.
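The frequency check in this step can be illustrated with a short Python sketch that tallies event codes and keeps those at or above a threshold. It is not how the EventCount macro is implemented. The function name high_frequency_events, the threshold value, and the sample data are all assumptions for illustration; codes other than 3610 are arbitrary placeholders, and what counts as "high frequency" is a site-specific judgment.

```python
from collections import Counter


def high_frequency_events(event_codes, threshold):
    """Return (code, count) pairs whose count meets the threshold, most frequent first."""
    counts = Counter(event_codes)
    return [(code, n) for code, n in counts.most_common() if n >= threshold]


# Hypothetical event codes observed during one reporting period.
# Only 3610 comes from the text; 9001 and 9002 are placeholders.
sample = [3610, 9001, 3610, 3610, 9001, 3610, 9002]

for code, count in high_frequency_events(sample, threshold=3):
    print(code, count)
```

With this sample, only code 3610 (four occurrences) meets the threshold of three.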
If you need further information about these events (such as the processor number), use the ListSoftware_Event_Log or ListEvent macro to list every occurrence of that event code in full detail.
For backtrace information, examine the message log files.
The event information is located in the log file /var/log/messages.
Step 2 - Database Restarts
The most important events to consider in the event log are database restarts. They are listed as ‘33-13855-00’ and ‘33-13892-00’ messages on the reports obtained from the EventCount macro and the AllRestarts macro. Restarts indicate that problems have caused database outages. Note the event codes and perform problem determination. For all events, verify that the Teradata Support Center has opened a Call Log to report their occurrence. If the events are hardware restarts, you can perform a more detailed analysis by following Step 3.
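The restart review can be sketched in Python as a filter over event records. The two restart tags are taken from the text above; everything else is hypothetical, including the function name restarts_needing_follow_up, the dictionary keys (date, tag, call_log), and the sample rows. The tag 00-00000-00 is a placeholder for a non-restart event.

```python
# Restart tags as they appear on the EventCount and AllRestarts reports.
RESTART_TAGS = frozenset({"33-13855-00", "33-13892-00"})


def restarts_needing_follow_up(events):
    """Return restart events that have no call log reference recorded."""
    return [
        e for e in events
        if e["tag"] in RESTART_TAGS and not e.get("call_log")
    ]


# Hypothetical records, for example transcribed from a report.
events = [
    {"date": "2024-05-02", "tag": "33-13855-00", "call_log": "RECORDED"},
    {"date": "2024-05-09", "tag": "33-13892-00", "call_log": None},
    {"date": "2024-05-10", "tag": "00-00000-00", "call_log": None},
]

for e in restarts_needing_follow_up(events):
    print(e["date"], e["tag"])
```

Here only the second record is returned: it is a restart and has no call log reference, so it is the one to raise with the Teradata Support Center.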
For a complete description and resolution of these events, see “Parallel Database Extensions (PDE) Messages” in Teradata Vantage™ - Database Messages, B035-1096.
Step 3 - Subsystem Detail
You should examine these reports to reveal any evidence of intermittently failing hardware that has not caused database restarts.