Server Management Operational and Problem Events

Teradata® Server Management Product Guide

Product: Server Management
Release: 13.00
Created: December 2018
Category: Configuration, User Guide
Feature number: B035-6112-128K

The Cabinet Management Interface Controller (CMIC) collects events for the components in the collective it manages. It then consolidates and correlates those events with events from other CMICs to provide a summarized view of the impact on the systems and components in the Server Management domain.

Type Description
Event

A change in condition that may require intervention. Events are collected from operating system logs, other device-specific interfaces, and the CMICs themselves.

Events that affect system operation generate alerts.

Alert

Alerts are derived from collected events by matching them against event signatures that are known to indicate significant problem conditions. The alert severity indicates how strongly the problem affects the system or subsystem.

Alerts are generated for both software and hardware, including but not limited to the following:
  • Teradata Database
  • Kubernetes clusters
  • BYNET
  • Disk arrays
  • Node operating systems
  • Platform hardware

Summary Alert (known as System Problems in the Web Client)

Summary alerts are generated based on groups of alerts that occur together during known system problems. A recommended action is provided for resolution of problem conditions. Based on their state, summary alerts can be closed (cleared or deleted) when problem conditions are known to be resolved.

Data bundles are used to diagnose problems. Software subsystems generate data bundles when they detect fault conditions. Data bundles reported during known fault conditions are escalated automatically to the Teradata Vital Infrastructure (TVI) backend as soon as they are available, and they become available as attachments to an associated summary alert.
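
The relationship among events, alerts, and summary alerts described above can be illustrated with a minimal Python sketch. It is not Server Management code. The signature table, the summary rule, and every class and function name (SIGNATURES, SUMMARY_RULES, Event, Alert, derive_alerts, derive_summary_alerts) are hypothetical and chosen only to show the idea: events are matched against known signatures to produce alerts, and alerts that occur together produce a summary alert.

```python
from dataclasses import dataclass

# Hypothetical signature table (not from the product):
# text found in an event message -> (alert name, severity).
SIGNATURES = {
    "disk failed": ("DiskFailure", "critical"),
    "fan speed low": ("FanDegraded", "warning"),
}

# Hypothetical summary rule: a summary alert is raised only when
# all of the listed alerts are present together.
SUMMARY_RULES = {
    "Disk array degraded": {"DiskFailure", "FanDegraded"},
}


@dataclass(frozen=True)
class Event:
    source: str
    message: str


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str
    source: str


def derive_alerts(events):
    """Return one alert for each event message that matches a known signature."""
    alerts = []
    for event in events:
        for pattern, (name, severity) in SIGNATURES.items():
            if pattern in event.message.lower():
                alerts.append(Alert(name, severity, event.source))
    return alerts


def derive_summary_alerts(alerts):
    """Return the titles of summary rules whose required alerts are all present."""
    seen = {alert.name for alert in alerts}
    return [title for title, required in SUMMARY_RULES.items() if required <= seen]


events = [
    Event("array-1", "Disk failed in slot 4"),
    Event("array-1", "Fan speed low"),
    Event("node-2", "User login"),  # matches no signature, so no alert
]
alerts = derive_alerts(events)
summaries = derive_summary_alerts(alerts)
```

In this sketch, the third event produces no alert because it matches no signature, and the summary alert appears only when both required alerts are present. Real signatures and grouping rules are defined by the product and are likely to be more involved than simple substring matches.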