Suggested Alerts and Thresholds
The following table lists key events and the threshold values at which each should raise a warning or a critical alert.
| Type | Event | Warning | Critical |
|---|---|---|---|
| CPU saturation | Average system CPU > x% | x = 95 | |
| I/O saturation | CPU + WIO (I/O wait) > x% and WIO > y% | x = 90, y = 20 | |
| Query blocked | Query or session blocked on a resource for longer than x minutes (report the blocking session as well) | x = 60 | |
| Entire system blocked | Total number of blocked processes > x | | x = 10 |
| User exceeding normal usage | Number of sessions per user > x (requires an exclusion list and custom code to roll up sessions by user) | x = 4 | |
| “Hot Node” or “Hot AMP” problem | Inter-node or inter-AMP parallelism is less than x% for more than 10 minutes | x = 10 | |
| Disk space | Disk use % > x (per vproc) | x = 90 | x = 95 |
| Product join | Average BYNET > x% (system) | x = 50 | |
| System restart | Restart | | SNMP |
| Node down | Node is down | | SNMP |
| Heartbeat query | Timeout | | SNMP |
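The numeric rules in the table share one pattern: a metric is compared against an optional warning level and an optional critical level. The following is a minimal Python sketch of that evaluation, not part of the original guide. The names `Threshold`, `classify`, `io_saturated`, and `disk_alerts` are hypothetical, metric collection is assumed to happen elsewhere, and every check uses a strict `>` comparison as the table does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Two-level threshold; a level set to None is not checked."""
    warning: Optional[float] = None
    critical: Optional[float] = None


def classify(value: float, t: Threshold) -> str:
    """Return 'critical', 'warning' or 'ok' using strict '>' comparisons."""
    if t.critical is not None and value > t.critical:
        return "critical"
    if t.warning is not None and value > t.warning:
        return "warning"
    return "ok"


# Threshold values taken from the table.
CPU = Threshold(warning=95)                # average system CPU %
DISK = Threshold(warning=90, critical=95)  # disk use % per vproc
BLOCKED = Threshold(critical=10)           # total blocked processes


def io_saturated(cpu_pct: float, wio_pct: float,
                 x: float = 90, y: float = 20) -> bool:
    """I/O saturation rule: CPU + WIO > x% and WIO > y%."""
    return cpu_pct + wio_pct > x and wio_pct > y


def disk_alerts(use_pct_by_vproc: dict[int, float]) -> dict[int, str]:
    """Classify disk use per vproc, keeping only vprocs that need attention."""
    alerts = {}
    for vproc, pct in use_pct_by_vproc.items():
        level = classify(pct, DISK)
        if level != "ok":
            alerts[vproc] = level
    return alerts


print(disk_alerts({0: 85.0, 1: 92.5, 2: 97.0}))  # {1: 'warning', 2: 'critical'}
print(io_saturated(75, 25), io_saturated(85, 10))  # True False
```

Keeping the compound I/O rule as its own function, rather than forcing it into `Threshold`, reflects that it depends on two metrics at once.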
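The sessions-per-user rule calls for an exclusion list and custom code to roll up sessions by user. One possible roll-up is sketched below in Python. It assumes, hypothetically, that the session list arrives as (session ID, user name) pairs from whatever monitoring query is in use, and that user names should compare case-insensitively with surrounding blanks ignored. The function names are illustrative only.

```python
from collections import Counter
from collections.abc import Iterable


def normalize(user: str) -> str:
    # Assumption: user names compare case-insensitively and may be blank-padded.
    return user.strip().upper()


def users_over_session_limit(
    sessions: Iterable[tuple[int, str]],
    limit: int = 4,
    excluded: Iterable[str] = (),
) -> dict[str, int]:
    """Roll up (session_id, user_name) pairs by user.

    Returns {user: session_count} for users with more than `limit` sessions,
    skipping anyone on the exclusion list (e.g. batch or ETL accounts).
    """
    skip = {normalize(u) for u in excluded}
    counts = Counter(normalize(user) for _session_id, user in sessions)
    return {
        user: n
        for user, n in sorted(counts.items())
        if n > limit and user not in skip
    }


sessions = (
    [(i, "alice") for i in range(5)]
    + [(10, "bob"), (11, "bob")]
    + [(20 + i, "ETL_BATCH ") for i in range(6)]
)
print(users_over_session_limit(sessions, excluded=["etl_batch"]))  # {'ALICE': 5}
```

Here the ETL account has the most sessions but is suppressed by the exclusion list, which is the case the table's note anticipates.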
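The table does not say how inter-AMP or inter-node parallelism is measured. The Python sketch below assumes a commonly used definition, average CPU across AMPs divided by the busiest AMP's CPU, times 100. It raises the alert only when that value stays below x% for more than 10 minutes across timestamped samples. The functions `parallelism_pct` and `hot_amp_alert` and the sample format are hypothetical; the same calculation would apply to per-node CPU for the hot-node case.

```python
from statistics import mean
from collections.abc import Sequence


def parallelism_pct(cpu_by_amp: Sequence[float]) -> float:
    """Assumed definition: average AMP CPU / busiest AMP CPU * 100."""
    busiest = max(cpu_by_amp)
    if busiest == 0:
        return 100.0  # idle system: treat as perfectly balanced
    return mean(cpu_by_amp) / busiest * 100


def hot_amp_alert(
    samples: Sequence[tuple[float, Sequence[float]]],
    x: float = 10,
    window_s: float = 600,
) -> bool:
    """True if parallelism stayed below x% for more than window_s seconds.

    samples: (timestamp_seconds, cpu_by_amp) pairs in ascending time order.
    Gaps between samples are not checked; the low period is measured from
    the first low sample to the current one.
    """
    low_since = None
    for ts, cpu in samples:
        if parallelism_pct(cpu) < x:
            if low_since is None:
                low_since = ts
            if ts - low_since > window_s:
                return True
        else:
            low_since = None  # a balanced sample resets the clock
    return False


# One AMP doing all the work out of 20: parallelism = 5%, sampled every minute.
skewed = [(t, [100.0] + [0.0] * 19) for t in range(0, 661, 60)]
print(hot_amp_alert(skewed))  # True
```

Requiring the condition to persist, rather than alerting on a single skewed sample, mirrors the table's "for more than 10 minutes" clause and avoids alerting on short-lived skew.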