The following table lists key events and the threshold values that trigger a warning or a critical alert.
Type | Event | Warning | Critical |
---|---|---|---|
CPU saturation | Average system CPU > x% | (x = 95) | |
I/O saturation | CPU + WIO > x% and WIO > y% (WIO = CPU time waiting on I/O) | (x = 90, y = 20) | |
Query blocked | Query or session blocked on a resource for longer than x minutes (identify the blocking session) | (x = 60) | |
Entire system blocked | Total number of blocked processes > x | (x = 10) | |
User exceeding normal usage | Number of sessions per user > x (with an exclusion list, and custom code to roll up sessions by user) | (x = 4) | |
“Hot Node” or “Hot AMP” problem | Inter-node or inter-AMP parallelism is less than x% for more than 10 minutes | (x = 10) | |
Disk space | Disk use > x% (vproc) | (x = 90) | (x = 95) |
Product join | Average BYNET > x% (system) | (x = 50) | |
System restart | Restart | SNMP | |
Node down | Node is down | SNMP | |
Heartbeat query | Timeout | SNMP | |
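The sampled thresholds in the table can be checked by a small rule evaluator. The sketch below is a minimal Python illustration under stated assumptions, not a Teradata tool. The `Sample` fields and the `evaluate` and `parallelism_pct` functions are hypothetical names, and collecting the metrics is left to the caller. The SNMP-driven events (system restart, node down, heartbeat timeout) are omitted because they arrive as traps rather than as sampled values. It assumes "CPU + WIO" means the sum of the busy-CPU and I/O-wait percentages, and it measures parallelism as average AMP CPU divided by the busiest AMP's CPU; both interpretations are assumptions.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Dict, List, Tuple

# Default x / y values taken from the threshold table.
CPU_WARN = 95
IO_CPU_WIO_WARN, IO_WIO_WARN = 90, 20
BLOCKED_MINUTES_WARN = 60
BLOCKED_PROCS_WARN = 10
SESSIONS_PER_USER_WARN = 4
PARALLELISM_MINUTES = 10  # parallelism must stay below 10% for longer than this
DISK_WARN, DISK_CRIT = 90, 95
BYNET_WARN = 50


@dataclass
class Sample:
    """One monitoring snapshot. All field names are hypothetical."""
    cpu_pct: float              # average system CPU busy (%)
    wio_pct: float              # CPU time spent waiting on I/O (%)
    max_blocked_minutes: float  # longest current block on any query or session
    blocked_processes: int      # total blocked processes, system-wide
    sessions_by_user: Dict[str, int] = field(default_factory=dict)
    # Minutes that parallelism has stayed continuously below 10%;
    # the caller is assumed to track this duration between samples.
    low_parallelism_minutes: float = 0.0
    max_vproc_disk_pct: float = 0.0  # highest disk use across vprocs (%)
    bynet_pct: float = 0.0           # average system BYNET utilization (%)


def parallelism_pct(per_amp_cpu: List[float]) -> float:
    """Assumed skew metric: average AMP CPU as a percentage of the busiest AMP."""
    busiest = max(per_amp_cpu)
    if busiest == 0:
        return 100.0  # idle system: treat as perfectly balanced
    return 100.0 * (sum(per_amp_cpu) / len(per_amp_cpu)) / busiest


def evaluate(s: Sample,
             excluded_users: AbstractSet[str] = frozenset()) -> List[Tuple[str, str]]:
    """Return (event type, severity) pairs for every threshold the sample crosses."""
    alerts: List[Tuple[str, str]] = []
    if s.cpu_pct > CPU_WARN:
        alerts.append(("CPU saturation", "warning"))
    # Assumption: "CPU + WIO" is busy CPU plus I/O wait, both in percent.
    if s.cpu_pct + s.wio_pct > IO_CPU_WIO_WARN and s.wio_pct > IO_WIO_WARN:
        alerts.append(("I/O saturation", "warning"))
    if s.max_blocked_minutes > BLOCKED_MINUTES_WARN:
        alerts.append(("Query blocked", "warning"))
    if s.blocked_processes > BLOCKED_PROCS_WARN:
        alerts.append(("Entire system blocked", "warning"))
    # Sessions are assumed to be already rolled up per user by the caller.
    for user, count in s.sessions_by_user.items():
        if user not in excluded_users and count > SESSIONS_PER_USER_WARN:
            alerts.append((f"User exceeding normal usage: {user}", "warning"))
    if s.low_parallelism_minutes > PARALLELISM_MINUTES:
        alerts.append(("Hot node or hot AMP", "warning"))
    # Disk space is the only event with a separate critical threshold.
    if s.max_vproc_disk_pct > DISK_CRIT:
        alerts.append(("Disk space", "critical"))
    elif s.max_vproc_disk_pct > DISK_WARN:
        alerts.append(("Disk space", "warning"))
    if s.bynet_pct > BYNET_WARN:
        alerts.append(("Product join", "warning"))
    return alerts


if __name__ == "__main__":
    sample = Sample(cpu_pct=80, wio_pct=15, max_blocked_minutes=75,
                    blocked_processes=3,
                    sessions_by_user={"etl_batch": 12, "alice": 6, "bob": 2},
                    max_vproc_disk_pct=96, bynet_pct=40)
    print(evaluate(sample, excluded_users={"etl_batch"}))
    # [('Query blocked', 'warning'), ('User exceeding normal usage: alice', 'warning'), ('Disk space', 'critical')]
```

Keeping the thresholds as module-level constants mirrors the table's x and y parameters, so tuning a threshold is a one-line change. The exclusion list is passed in rather than hard-coded, matching the table's note that per-user session counts need an exclusion list.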