During instance operation and maintenance, alerts notify you of exceptions or statuses that require special attention. When an instance encounters an anomaly or reaches a warning state, an alert is triggered automatically to help you promptly discover and pinpoint the issue.
To enhance system operation and maintenance efficiency, the platform has established alert rules based on the monitoring indicators used in troubleshooting common instance faults, categorizing and consolidating them into built-in alert strategies.
The platform supports alert rules based on both predefined and user-defined monitoring indicators. When a resource anomaly occurs or a warning state is reached, an alert is triggered automatically. Combined with the platform's notification functionality, alert information is actively pushed to operation and maintenance personnel, reminding them to address the affected resources promptly and keep business on the platform running smoothly.
To facilitate the setting of alerts for a large number of resources on the platform, the platform supports customizing standardized alert configurations for similar instances through alert templates, allowing you to quickly create alert strategies for resources based on existing templates.
Metric Alert: Common monitoring indicators extracted by the platform that meet most customer needs. You can configure alerts by selecting monitoring indicators and setting trigger conditions. When the monitoring data meets the trigger conditions of the alert rule, an alert will be triggered.
Custom Alert: Building on metric alerts, you can add enterprise-specific metric rules based on actual use cases to meet the enterprise's more advanced alerting needs.
Alert Status
Alert: The data obtained from querying the alert rules configuration meets the trigger conditions and has triggered an alert.
Pending: The monitoring data obtained from querying the alert rules configuration is greater than or equal to the alert threshold in the trigger conditions, but the duration has not yet met the trigger conditions. This is the critical state before the alert is triggered. For example, if the alert rule's trigger condition is "CPU usage exceeds 80% and lasts for 3 minutes", the rule is marked as Pending when the system first detects that CPU usage exceeds 80%. Evaluation then continues: if CPU usage remains over 80% for 3 minutes, the rule's status changes to Alert; if at any later evaluation CPU usage falls below 80%, the rule's status reverts to Normal.
Normal: The data obtained from querying the alert rules configuration does not meet the alert threshold.
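The transitions between Normal, Pending, and Alert described above can be sketched as a small state machine. This is only an illustration, not the platform's implementation: the `RuleState` class, its field names, and the `evaluate` method are hypothetical, and the comparison is assumed to be strict ("exceeds"), following the CPU example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    """Hypothetical model of one alert rule; names are illustrative only."""
    threshold: float                       # e.g. 80.0 for "CPU usage exceeds 80%"
    duration_s: float                      # e.g. 180.0 for "lasts for 3 minutes"
    status: str = "Normal"
    pending_since: Optional[float] = None  # time the threshold was first crossed

    def evaluate(self, value: float, now: float) -> str:
        """Update and return the status for one evaluation at time `now` (seconds)."""
        if value <= self.threshold:
            # Assumption: strict comparison, so a value at the threshold is Normal.
            self.status = "Normal"
            self.pending_since = None
        else:
            if self.pending_since is None:
                self.pending_since = now   # first breach starts the Pending clock
            if now - self.pending_since >= self.duration_s:
                self.status = "Alert"      # breach held for the full duration
            else:
                self.status = "Pending"    # breach seen, duration not yet met
        return self.status


rule = RuleState(threshold=80.0, duration_s=180.0)
samples = [(85.0, 0.0), (90.0, 60.0), (88.0, 180.0), (70.0, 240.0)]
print([rule.evaluate(value, now) for value, now in samples])
# prints ['Pending', 'Pending', 'Alert', 'Normal']
```

In this sketch, a single evaluation at or below the threshold resets the Pending clock, which matches the rule that the status reverts to Normal as soon as the CPU usage falls below 80%.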
Silent Status
Silent Pending: The status after silence has been configured but before the silence period starts. In this status, when rules in the strategy trigger an alert, notifications are sent normally.
Silent: The status from the start of silence to the end of silence. In this status, when rules in the strategy trigger an alert, notifications will not be sent.
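The effect of the two silent statuses on notifications can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `silence_status` and `should_notify` are hypothetical, the silence window is treated as including its start and excluding its end, and notifications are assumed to resume once the window has ended (the text above does not name a status for that case, so the sketch returns `None`).

```python
from datetime import datetime
from typing import Optional


def silence_status(now: datetime, start: datetime, end: datetime) -> Optional[str]:
    """Return the silent status of a strategy for a configured silence window."""
    if now < start:
        return "Silent Pending"   # silence is set but has not started yet
    if now < end:
        return "Silent"           # inside the silence window
    return None                   # assumption: window over, no silent status


def should_notify(rule_status: str, now: datetime,
                  start: datetime, end: datetime) -> bool:
    """A notification is sent only for a triggered alert outside the Silent window."""
    return rule_status == "Alert" and silence_status(now, start, end) != "Silent"


# Illustrative window: silence from 10:00 to 12:00 on an arbitrary day.
start = datetime(2024, 1, 1, 10, 0)
end = datetime(2024, 1, 1, 12, 0)
for hour in (9, 11, 13):
    now = datetime(2024, 1, 1, hour, 0)
    print(hour, silence_status(now, start, end), should_notify("Alert", now, start, end))
```

Note that the rule's own status still changes to Alert during silence in this sketch; only the notification is suppressed.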
Through real-time alerts, the platform displays the number of resources currently under alert along with detailed alert information. Operation and maintenance personnel and administrators can thus see the overall alert conditions of the business on the platform in real time, and identify and address faults promptly to ensure normal platform operation.
To help operation and maintenance personnel and administrators analyze recent monitoring alert conditions and trace historical issues, the platform supports viewing historical alert records over a period of time. The historical information available for viewing includes occurrence time, alert rules, faulty resources, notification methods, status, and notification recipients.
Note: The retention time of alert history is the same as the retention time of events and can be updated in the Administrator section by modifying the configuration parameters of the log storage component.