Monitoring and Alerts

The platform monitors virtual machines across CPU, memory, storage, and network metrics and can raise alerts on them. To make sure alerts reach the right people promptly, you can also configure notification policies.

The monitoring data is presented visually and can support decisions during operations inspections or performance tuning, while the alerting and notification mechanism helps keep virtual machines running stably.

Monitoring

By default, the platform collects commonly used performance metrics for virtual machines, including CPU, memory, storage, and network. Navigate to Virtualization > Virtual Machines, open the details of a virtual machine, and select the Monitoring tab to view real-time data for these metrics.

Alerts

Configuring Alert Policies

To enable alerts, you must first create an alert policy. An alert policy describes the objects you wish to monitor, the conditions under which you wish to be alerted, and how you will be notified of relevant alerts. Navigate to Container Platform > Virtualization > Virtual Machines, and in the virtual machine details, click Create Alert Policy on the Alerts tab to complete the configuration.

Parameter: Description

Alert Type:
- Metric Alert: The monitored object is a platform-predefined metric, such as Memory Usage Rate.
- Event Alert: The monitored object is the cause of an event, that is, the reason the virtual machine transitioned to its current state, for example BackOff, Pulling, or Failed.

Trigger Condition:
Composed of a comparison operator, an alert threshold, and a duration. The platform compares real-time monitoring results with the configured threshold to determine whether to alert. If a duration is set, the platform also checks how long the monitored object has been in the alert state.

Alert Level:
- Hint: The monitored object has expected issues that do not immediately affect business operations but pose potential risks. For example, CPU usage exceeds 70% for 3 minutes.
- Warning: The monitored object has operational risks that may affect normal business operations if not addressed promptly. For example, CPU usage exceeds 80% for 3 minutes.
- Serious: The monitored object has known issues that may lead to platform functionality failures, affecting normal business operations.
- Disaster: The monitored object has failed, resulting in platform service interruptions and data loss, with significant impact.
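
To illustrate how a trigger condition combines a comparison operator, a threshold, and a duration, the following is a minimal Python sketch. It is not the platform's implementation: the names TriggerCondition and should_alert, the sample format, and the rule that the breach must be continuous up to the latest sample are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping of comparison operators; the platform's actual
# operator set is not specified in this document.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


@dataclass
class TriggerCondition:
    """Illustrative trigger condition: comparator, threshold, duration."""
    comparator: str
    threshold: float
    duration_s: float = 0.0  # 0 means alert on the first breaching sample


def should_alert(condition, samples):
    """Decide whether to alert.

    samples is a list of (timestamp_seconds, value) pairs in ascending
    time order. The function returns True when the most recent samples
    have breached the threshold without interruption for at least
    duration_s seconds (an assumed interpretation of "duration").
    """
    compare = OPERATORS[condition.comparator]
    breach_start = None
    for ts, value in samples:
        if compare(value, condition.threshold):
            if breach_start is None:
                breach_start = ts  # start of the current breach window
        else:
            breach_start = None  # a healthy sample resets the window
    if breach_start is None:
        return False
    latest_ts = samples[-1][0]
    return latest_ts - breach_start >= condition.duration_s


# Mirrors the Warning example: CPU usage exceeds 80% for 3 minutes.
warning = TriggerCondition(comparator=">", threshold=80.0, duration_s=180)
sustained = [(0, 75.0), (60, 85.0), (120, 90.0), (180, 88.0), (240, 86.0)]
interrupted = [(0, 85.0), (60, 70.0), (120, 85.0), (180, 90.0), (240, 95.0)]
```

With these samples, should_alert(warning, sustained) is True because the breach lasts from second 60 to second 240, while should_alert(warning, interrupted) is False because the healthy sample at second 60 resets the window and only 120 seconds of continuous breach remain.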

Tip: The virtual machine alerting function is similar to the platform's general alerting function. For more detailed configuration guidance, please refer to the general Alerts documentation.

Handling Alerts

Navigate to the Alerts tab. If any alert policies are shown in an alerting state, address them promptly.

Binding Notification Policies

In addition to showing real-time alerts on the Alerts tab, the platform can send alert information to relevant personnel by email, SMS, and other channels, so that they can take the necessary measures to resolve issues or prevent failures. To set up a notification policy, contact the administrator.