Local storage provides out-of-the-box monitoring metrics collection and alerting capabilities. Once the platform monitoring component is enabled, monitoring and alerts can be configured based on storage clusters, storage performance, and storage capacity, with support for configuring notification policies.
The visualized monitoring data supports decisions during operational inspections and performance tuning, and the alerting mechanism helps keep the storage system running stably.
By default, the platform collects commonly used performance monitoring metrics such as read and write bandwidth, IOPS, and latency for local storage. Real-time monitoring data for these metrics can be viewed on the Monitoring tab of the Local Storage page under Storage Management. The platform displays these metrics visually through graphs and charts, allowing administrators to clearly observe current storage performance and quickly identify potential issues.
Since local storage can only use locally available storage resources on nodes, users must ensure there is sufficient available capacity on the nodes before declaring local storage to avoid issues caused by over-declaring.
To assist with this, the platform provides detailed capacity monitoring in the Details section of local storage, categorized by device types. Users can check available storage space clearly displayed in numerical and graphical formats. If any device type shows insufficient available capacity, space should be cleared or additional disk devices added before using local storage.
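
The capacity check described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a platform API: the `DeviceTypeCapacity` class, the `check_declaration` function, the device type names, and the GiB figures are all hypothetical, and the numbers would in practice be read from the Details section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DeviceTypeCapacity:
    """Capacity figures for one device type (hypothetical structure)."""
    name: str         # device type label, e.g. "hdd" (assumed name)
    total_gib: float  # total capacity of this device type
    used_gib: float   # capacity already in use

    @property
    def available_gib(self) -> float:
        # Never report negative available space.
        return max(self.total_gib - self.used_gib, 0.0)


def check_declaration(
    capacities: List[DeviceTypeCapacity],
    requests_gib: Dict[str, float],
) -> Dict[str, float]:
    """Return the shortfall in GiB for each over-declared device type."""
    by_name = {c.name: c for c in capacities}
    shortfalls: Dict[str, float] = {}
    for name, requested in requests_gib.items():
        # An unknown device type is treated as having no available capacity.
        available = by_name[name].available_gib if name in by_name else 0.0
        if requested > available:
            shortfalls[name] = requested - available
    return shortfalls


# Illustrative values only.
capacities = [
    DeviceTypeCapacity("hdd", total_gib=1000.0, used_gib=900.0),
    DeviceTypeCapacity("ssd", total_gib=500.0, used_gib=100.0),
]
shortfalls = check_declaration(capacities, {"hdd": 150.0, "ssd": 200.0})
print(shortfalls)  # {'hdd': 50.0}
```

An empty result means every request fits within the available capacity; any entry indicates a device type where space should be cleared or disks added before declaring local storage.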
The platform includes a set of default alerting policies. If resources become abnormal or monitoring data reaches a warning threshold, alerts are automatically triggered. The preconfigured alerting policies effectively cover common operational needs, including alerts for cluster health status and device type capacity.
To ensure alerts are received in a timely manner, notification policies should be configured in the operations center. Notifications can be sent through email, SMS, or other methods to relevant personnel, prompting immediate attention to resolve issues or prevent failures. Users can access the notification policy settings directly from the operations center interface. Detailed instructions on configuring alerts can be found in the [Creating Alert Policies] documentation.
If the health status of the storage cluster changes to Alert, administrators should investigate immediately. The Details section provides information for troubleshooting and resolving these issues. Common causes include abnormal node services or problems with specific device types.
Inspection Item | Corresponding Status | Cause |
---|---|---|
Health Status | Alert | Caused by abnormal node services or device type issues. |
Service Status | Unknown | The node is in the NotReady state, possibly due to a network failure or power outage. |
Device Type Status | Unavailable | The disk in use may not be a raw disk, or it might be missing. |
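
For scripted inspections, the table above can be encoded as a simple lookup. The following is a minimal sketch under stated assumptions: the `LIKELY_CAUSES` mapping and the `triage` function are hypothetical helpers rather than part of the platform, and the `"Running"` status in the example is an assumed value for a healthy service.

```python
from typing import Dict, List, Tuple

# (inspection item, status) -> likely cause, transcribed from the table.
LIKELY_CAUSES: Dict[Tuple[str, str], str] = {
    ("Health Status", "Alert"):
        "Abnormal node services or device type issues.",
    ("Service Status", "Unknown"):
        "Node is not ready, possibly due to a network failure or power outage.",
    ("Device Type Status", "Unavailable"):
        "The disk in use may not be a raw disk, or it may be missing.",
}


def triage(observed: Dict[str, str]) -> List[str]:
    """Return one finding per observed status that has a documented cause."""
    findings: List[str] = []
    for item, status in observed.items():
        cause = LIKELY_CAUSES.get((item, status))
        if cause is not None:
            findings.append(f"{item} = {status}: {cause}")
    return findings


# Example input; "Running" is an assumed healthy value, not a documented one.
observed = {
    "Health Status": "Alert",
    "Service Status": "Running",
    "Device Type Status": "Unavailable",
}
for finding in triage(observed):
    print(finding)
```

Statuses without a documented cause produce no finding, so an empty list means none of the known problem states were observed.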
Real-time alerts triggered on the Alert tab require prompt attention, even if the storage cluster status currently appears Healthy. Quick responses prevent escalation into more serious issues. The following table outlines alert levels and their implications:
Alert Level | Meaning |
---|---|
Critical | Indicates significant issues causing platform service interruptions or data loss, with severe impacts. |
Major | Known issues potentially affecting platform functionality and normal business operations. |
Warning | Risk of operational issues exists; timely intervention needed to avoid impact on normal business operations. |
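
When several alerts are active at once, the levels above suggest a natural handling order. The sketch below shows one way to sort alerts by level; the `SEVERITY_ORDER` mapping, the `sort_by_severity` function, and the alert dictionaries with `name` and `level` keys are assumptions made for illustration, not the platform's alert format.

```python
from typing import Dict, List

# Lower rank means more urgent; the order follows the table above.
SEVERITY_ORDER: Dict[str, int] = {"Critical": 0, "Major": 1, "Warning": 2}


def sort_by_severity(alerts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return alerts ordered from most to least severe.

    Alerts with an unrecognized level are placed last. The sort is stable,
    so alerts of the same level keep their original relative order.
    """
    lowest = len(SEVERITY_ORDER)
    return sorted(alerts, key=lambda a: SEVERITY_ORDER.get(a["level"], lowest))


# Hypothetical alert records for illustration.
alerts = [
    {"name": "device-type-capacity-low", "level": "Warning"},
    {"name": "cluster-health-abnormal", "level": "Critical"},
    {"name": "node-service-abnormal", "level": "Major"},
]
ordered = sort_by_severity(alerts)
print([a["level"] for a in ordered])  # ['Critical', 'Major', 'Warning']
```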
The Alert History logs all alerts triggered previously that no longer require immediate action. During post-mortem analysis, consider the following: