View node monitoring data on the node details page.
When a cluster has more than 1 node, you can click the current node name in the resource path area on the node details page to expand the node dropdown list, then click to select a node for quick switching to other node details pages.
When monitoring components are configured for the cluster, you can view node monitoring data including resource runtime status, resource usage, and resource trend statistics.
In the left navigation bar, click Clusters > Clusters.
Click the cluster name where the target node is located.
Under the Nodes tab, click the target node name.
Click the Monitoring tab to open the node monitoring page and view the node's monitoring data.
Hover over a card and click the Details icon to view PromQL expressions; click the Export icon to export PromQL expressions for all charts on the current page.
In the storage space statistics area, when a node has more than 4 storage partitions:
In the partition total usage pie chart, the 3 partitions with the highest usage are displayed separately, and the remaining partitions are grouped as Others; hover over the Others area to view their combined usage.
In the partition usage bar chart, the 3 partitions with the highest usage are displayed separately, and the remaining partitions are grouped as Others; hover over the Others bar to view their combined usage and individual usage rates.
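The grouping rule described above can be illustrated with a minimal Python sketch. This is not the platform's implementation; the function name `group_partitions`, its parameters, and the sample mount points are hypothetical and only mirror the "top 3 plus Others when there are more than 4 partitions" behavior.

```python
def group_partitions(usage_by_partition, top_n=3, threshold=4):
    """Return (label, used) pairs in the order a chart might draw them.

    usage_by_partition: dict mapping a partition name to its used space.
    top_n and threshold are assumptions taken from the text above:
    grouping applies only when there are more than `threshold` partitions.
    """
    # Rank partitions from highest to lowest usage.
    ranked = sorted(usage_by_partition.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= threshold:
        # 4 or fewer partitions: every partition is shown on its own.
        return ranked
    top, rest = ranked[:top_n], ranked[top_n:]
    # Remaining partitions collapse into one "Others" entry with their total.
    return top + [("Others", sum(used for _, used in rest))]


if __name__ == "__main__":
    sample = {"/": 50, "/var": 30, "/home": 20, "/boot": 5, "/tmp": 10}
    for label, used in group_partitions(sample):
        print(label, used)
```

With five partitions, the three largest keep their own entries and the other two are summed under Others; with four or fewer, nothing is grouped.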
The monitoring trend statistics are described in the following table.
| Parameter | Description |
|---|---|
| CPU | Usage rate, request rate, and limit rate of CPU within the specified time range. Usage rate = CPU usage of all pods on the node / Total CPU of the node. Note: If the CPU usage rate of a node spikes during a certain period, first identify the process consuming the most CPU. For example, in a custom Java application, a memory leak or an infinite loop in the code may cause high CPU usage. Request rate = CPU requests of all pods on the node / Total CPU of the node. Note: If the CPU request rate of a node spikes during a certain period, the cause may be an unsuitable cluster oversubscription ratio or excessively high request values for pods running on the node, either of which may waste resources. Limit rate = CPU limits of all pods on the node / Total CPU of the node. Note: If the CPU limit rate of a node spikes during a certain period, the limit values for pods running on the node are probably set too high, which may waste CPU resources. |
| Memory | Usage rate, request rate, and limit rate of memory within the specified time range. Usage rate = Memory usage of all pods on the node / Total memory of the node. Note: Memory is one of a server's key components and acts as the bridge between the CPU and the rest of the system, so its performance has a significant impact on the machine. When programs run, data loading, thread concurrency, and I/O buffering all depend on memory, and the amount of available memory determines whether programs can run normally and how well they run. Request rate = Memory requests of all pods on the node / Total memory of the node. Note: If the memory request rate of a node spikes during a certain period, the cause may be an unsuitable cluster oversubscription ratio or excessively high request values for pods running on the node, either of which may waste resources. Limit rate = Memory limits of all pods on the node / Total memory of the node. Note: If the memory limit rate of a node spikes during a certain period, the limit values for pods running on the node are probably set too high, which may waste memory resources. |
| Storage | Space usage rate and inode usage rate within the specified time range. Space usage rate = Storage space used / Total storage space. Note: Historical disk space data lets you evaluate disk usage over a given period. When disk usage is high, you can free up space by cleaning up unnecessary images or containers. Inode usage rate = Inodes used / Total inodes. Note: Every file needs an inode to store its metadata, such as the file owner and timestamps. Inodes also consume disk space, and a large number of small cache files can easily exhaust them. When inodes are exhausted, new files cannot be created on the disk even if it is not full. |
| System Load | Average system load over 1 minute, 5 minutes, and 15 minutes. The value is the ratio of the number of processes currently running on or waiting for the CPU to the maximum number of processes the CPU can execute, and is an important indicator of how busy or idle the system is. Note: If the 1-minute, 5-minute, and 15-minute curves are similar over a certain period, the node's CPU load is relatively stable. If the 1-minute value is much greater than the 15-minute value, the load has been increasing during the most recent minute and needs continued observation. Once the 1-minute value exceeds the number of CPUs, the system may be overloaded and you need to analyze the root cause. If the 1-minute value is much smaller than the 15-minute value, the load has been decreasing during the most recent minute after a period of high load within the previous 15 minutes. |
| Disk Throughput | Disk throughput within the specified time range, that is, the rate at which the disk transfers data, counting both data read and data written. |
| Disk IOPS | Disk IOPS within the specified time range, that is, the total number of read and write operations the disk performs per second. |
| Network Traffic Rate | Network traffic inflow and outflow rates within the specified time range, counted by the node's physical network interface. |
| Network Packet Rate (packets/sec) | Network packet receive and send rates within the specified time range, counted by the node's physical network interface. |
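
The rate formulas and the load-average guidance in the table can be worked through with a small Python sketch. It is illustrative only and does not reflect how the platform computes these values. The names `node_rates` and `describe_load`, the pod dictionary layout, and the `ratio` factor used to decide what counts as "much greater" are all assumptions introduced here.

```python
def node_rates(pods, node_total):
    """Compute usage, request, and limit rates for one resource on a node.

    pods: list of dicts with hypothetical keys "usage", "request", "limit",
          all expressed in the same unit as node_total (e.g. CPU cores).
    Each rate = sum over all pods on the node / total of the node.
    """
    if node_total <= 0:
        raise ValueError("node_total must be positive")
    return {
        key: sum(pod[key] for pod in pods) / node_total
        for key in ("usage", "request", "limit")
    }


def describe_load(load1, load15, cpu_count, ratio=1.5):
    """Classify load averages following the notes in the System Load row.

    ratio is an assumed threshold for "much greater" / "much smaller";
    the documentation does not specify a number.
    """
    if load1 > cpu_count:
        return "possible overload"   # 1-minute value exceeds the number of CPUs
    if load1 > ratio * load15:
        return "rising"              # recent load is increasing
    if load1 * ratio < load15:
        return "falling"             # recent load is decreasing
    return "stable"                  # curves are similar


if __name__ == "__main__":
    pods = [
        {"usage": 0.5, "request": 1.0, "limit": 2.0},
        {"usage": 1.5, "request": 1.0, "limit": 2.0},
    ]
    print(node_rates(pods, node_total=4.0))
    print(describe_load(load1=3.0, load15=1.0, cpu_count=4))
```

In this example the limit rate reaches 1.0 (100%) while the usage rate is only 0.5, the kind of gap the notes in the table associate with limit values that are set too high.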