The platform provides built-in monitoring dashboards for Redis instances. These features support performance analysis, resource utilization tracking, and configurable alerting, so that issues can be detected and addressed before they affect service.
The platform automatically collects key resource utilization and operational performance metrics for Redis instances. These metrics can be viewed in real time on the instance's Monitoring tab.
Category | Metrics |
---|---|
Cluster Status Monitoring | Key count statistics, command execution metrics, replication lag, etc. |
Resource Monitoring | Memory utilization, network traffic patterns, storage consumption, etc. |
Performance Monitoring | Connection count, network I/O throughput, command latency, etc. |
Alerting is implemented through alert rules in Alauda Application Service. An alert rule defines the monitoring targets, the threshold conditions that trigger an alert, and how notifications are delivered. To configure alert rules for Redis instances, navigate to the Alerts > Rules page in Alauda Application Service.
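The parts of an alert rule, and the "sustained for" wording used in the recommended thresholds, can be illustrated with a small model. The Python sketch below is hypothetical: the `AlertRule` class, its field names, and the sample values are illustrative and do not reflect the platform's actual rule schema. It assumes a rule fires only after its condition has held continuously for the hold duration, which is roughly how the `for` clause of a Prometheus alerting rule behaves.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AlertRule:
    """Hypothetical alert rule model; field names are illustrative only."""

    target: str                          # monitored object, e.g. a Redis instance name
    condition: Callable[[float], bool]   # threshold check applied to each sample
    sustained_seconds: float             # how long the condition must hold before firing
    channels: List[str] = field(default_factory=list)  # notification channels

    def first_firing_time(self, samples: List[Tuple[float, float]]) -> Optional[float]:
        """Return the timestamp at which the rule first fires, or None.

        `samples` are (timestamp_seconds, value) pairs sorted by timestamp.
        The hold timer starts at the first breaching sample and resets
        whenever a sample no longer breaches the threshold.
        """
        breach_start: Optional[float] = None
        for ts, value in samples:
            if self.condition(value):
                if breach_start is None:
                    breach_start = ts
                if ts - breach_start >= self.sustained_seconds:
                    return ts
            else:
                breach_start = None  # condition cleared; restart the hold timer
        return None


if __name__ == "__main__":
    # CPU utilization above 80%, sustained for 30 seconds, sampled every 10 seconds.
    cpu_rule = AlertRule(
        target="redis-demo",      # hypothetical instance name
        condition=lambda v: v > 0.8,
        sustained_seconds=30,
        channels=["ops-email"],   # hypothetical notification channel
    )
    samples = [(0, 0.85), (10, 0.9), (20, 0.7), (30, 0.82), (40, 0.88), (50, 0.9), (60, 0.95)]
    print(cpu_rule.first_firing_time(samples))  # 60
```

The dip at t=20 resets the timer, so the rule fires at t=60 rather than t=30. This is the purpose of a hold duration: short spikes above the threshold do not produce notifications.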
The platform provides the following pre-configured alert indicators:
Indicator | Recommended Threshold | Description |
---|---|---|
Instance Status | != 1, sustained for 30 seconds | Monitors instance availability and operational state |
Key Access Hit Rate | < 80%, sustained for 30 seconds | Monitors cache efficiency; a low hit rate means many lookups miss the cache and may call for adjusting the caching strategy (for example, increasing TTL values or optimizing key patterns) |
Average Response Time | > 0.1 s, sustained for 30 seconds | Monitors command execution latency; prolonged response times may indicate CPU constraints, excessive workload, or operations on big keys |
Master-Slave Failover | = 1, sustained for 30 seconds | Detects master-slave role transitions, which may indicate Redis node failures or underlying infrastructure issues |
Inbound Bandwidth per Node | Environment-specific thresholds | Monitors network ingress in real-time to prevent bandwidth saturation affecting service availability |
Outbound Bandwidth per Node | Environment-specific thresholds | Monitors network egress in real-time to prevent bandwidth saturation affecting service availability |
Client Connections per Node | Environment-specific thresholds | Monitors the number of client connections to detect potential connection leaks or abnormal access patterns |
CPU Utilization per Node | > 80%, sustained for 30 seconds | Monitors CPU consumption; sustained high utilization may require capacity planning and scaling |
Memory Utilization per Node | > 80%, sustained for 30 seconds | Monitors memory usage; approaching capacity limits requires immediate scaling to prevent eviction or OOM conditions |
Storage Utilization per Node | > 80%, sustained for 30 seconds | Monitors persistent storage usage for RDB/AOF configurations; high utilization requires capacity expansion |
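The Key Access Hit Rate and Average Response Time indicators are ratios over a time window rather than raw counters. The Python sketch below shows one way to derive them from two snapshots of cumulative Redis counters. `keyspace_hits` and `keyspace_misses` are fields reported by the Redis `INFO stats` command; the total call count and total execution time are assumed to be sums of the `calls` and `usec` fields from `INFO commandstats`. The `CounterSnapshot` type and helper names are hypothetical, and the platform may compute these indicators differently. In particular, `usec` covers server-side execution time only and excludes network latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSnapshot:
    """Cumulative Redis counters sampled at one point in time (hypothetical type)."""

    keyspace_hits: int    # INFO stats: keyspace_hits
    keyspace_misses: int  # INFO stats: keyspace_misses
    total_calls: int      # assumed: sum of `calls` over INFO commandstats
    total_usec: int       # assumed: sum of `usec` over INFO commandstats


def hit_rate(prev: CounterSnapshot, curr: CounterSnapshot) -> Optional[float]:
    """Fraction of key lookups that hit during the interval, or None if undefined."""
    hits = curr.keyspace_hits - prev.keyspace_hits
    misses = curr.keyspace_misses - prev.keyspace_misses
    if hits < 0 or misses < 0:
        return None  # counters went backwards, e.g. after a restart or CONFIG RESETSTAT
    lookups = hits + misses
    if lookups == 0:
        return None  # no lookups in the interval
    return hits / lookups


def avg_response_seconds(prev: CounterSnapshot, curr: CounterSnapshot) -> Optional[float]:
    """Mean server-side execution time per command over the interval, in seconds."""
    calls = curr.total_calls - prev.total_calls
    usec = curr.total_usec - prev.total_usec
    if calls <= 0 or usec < 0:
        return None  # no commands in the interval, or counters were reset
    return usec / calls / 1_000_000


if __name__ == "__main__":
    prev = CounterSnapshot(keyspace_hits=1000, keyspace_misses=200, total_calls=5000, total_usec=250_000)
    curr = CounterSnapshot(keyspace_hits=1800, keyspace_misses=400, total_calls=6000, total_usec=400_000)
    print(hit_rate(prev, curr))              # 0.8
    print(avg_response_seconds(prev, curr))  # 0.00015
```

Computing the ratio from deltas, rather than from the cumulative totals, keeps the indicator responsive: a Redis instance that has been running for weeks would otherwise mask a recent drop in hit rate behind its long history.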
These pre-configured indicators cover common scenarios and let you create alert rules quickly. For more advanced requirements, you can define custom alert indicators using Prometheus query (PromQL) syntax:
In this example, `redis_keyspace_misses_total` is a metric collected by Prometheus, `<namespace>` filters resources by namespace, and `<podname prefix>` matches the name prefix of Pods managed by a `Deployment` or `StatefulSet`. For more information on metric queries, refer to the official PromQL documentation.
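A query that filters by namespace and Pod name prefix combines label matchers with a regular expression. The Python helper below is a hypothetical sketch that assembles such a query string. It assumes the Prometheus series carry `namespace` and `pod` labels (common in Kubernetes scrape configurations, but not confirmed for this platform) and wraps the counter in `rate()` so that an alert evaluates misses per second rather than a cumulative total. `build_selector`, `miss_rate_query`, and the `5m` default window are illustrative choices.

```python
import re

# Character set of a DNS-1123 label (the 63-character length limit is not checked).
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
# A single PromQL duration such as 30s, 5m, or 1h.
_DURATION = re.compile(r"[0-9]+(ms|s|m|h|d|w|y)")


def build_selector(metric: str, namespace: str, pod_prefix: str) -> str:
    """Return a PromQL selector filtering `metric` by namespace and Pod name prefix.

    Assumes the series carry `namespace` and `pod` labels.
    """
    for name, value in (("namespace", namespace), ("pod_prefix", pod_prefix)):
        if not _DNS_LABEL.fullmatch(value):
            raise ValueError(f"{name} {value!r} is not a valid DNS-1123 label")
    # `=~` is a fully anchored RE2 match, so "<prefix>.*" matches names starting with the prefix.
    return f'{metric}{{namespace="{namespace}", pod=~"{pod_prefix}.*"}}'


def miss_rate_query(namespace: str, pod_prefix: str, window: str = "5m") -> str:
    """Per-second rate of keyspace misses, summed over all matching Pods."""
    if not _DURATION.fullmatch(window):
        raise ValueError(f"window {window!r} is not a PromQL duration")
    selector = build_selector("redis_keyspace_misses_total", namespace, pod_prefix)
    return f"sum(rate({selector}[{window}]))"


if __name__ == "__main__":
    print(miss_rate_query("redis-ns", "redis-demo"))
    # sum(rate(redis_keyspace_misses_total{namespace="redis-ns", pod=~"redis-demo.*"}[5m]))
```

Matching `pod=~"<prefix>.*"` covers both StatefulSet Pods (`<name>-0`, `<name>-1`, ...) and Deployment Pods (`<name>-<hash>-<suffix>`). The helper rejects values outside the DNS-1123 label character set, so it also refuses prefixes containing dots, which would otherwise act as regex wildcards.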
For detailed guidance on alert configuration and management, see .