Monitoring and Alerts

The platform provides integrated monitoring dashboards for Redis instances. They support performance analysis, resource utilization tracking, and configurable alerting so that problems can be caught before they affect service.

Monitoring

The platform automatically collects key resource utilization and operational performance metrics for Redis instances. These metrics can be viewed in real time on the instance's Monitoring tab.

| Category | Metrics |
| --- | --- |
| Cluster Status Monitoring | Key count statistics, command execution metrics, replication lag, etc. |
| Resource Monitoring | Memory utilization, network traffic patterns, storage consumption, etc. |
| Performance Monitoring | Connection count, network I/O throughput, command latency, etc. |

Alerts

To configure alerting rules for Redis instances, navigate to the Alerts > Rules page in Alauda Application Service.

Configuring Alert Rules

To enable alerting, create an alert rule in Alauda Application Service. An alert rule defines the monitoring targets, the threshold conditions that trigger notifications, and the notification delivery mechanisms.
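As a rough illustration of what a threshold condition with a sustain period means, the sketch below models a rule and checks whether it would fire for a series of samples. Every name here (`AlertRule`, `is_firing`, the field names) is hypothetical and chosen for illustration only; it is not the platform's rule schema or evaluation logic.

```python
import operator
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical comparator table; the platform's actual operators may differ.
_COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "!=": operator.ne,
    "==": operator.eq,
}


@dataclass(frozen=True)
class AlertRule:
    """Illustrative model of an alert rule (not the platform's schema)."""
    target: str             # what is monitored, e.g. "cpu_utilization"
    comparator: str         # one of the keys in _COMPARATORS
    threshold: float        # value the sample is compared against
    sustain_seconds: float  # how long the breach must last before firing


def is_firing(rule: AlertRule, samples: Sequence[Tuple[float, float]]) -> bool:
    """Return True if the breach has held continuously for sustain_seconds.

    samples: (timestamp_seconds, value) pairs sorted by ascending timestamp.
    """
    if not samples:
        return False
    breached = _COMPARATORS[rule.comparator]
    breach_start = None
    for timestamp, value in samples:
        if breached(value, rule.threshold):
            # Remember when the current uninterrupted breach began.
            if breach_start is None:
                breach_start = timestamp
        else:
            # Any healthy sample resets the sustain timer.
            breach_start = None
    latest_timestamp = samples[-1][0]
    return (
        breach_start is not None
        and latest_timestamp - breach_start >= rule.sustain_seconds
    )


cpu_rule = AlertRule("cpu_utilization", ">", 0.8, 30)
print(is_firing(cpu_rule, [(0, 0.90), (15, 0.85), (30, 0.95)]))
print(is_firing(cpu_rule, [(0, 0.90), (15, 0.50), (30, 0.95)]))
```

In the second call the dip at 15 seconds resets the timer, so a brief spike on either side of it does not fire the rule. That is the practical purpose of a sustain period: it filters out transient fluctuations.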

The platform provides the following pre-configured alert indicators:

| Indicator | Recommended Threshold | Description |
| --- | --- | --- |
| Instance Status | != 1, sustained for 30 seconds | Monitors instance availability and operational state |
| Key Access Hit Rate | < 80%, sustained for 30 seconds | Monitors cache efficiency; low hit rates may indicate cache misses requiring strategy adjustments (increasing TTL values, optimizing key patterns, etc.) |
| Average Response Time | > 0.1s, sustained for 30 seconds | Monitors command execution latency; prolonged response times may indicate CPU constraints, excessive workload, or BigKey operations |
| Master-Slave Failover | = 1, sustained for 30 seconds | Detects master-slave role transitions that may indicate underlying infrastructure issues or Redis node failures |
| Inbound Bandwidth per Node | Environment-specific thresholds | Monitors network ingress in real-time to prevent bandwidth saturation affecting service availability |
| Outbound Bandwidth per Node | Environment-specific thresholds | Monitors network egress in real-time to prevent bandwidth saturation affecting service availability |
| Client Connections per Node | Environment-specific thresholds | Monitors connection patterns to detect potential connection leaks or abnormal access patterns |
| CPU Utilization per Node | > 80%, sustained for 30 seconds | Monitors CPU consumption; sustained high utilization may require capacity planning and scaling |
| Memory Utilization per Node | > 80%, sustained for 30 seconds | Monitors memory usage; approaching capacity limits requires immediate scaling to prevent eviction or OOM conditions |
| Storage Utilization per Node | > 80%, sustained for 30 seconds | Monitors persistent storage usage for RDB/AOF configurations; high utilization requires capacity expansion |

These pre-configured indicators facilitate rapid alert rule implementation. For advanced monitoring requirements, custom alert indicators can be defined using Prometheus query syntax:

(1/(1+(avg(irate(redis_keyspace_misses_total{namespace=~"<namespace>", pod=~"<podname prefix>-.*"}[5m])) by(namespace,service) / (avg(irate(redis_keyspace_hits_total{namespace=~"<namespace>", pod=~"<podname prefix>-.*"}[5m])) by(namespace,service)+1))))

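This example computes the key access hit rate as 1 / (1 + misses / (hits + 1)), which approximates hits / (hits + misses); the added 1 in the denominator avoids division by zero when no hits are recorded. redis_keyspace_misses_total and redis_keyspace_hits_total are metrics collected by Prometheus, <namespace> filters resources by namespace, and <podname prefix> specifies the Pod name pattern for resources managed by a Deployment or StatefulSet. For comprehensive information on metric queries, refer to the PromQL Official Documentation.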
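To make the placeholders and the arithmetic concrete, here is a minimal Python sketch that fills in the query template and evaluates the same formula on plain numbers. The helper names `build_hit_rate_query` and `smoothed_hit_rate`, and the sample namespace and Pod prefix, are assumptions made for illustration; they are not part of the platform or of Prometheus.

```python
def build_hit_rate_query(namespace: str, pod_prefix: str, window: str = "5m") -> str:
    """Fill the hit-rate PromQL template with a namespace and Pod name prefix.

    Hypothetical helper: it only performs string substitution and does not
    validate or escape its inputs.
    """
    # Doubled braces produce literal { and } inside the f-string.
    selector = f'{{namespace=~"{namespace}", pod=~"{pod_prefix}-.*"}}'
    misses = (
        f"avg(irate(redis_keyspace_misses_total{selector}[{window}])) "
        "by(namespace,service)"
    )
    hits = (
        f"avg(irate(redis_keyspace_hits_total{selector}[{window}])) "
        "by(namespace,service)"
    )
    return f"(1/(1+({misses} / ({hits}+1))))"


def smoothed_hit_rate(hits_per_second: float, misses_per_second: float) -> float:
    """Evaluate the same formula as the query on two per-second rates."""
    return 1 / (1 + misses_per_second / (hits_per_second + 1))


# "cache-prod" and "redis-orders" are made-up example values.
print(build_hit_rate_query("cache-prod", "redis-orders"))
print(smoothed_hit_rate(hits_per_second=99, misses_per_second=25))
```

With 99 hits and 25 misses per second the result is about 0.8, which sits right at the recommended 80% threshold. Note that with no traffic at all the formula evaluates to 1, so an idle instance would not trip a low hit rate alert.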

For detailed guidance on alert configuration and management, see .