Features

Monitoring

  • Probes

    The platform provides Probe capabilities (black-box monitoring) based on the Blackbox Exporter, enabling network service checks through protocols such as ICMP, TCP, and HTTP. Unlike white-box monitoring, which relies on internal system metrics, Probe evaluates services externally from the user’s perspective, rapidly identifying failures that impact user experience.

    For example, if a business interface returns server errors (such as HTTP 5xx) or stops responding, or if a critical service becomes unavailable, Probe detects the anomaly at its next scheduled check, raises an alert, and helps operations teams troubleshoot faster.

  • Monitoring Dashboard

    The platform provides a modernized dashboard manager with a more user-friendly visual configuration experience than traditional Grafana. It aggregates and displays a wide range of monitoring metrics in a unified view, helping users quickly build the dashboards they need.
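To illustrate the black-box idea behind Probe, the following Python sketch performs a single HTTP check from the outside: it observes only what a user would see (status code, latency, and whether a connection could be made at all). The `ProbeResult` type and `probe_http` function are hypothetical and are not the platform's API. Treating only 2xx responses as success mirrors what the Blackbox Exporter's HTTP prober accepts by default; the 5-second timeout is an assumption.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one black-box HTTP check (hypothetical structure)."""
    success: bool
    status_code: Optional[int]  # None when no HTTP response was received
    duration_seconds: float
    error: Optional[str] = None


def is_success(status_code: int) -> bool:
    # Assumption: only 2xx responses count as healthy.
    return 200 <= status_code < 300


def probe_http(url: str, timeout: float = 5.0) -> ProbeResult:
    """Request `url` from the outside and judge it only by the externally visible result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
        return ProbeResult(is_success(code), code, time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError: the service answered, but with an error.
        exc.close()
        return ProbeResult(is_success(exc.code), exc.code, time.monotonic() - start, str(exc))
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout: the service is unreachable.
        return ProbeResult(False, None, time.monotonic() - start, str(exc))
```

In practice a scheduler would call such a check at a fixed interval for each target and alert after one or more consecutive failures; that scheduling is left out of this sketch.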

Alert Notifications

  • Alert Strategies

    The platform provides comprehensive alerting capabilities, supporting the configuration of alert rules based on metrics, logs, and events. With a rich set of built-in monitoring metrics and alert templates, users can rapidly configure alert strategies that align with business needs, enabling timely detection and resolution of issues.

  • Alert Templates

    Alert templates standardize and package alert rules and notification strategies so they can be quickly reused across multiple monitoring targets. Template-based configuration significantly reduces the effort of managing alert strategies and improves operational efficiency.

  • Alert History

    The system records the full lifecycle of each alert, including trigger time, recovery time, alert status, alert level, and alert content. Users can review alert history to trace and analyze issues and to refine alert configurations over time.

  • Notifications

    The platform supports multiple alert notification channels, including email, DingTalk, WeChat Work, Feishu, and Webhook, ensuring that alert information reaches the relevant personnel promptly. Users can flexibly configure notification strategies based on actual needs.
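The relationship between alert templates, alert strategies, and alert history can be sketched in Python under a simple threshold-rule model (an assumption): a template holds a metric, a comparison, a threshold, and an alert level; instantiating it per target yields concrete rules; evaluating those rules over samples opens and closes history records carrying the trigger time, recovery time, status, and level. All class and field names are hypothetical, and the sketch omits log- and event-based rules, pending durations, and notification routing.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

OPS: Dict[str, Callable[[float, float], bool]] = {">": operator.gt, "<": operator.lt}


@dataclass
class AlertTemplate:
    """Reusable rule definition (hypothetical): metric, comparison, threshold, level."""
    name: str
    metric: str  # assumption: the caller feeds samples of this metric to each rule
    op: str
    threshold: float
    level: str

    def instantiate(self, target: str) -> "AlertRule":
        # The same template yields one concrete rule per monitoring target.
        return AlertRule(f"{self.name}[{target}]", target, self)


@dataclass
class AlertRecord:
    """One entry in the alert history."""
    rule: str
    level: str
    triggered_at: int
    recovered_at: Optional[int] = None

    @property
    def status(self) -> str:
        return "firing" if self.recovered_at is None else "resolved"


@dataclass
class AlertRule:
    name: str
    target: str
    template: AlertTemplate
    active: Optional[AlertRecord] = None

    def evaluate(self, ts: int, value: float, history: List[AlertRecord]) -> None:
        breached = OPS[self.template.op](value, self.template.threshold)
        if breached and self.active is None:
            # Condition starts holding: open a new history record.
            self.active = AlertRecord(self.name, self.template.level, ts)
            history.append(self.active)
        elif not breached and self.active is not None:
            # Condition cleared: record the recovery time and close the alert.
            self.active.recovered_at = ts
            self.active = None


# Usage: one template reused for two nodes; samples are (timestamp, value) pairs.
template = AlertTemplate("HighCPU", "cpu_usage", ">", 0.9, "critical")
rules = [template.instantiate(node) for node in ("node-a", "node-b")]
history: List[AlertRecord] = []
samples = {
    "node-a": [(0, 0.5), (60, 0.95), (120, 0.97), (180, 0.4)],
    "node-b": [(0, 0.3), (60, 0.2)],
}
for rule in rules:
    for ts, value in samples[rule.target]:
        rule.evaluate(ts, value, history)
# history now holds one resolved record for HighCPU[node-a]: triggered at 60, recovered at 180
```

Keeping the record object shared between the rule and the history list means the recovery time is written into the same entry that was opened on trigger, so each history entry describes one complete alert lifecycle.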

Distributed Tracing

Distributed tracing provides end-to-end (full-link) tracing for microservice architectures. By collecting metadata about inter-service calls, it helps users quickly locate issues that span multiple services.
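The source does not say how call metadata is correlated across services. A common approach, assumed here, is to propagate a trace ID in the W3C Trace Context `traceparent` HTTP header so that every service tags its spans with the same trace. The Python sketch below generates and parses that header (version `00` only); the helper names are hypothetical, and real deployments would normally rely on a tracing SDK such as OpenTelemetry instead of hand-written code.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Assumption: trace context travels in a W3C "traceparent" header, version 00:
# 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in one request
    span_id: str   # 16 hex chars, unique per span
    sampled: bool = True


def new_root() -> SpanContext:
    """Start a new trace at the entry service."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """A downstream call keeps the trace ID but gets its own span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is malformed or uses all-zero IDs."""
    m = _TRACEPARENT.match(header.strip())
    if not m:
        return None
    trace_id, span_id, flags = m.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

A service would call `from_traceparent` on each incoming request, create a `child_of` span for its own work, and send `to_traceparent(child)` on outgoing calls; the collector can then join all spans that share a trace ID into one call chain.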

Logs

The platform automatically collects and centrally manages standard output and file logs from clusters, nodes, and containers. It provides powerful log storage, retrieval, and analysis capabilities, supporting multi-dimensional log queries and visual displays, helping users quickly pinpoint issues.
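Multi-dimensional log queries amount to filtering log records by their source labels (for example cluster, node, namespace, Pod, and container), then by content and time range. The Python sketch below shows that idea over in-memory records; the record shape, label names, and `query_logs` function are hypothetical and say nothing about the platform's storage engine or query language.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass
class LogRecord:
    """Hypothetical record shape: one log line plus the labels of its source."""
    timestamp: datetime
    labels: Dict[str, str]  # e.g. cluster, node, namespace, pod, container
    message: str


def query_logs(
    records: Iterable[LogRecord],
    labels: Optional[Dict[str, str]] = None,
    contains: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[LogRecord]:
    """Return records matching every given label, the keyword, and [start, end), oldest first."""
    labels = labels or {}
    result = []
    for rec in records:
        if any(rec.labels.get(k) != v for k, v in labels.items()):
            continue  # each label acts as one query dimension
        if contains is not None and contains not in rec.message:
            continue
        if start is not None and rec.timestamp < start:
            continue
        if end is not None and rec.timestamp >= end:
            continue
        result.append(rec)
    return sorted(result, key=lambda r: r.timestamp)
```

For example, `query_logs(records, labels={"cluster": "prod", "pod": "api-1"}, contains=" 500")` narrows the search to one Pod in one cluster before looking at message content.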

Events

The platform collects critical events from Kubernetes clusters in real time and records the full history of resource state changes. When anomalies occur in clusters, nodes, Pods, or other resources, users can trace the related events to pinpoint root causes, which significantly speeds up issue resolution.
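Tracing events to a root cause usually means pulling every event recorded for one object and reading them in time order. The Python sketch below does that over dictionaries shaped like Kubernetes `core/v1` Event objects, using their `involvedObject`, `type`, `reason`, `message`, and `lastTimestamp` fields. How the platform stores and indexes collected events is not described here, so the `timeline` and `warnings_only` helpers are hypothetical.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]  # dict shaped like a Kubernetes core/v1 Event


def timeline(events: List[Event], kind: str, namespace: str, name: str) -> List[str]:
    """Return a time-ordered, human-readable history for one Kubernetes object."""
    related = [
        ev for ev in events
        if ev["involvedObject"].get("kind") == kind
        and ev["involvedObject"].get("namespace") == namespace
        and ev["involvedObject"].get("name") == name
    ]
    # Assumption: lastTimestamp is set and always uses the same RFC 3339 UTC
    # format ("...Z" without fractions), so string order equals time order.
    related.sort(key=lambda ev: ev["lastTimestamp"])
    return [
        f'{ev["lastTimestamp"]} {ev["type"]:<7} {ev["reason"]}: {ev["message"]}'
        for ev in related
    ]


def warnings_only(events: List[Event]) -> List[Event]:
    """Warning events are usually the first place to look for a root cause."""
    return [ev for ev in events if ev.get("type") == "Warning"]
```

Reading a Pod's timeline this way makes sequences such as an image pull followed by repeated `BackOff` warnings easy to spot.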

Inspection

  • Inspection

    Drawing on extensive enterprise-level operations experience, the platform offers automated inspection. Multi-dimensional health checks help users monitor the operational status of resources in real time, detect potential risks early, and reduce the cost of manual inspection.

  • Platform Health Status

    The platform provides an intuitive overview of its functional health, showing deployment status and the operational status of each component. Users with platform management permissions can drill down into detailed health check data to quickly locate and resolve platform-level issues.
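An automated inspection run, and the health overview built on it, can be thought of as a set of named checks, each returning a status and a message, whose results roll up into one overall status. The Python sketch below shows that structure; the check names, the disk-usage thresholds, and the three-level OK/WARNING/CRITICAL scale are illustrative assumptions, not the platform's actual inspection items.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Status(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class CheckResult:
    name: str
    status: Status
    message: str


Check = Callable[[], Tuple[Status, str]]


def run_inspection(checks: Dict[str, Check]) -> Tuple[Status, List[CheckResult]]:
    """Run every check; the overall status is the worst individual status."""
    results = []
    for name, check in checks.items():
        try:
            status, message = check()
        except Exception as exc:  # a broken check is reported, not allowed to abort the run
            status, message = Status.CRITICAL, f"check failed: {exc}"
        results.append(CheckResult(name, status, message))
    overall = max((r.status for r in results), default=Status.OK)
    return overall, results


def disk_usage_check(used_ratio: float, warn: float = 0.8, crit: float = 0.9) -> Check:
    """Hypothetical check factory; the thresholds are illustrative only."""
    def check() -> Tuple[Status, str]:
        message = f"disk {used_ratio:.0%} used"
        if used_ratio >= crit:
            return Status.CRITICAL, message
        if used_ratio >= warn:
            return Status.WARNING, message
        return Status.OK, message
    return check
```

Taking the maximum of an ordered status scale is one simple roll-up rule; a real platform might instead weight components differently or report per-component status without a single summary.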