As enterprise applications move to containers and microservices, they grow into large, complex distributed architectures. A system is needed to manage and operate these services, helping developers address challenges such as service discovery, load balancing, fault recovery, metric collection and monitoring, canary releases, rate limiting, access control, and end-to-end authentication.
Istio provides a comprehensive microservices solution for these needs, offering behavioral insight and operational control over the entire service mesh. Building on a practical implementation of Istio, the platform lets enterprise users simply connect their services to gain comprehensive, non-intrusive governance capabilities for their applications. The features cover global service visualization, service release management, reliable service connection governance, service mesh lifecycle management, service error troubleshooting, and service security governance.
In complex microservices scenarios with many services and intricate service-to-service invocation relationships, developers need to monitor the overall operation of services in real time and understand how components communicate. When business failures or performance issues occur, quickly pinpointing where the problem lies improves operational efficiency and reduces business risk.
The platform integrates Istio with the OpenTelemetry Java Agent and Jaeger, providing comprehensive observability capabilities.
Collect monitoring data in a non-intrusive manner, allowing developers to focus on business development without worrying about how to obtain monitoring data (a sketch of how the agent is typically attached follows this list).
Display service invocation topology and traffic monitoring data in visual charts, supporting call chain data queries.
Quickly configure and manage service traffic governance policies, routing rules, and security policies through forms.
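As a rough illustration of the non-intrusive collection above, the snippet below patches a Deployment so the JVM loads the OpenTelemetry Java Agent at startup and exports traces to Jaeger over OTLP. The container name, agent path, and collector endpoint are assumptions for the example, not platform defaults.

```python
# A minimal sketch: attach the OpenTelemetry Java Agent to a service without
# touching its code by injecting the agent jar via JAVA_TOOL_OPTIONS and
# pointing the OTLP exporter at a Jaeger collector.
import yaml  # PyYAML

deployment_patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "reviews",  # hypothetical service container
                    "env": [
                        # Load the agent at JVM startup; the path assumes the jar
                        # is baked into the image or mounted from a volume.
                        {"name": "JAVA_TOOL_OPTIONS",
                         "value": "-javaagent:/otel/opentelemetry-javaagent.jar"},
                        {"name": "OTEL_SERVICE_NAME", "value": "reviews"},
                        {"name": "OTEL_TRACES_EXPORTER", "value": "otlp"},
                        # Hypothetical OTLP endpoint of the Jaeger collector.
                        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT",
                         "value": "http://jaeger-collector.istio-system:4317"},
                    ],
                }]
            }
        }
    }
}

print(yaml.safe_dump(deployment_patch, sort_keys=False))
```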
Microservices applications iterate and are optimized rapidly. If a new version is released directly to all users and an online incident occurs (for example, a fault or a crash caused by a large traffic shift) that cannot be resolved quickly, the user experience suffers significantly.
When releasing new versions of applications in a microservices system, it's necessary to minimize the impact on business while helping developers verify the functionality, performance, and user satisfaction of the new version.
The platform implements canary releases based on Istio's routing capabilities and the open-source Flagger component. While a new version of a service is being released, some users continue to use the old version while others are routed to the new one. If the new version runs stably and users raise no objections, the share of traffic sent to the new version is gradually increased until all users have been migrated.
During the canary release, problems can be detected promptly, defects can be fixed, and traffic behavior can be monitored in real time. Abnormal situations during the release trigger an automatic rollback to minimize the impact (a sketch of a canary definition follows).
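A minimal sketch of the kind of Flagger Canary resource that drives such a progressive rollout on an Istio mesh; the names, namespace, port, and thresholds are illustrative assumptions, and the authoritative schema is defined by Flagger itself.

```python
# Sketch of a Flagger Canary: shift traffic in steps, check a success-rate
# metric each interval, and roll back automatically after repeated failures.
import yaml  # PyYAML

canary = {
    "apiVersion": "flagger.app/v1beta1",
    "kind": "Canary",
    "metadata": {"name": "reviews", "namespace": "demo"},  # hypothetical
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "reviews"},
        "service": {"port": 9080},
        "analysis": {
            "interval": "1m",      # how often traffic is shifted and checked
            "threshold": 5,        # failed checks before automatic rollback
            "maxWeight": 50,       # cap on the canary's traffic share
            "stepWeight": 10,      # traffic added to the canary at each step
            "metrics": [
                {"name": "request-success-rate",    # built-in Flagger metric
                 "thresholdRange": {"min": 99},     # roll back below 99% success
                 "interval": "1m"},
            ],
        },
    },
}

print(yaml.safe_dump(canary, sort_keys=False))
```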
In large-scale microservices systems, the inter-service invocation relationships are complex. Once a service fails and the problem persists for a long time, it may cascade to affect other services, leading to system crashes.
Therefore, developers and operators of microservices systems need to observe service health in real time to head off potential risks, and, when business failures do occur, to troubleshoot them quickly with the help of detailed monitoring data.
The platform constructs a complete troubleshooting path for service issues. When a user's service fails, the location of the failure can be pinpointed by tracing the abnormal traffic, narrowing the scope of investigation, and the root cause can then be resolved quickly by analyzing logs and monitoring data.
Provides linked topology diagrams, traffic monitoring panels, call chains, and logs that form a complete path for error troubleshooting (a sketch of a call chain query follows this list).
Allows viewing traffic data at the service, Pod, and API levels through traffic monitoring panels.
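As one hedged example of narrowing a failure down with call chain data, the snippet below queries Jaeger's HTTP trace API (the API backing its UI) for recent error traces of a service and prints the slowest span in each, so the investigation can focus there. The Jaeger address and service name are assumptions for the example.

```python
# Sketch: fetch recent error traces for a service from Jaeger and surface the
# slowest span per trace to narrow the troubleshooting scope.
import json
import urllib.parse
import urllib.request

JAEGER_QUERY = "http://jaeger-query.istio-system:16686"  # hypothetical address
params = urllib.parse.urlencode({
    "service": "reviews",                    # service under investigation
    "tags": json.dumps({"error": "true"}),   # only traces flagged as errors
    "lookback": "1h",
    "limit": 20,
})

with urllib.request.urlopen(f"{JAEGER_QUERY}/api/traces?{params}") as resp:
    traces = json.load(resp)["data"]

for trace in traces:
    slowest = max(trace["spans"], key=lambda s: s["duration"])
    print(trace["traceID"], slowest["operationName"], f'{slowest["duration"]} us')
```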
In a microservices architecture, an application is composed of multiple services, each performing a single business function, and data is exchanged between services through remote calls. This brings a problem: when service A calls service B, and service B calls further services, slow responses or unavailability downstream cause the calls held open by service A to consume more and more resources, eventually crashing the system. This is known as the avalanche effect.
To avoid the avalanche effect, the platform supports configuring traffic governance policies and routing rules for services, so that protection mechanisms and fault simulations can be set up for inter-service calls, ensuring reliable service connections and improving the overall stability of the microservices system (a sketch of such policies follows the list below).
Supports load balancing strategies, circuit breaking, and connection pool settings.
Supports fault injection (errors and delayed responses), timeouts and retries, request rewriting, and traffic mirroring based on routing rules.
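A minimal sketch of the Istio resources these settings typically map to: a DestinationRule that sets a load balancing strategy, connection pool limits, and outlier detection (circuit breaking), and a VirtualService that adds an injected delay, a timeout, and retries. Hosts, namespaces, and numbers are illustrative assumptions.

```python
# Sketch of circuit breaking and fault-simulation policies as Istio resources.
import yaml  # PyYAML

destination_rule = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "DestinationRule",
    "metadata": {"name": "ratings", "namespace": "demo"},  # hypothetical
    "spec": {
        "host": "ratings",
        "trafficPolicy": {
            "loadBalancer": {"simple": "LEAST_REQUEST"},
            "connectionPool": {
                "tcp": {"maxConnections": 100},
                "http": {"http1MaxPendingRequests": 50},
            },
            "outlierDetection": {               # circuit breaking
                "consecutive5xxErrors": 5,
                "interval": "30s",
                "baseEjectionTime": "1m",
            },
        },
    },
}

virtual_service = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "VirtualService",
    "metadata": {"name": "ratings", "namespace": "demo"},
    "spec": {
        "hosts": ["ratings"],
        "http": [{
            # Fault simulation: delay 10% of requests by 5 seconds.
            "fault": {"delay": {"percentage": {"value": 10},
                                "fixedDelay": "5s"}},
            "timeout": "10s",
            "retries": {"attempts": 3, "perTryTimeout": "2s"},
            "route": [{"destination": {"host": "ratings"}}],
        }],
    },
}

print(yaml.safe_dump_all([destination_rule, virtual_service], sort_keys=False))
```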
Splitting a monolithic application into services improves development efficiency, system stability, and operational efficiency, but calls between services change from local calls to network interface calls, which introduces security risks. A microservices governance platform therefore needs to encrypt traffic, configure authorization for service-to-service calls, and provide validation and auditing functions.
The platform encrypts inter-service traffic: mTLS can be enabled by setting security policies for services.
By setting blacklists or whitelists for services, access control between services within the same mesh can be implemented.
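A minimal sketch of the security policies this corresponds to in Istio, assuming a hypothetical demo namespace: a PeerAuthentication that enforces mTLS and an AuthorizationPolicy that whitelists a single caller.

```python
# Sketch of mesh security policies: strict mTLS plus an allow-list rule.
import yaml  # PyYAML

peer_authentication = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "demo"},  # hypothetical
    "spec": {"mtls": {"mode": "STRICT"}},  # require mTLS for all inbound traffic
}

authorization_policy = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "AuthorizationPolicy",
    "metadata": {"name": "ratings-allowlist", "namespace": "demo"},
    "spec": {
        "selector": {"matchLabels": {"app": "ratings"}},
        "action": "ALLOW",
        "rules": [{
            "from": [{"source": {
                # Only the reviews service account may call ratings.
                "principals": ["cluster.local/ns/demo/sa/reviews"],
            }}],
        }],
    },
}

print(yaml.safe_dump_all([peer_authentication, authorization_policy], sort_keys=False))
```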
A complete service mesh includes not only the Istio components but also integrated extension components, and managing meshes across multiple clusters is particularly complex. The lifecycle of a service mesh involves not only deployment, updating, and deletion; more importantly, monitoring the health of the mesh lets operations personnel discover anomalies in time, troubleshoot faults, and ensure continuous, stable support for the business.
The platform provides comprehensive lifecycle management functions for service meshes, including deployment, updating, deletion, and health monitoring.
Service mesh management: deployment, updating, and deletion of the mesh's core and extension components.
Component health checks: health checks for core and extension components.
Functionality health checks: health checks for the main mesh functionalities, with built-in check items.
Monitoring: integrates with Prometheus and Grafana, providing default dashboards for monitoring both mesh components and business information.
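As a hedged illustration of what the Prometheus integration exposes, the snippet below queries the standard Istio metric istio_requests_total through Prometheus's HTTP API to compute per-service request rates; the Prometheus address is an assumption for the example.

```python
# Sketch: pull per-service request rates from the mesh's Prometheus instance.
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://prometheus.istio-system:9090"  # hypothetical address
promql = "sum(rate(istio_requests_total[5m])) by (destination_service, response_code)"

url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)["data"]["result"]

for series in result:
    labels = series["metric"]
    # series["value"] is [timestamp, value]; print requests per second.
    print(labels.get("destination_service"), labels.get("response_code"), series["value"][1])
```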