Scenario

Build a distributed microservices system

Scene description

As enterprises move their applications to containers and microservices, they end up with a large and complex distributed microservices architecture. They need a system that can manage and operate these services and help developers address challenges such as service discovery, load balancing, fault recovery, metrics collection and monitoring, gray release, rate limiting, access control, and end-to-end authentication.

Solution

Istio provides a complete microservice solution: it offers behavioral insight into, and operational control over, the entire service mesh, which meets the diverse needs of microservice applications. The platform builds heavily on Istio. Enterprise users only need to connect their services to the platform to obtain a full range of non-intrusive governance capabilities for their applications and services. These capabilities cover global service visualization, service release management, service connection reliability governance, service mesh lifecycle management, service error troubleshooting, and service security governance.

Visualization of global service

Scene description

In complex microservices applications, developers need real-time visibility into how services are running overall, including service invocation relationships, call traces, and monitoring information, in order to understand the communication between components. This visibility helps them quickly locate problems when failures or performance issues occur, improving operational efficiency and reducing business risk.

Solution

The platform integrates OpenTelemetry Java Agent and Jaeger on top of Istio, providing comprehensive observability capabilities.

  • Collect monitoring data in a non-intrusive manner, allowing developers to focus on business development without worrying about how to obtain monitoring data.

  • Visualize service call topology and traffic monitoring data in the form of charts, and support trace data queries.

  • Traffic governance strategies, routing rules, and security policies for services can be quickly configured and managed through forms.

Service release management

Scene description

When microservice applications are iterated and optimized rapidly, releasing a new version directly to all users is risky: if a production incident occurs (such as a failure or crash caused by large-scale traffic changes), it is difficult to resolve quickly and seriously affects the user experience.

When deploying a new version of an application in a microservices system, it is important to minimize the impact on the business while also helping developers validate the new version's functionality, performance, and user satisfaction.

Solution

The gray release feature is implemented based on the Istio routing capability and the Flagger open-source component. When a new version of the service is released, it allows a portion of users to continue using the old version while another portion uses the new version. During the gray release process, if the new version is stable and users have no objections to it, the traffic scheduling ratio can be gradually adjusted to expand the scope of users using the new version until all users are migrated to the new version.

During the gray release, traffic behavior can be monitored in real time, so problems can be identified promptly and defects fixed. If an anomaly occurs, the release can be rolled back automatically to minimize the impact.
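The gradual traffic shift and rollback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Flagger's implementation or the platform's API: the function names, the 10% step, the 1% error threshold, and the `reviews` host with its `v1` and `v2` subsets are all hypothetical. The only Istio-specific part is the weighted `route` list of a `VirtualService`; the subsets it names would also have to be defined in a `DestinationRule`.

```python
import json


def canary_virtual_service(host, stable_subset, canary_subset, canary_weight):
    """Build an Istio VirtualService (as a dict) that splits traffic by weight."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},  # hypothetical resource name
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    # Old version keeps the remaining share of traffic.
                    {"destination": {"host": host, "subset": stable_subset},
                     "weight": 100 - canary_weight},
                    # New version receives the canary share.
                    {"destination": {"host": host, "subset": canary_subset},
                     "weight": canary_weight},
                ]
            }],
        },
    }


def next_canary_weight(current, error_rate, step=10, max_error_rate=0.01):
    """Advance the canary by `step` while healthy; return 0 to roll back."""
    if error_rate > max_error_rate:
        return 0  # anomaly detected: send all traffic back to the old version
    return min(100, current + step)


if __name__ == "__main__":
    weight = 0
    # Assumed error rates observed after each step of the release.
    for observed_error_rate in [0.0, 0.002, 0.001]:
        weight = next_canary_weight(weight, observed_error_rate)
    print(json.dumps(canary_virtual_service("reviews", "v1", "v2", weight), indent=2))
```

The printed JSON could be applied to a cluster like any other manifest. In practice the weight adjustment would be driven by real metrics rather than a hard-coded list.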

Traffic error troubleshooting

Scene description

In large-scale microservice systems, the calling relationships between services are complex and intertwined. Once a service fails and the problem cannot be resolved for a long time, it may cascade and impact other services, leading to system crashes.

Therefore, for developers and operators of microservice systems, it is crucial to monitor the health status of services in real-time to prevent potential risks. It is also important to be able to quickly troubleshoot business failures with detailed monitoring data.

Solution

The platform has established a comprehensive service troubleshooting process. In the event of a service failure, the platform can trace abnormal traffic to identify the location of the failure, narrow down the troubleshooting scope, and then analyze specific causes using logs and monitoring data to quickly resolve the issue.

  • Provides a call topology diagram, a traffic monitoring panel, traces, and logs, which together form a complete troubleshooting path.

  • The Traffic Monitoring panel allows you to view traffic data at the Service, Pod, and API levels.
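To illustrate what viewing traffic data at the Service, Pod, and API levels means when narrowing down a failure, the sketch below groups request records at each level and computes the share of 5xx responses. The record fields (`service`, `pod`, `api`, `status`) and the sample data are hypothetical; the platform's actual data model is not described here.

```python
from collections import defaultdict

LEVELS = ("service", "pod", "api")


def error_rates(records, level):
    """Return {key: fraction of 5xx responses}, grouped by the given level."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        key = record[level]
        totals[key] += 1
        if record["status"] >= 500:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical request records, as a traffic monitoring backend might store them.
SAMPLE = [
    {"service": "orders", "pod": "orders-1", "api": "GET /orders", "status": 200},
    {"service": "orders", "pod": "orders-1", "api": "POST /orders", "status": 503},
    {"service": "orders", "pod": "orders-2", "api": "GET /orders", "status": 200},
    {"service": "orders", "pod": "orders-2", "api": "GET /orders", "status": 200},
]

if __name__ == "__main__":
    for level in LEVELS:
        print(level, error_rates(SAMPLE, level))
```

With this sample, the service-level view only shows that a quarter of requests fail, while the Pod and API views narrow the failures to `orders-1` and `POST /orders`.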

Service connection reliability governance

Scene description

In a microservices architecture, an application consists of multiple services, each responsible for a single business function. Data interaction between services is achieved through remote calls. However, this introduces a problem: when service A calls service B, and service B calls other services, if the response time of the other services is too long or they become unavailable, the calls to service A will consume more and more resources, eventually leading to system failure, known as the avalanche effect.

Solution

To prevent cascading failures in microservice systems, the platform supports configuring traffic governance strategies and routing rules for services. These settings add protection to service invocations and make it possible to simulate failures, ensuring the reliability of service connections and enhancing the overall stability of the microservice system.

  • Supports load balancing strategies, circuit breaking, and connection pool settings.

  • Supports route-based fault injection, response delays, timeouts and retries, request rewriting, and traffic mirroring.
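In Istio terms, the first group of settings maps to a `DestinationRule` and the second to a `VirtualService`. The sketch below builds both as Python dicts, using field names from the Istio networking API. The host names, resource names, and every numeric value are assumptions chosen for illustration, and the forms the platform offers may expose different options.

```python
import json


def destination_rule(host):
    """Load balancing, connection pool limits, and circuit breaking for a host."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": f"{host}-resilience"},  # hypothetical name
        "spec": {
            "host": host,
            "trafficPolicy": {
                "loadBalancer": {"simple": "LEAST_REQUEST"},
                "connectionPool": {
                    "tcp": {"maxConnections": 100},
                    "http": {"http1MaxPendingRequests": 50},
                },
                # Circuit breaking: eject an instance after 5 consecutive 5xx errors.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                    "maxEjectionPercent": 50,
                },
            },
        },
    }


def virtual_service(host, mirror_host, inject_fault=False):
    """Route rules: timeout, retries, and mirroring, or fault injection for tests."""
    route = {"route": [{"destination": {"host": host}}]}
    if inject_fault:
        # Simulate failures: delay 10% of requests by 2s and abort 1% with HTTP 503.
        # Istio does not apply timeouts or retries on a route while faults are enabled,
        # so they are left out of this variant.
        route["fault"] = {
            "delay": {"percentage": {"value": 10.0}, "fixedDelay": "2s"},
            "abort": {"percentage": {"value": 1.0}, "httpStatus": 503},
        }
    else:
        route["timeout"] = "3s"
        route["retries"] = {"attempts": 3, "perTryTimeout": "1s", "retryOn": "5xx"}
        # Send a copy of 5% of live traffic to another service for comparison.
        route["mirror"] = {"host": mirror_host}
        route["mirrorPercentage"] = {"value": 5.0}
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-routing"},  # hypothetical name
        "spec": {"hosts": [host], "http": [route]},
    }


if __name__ == "__main__":
    print(json.dumps(destination_rule("orders"), indent=2))
    print(json.dumps(virtual_service("orders", "orders-shadow"), indent=2))
```

Keeping fault injection in a separate variant makes it easy to enable only in a test environment, where simulated delays and aborts can confirm that callers degrade gracefully.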

Service connection security governance

Scene description

Splitting a monolithic application into services improves development efficiency, system stability, and operational efficiency. However, calls between services change from local invocations to calls over network protocols, which introduces security risks. A microservice governance platform therefore needs to encrypt traffic, configure authorization for service invocations, and provide verification and auditing functions.

Solution

  • The platform provides the ability to encrypt traffic between services. By setting security policies for services, traffic can be encrypted using mTLS.

  • By setting blacklists or whitelists for services, you can control service-to-service access rights within the same mesh.
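In Istio, these two capabilities correspond to the `PeerAuthentication` and `AuthorizationPolicy` resources. The sketch below generates one of each as a Python dict. The namespace, workload label, and service account names are hypothetical, and the principal format assumes the default `cluster.local` trust domain; how the platform's own security policies map to these resources is not stated here.

```python
import json


def peer_authentication(namespace):
    """Require mTLS for all workloads in a namespace."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


def authorization_policy(namespace, app, callers, allow=True):
    """Whitelist (ALLOW) or blacklist (DENY) the given callers for one workload.

    `callers` is a list of (namespace, service_account) pairs.
    """
    principals = [f"cluster.local/ns/{ns}/sa/{sa}" for ns, sa in callers]
    action = "ALLOW" if allow else "DENY"
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{app}-{action.lower()}", "namespace": namespace},
        "spec": {
            # Assumes the workload's pods carry an `app` label.
            "selector": {"matchLabels": {"app": app}},
            "action": action,
            "rules": [{"from": [{"source": {"principals": principals}}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(peer_authentication("shop"), indent=2))
    # Whitelist: only the frontend service account may call the orders workload.
    print(json.dumps(authorization_policy("shop", "orders", [("shop", "frontend")]), indent=2))
```

Matching on `principals` relies on the caller identity carried by mTLS, so a whitelist or blacklist of this kind is only meaningful when traffic between the services is mutually authenticated.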

Full lifecycle management of service mesh

Scene description

A complete service mesh includes not only Istio components but also integrated extension components, and managing it becomes particularly complex in a multi-cluster environment. The lifecycle of a service mesh covers not only its deployment, update, and deletion, but also monitoring and viewing its health status, so that operations personnel can promptly detect anomalies, troubleshoot issues, and ensure that the service mesh provides consistent and stable support for the business.

Solution

The platform provides full lifecycle management of the service mesh, from deployment, update, and deletion to health monitoring.

  • Service mesh management: deployment of the service mesh core components and extension components, plus service mesh update, deletion, and so on.

  • Component health check: health checks of core components and extension components.

  • Functional health check: health checks of the main functions, with built-in function-related check items.

  • Monitoring: integrates with Prometheus and Grafana to provide default dashboards for monitoring component and business information.