
Introduction

Distributed Tracing is a key module of the container platform's observability system, providing end-to-end tracing and analysis of distributed systems. Built on the OpenTelemetry (OTel) standard, it offers a complete solution covering data collection, storage, and visual analysis. It helps developers and operations personnel quickly locate abnormal service calls, analyze performance bottlenecks, and trace the full lifecycle of each request.

The module combines open-source technology stacks with self-developed components to support end-to-end tracing. Applications generate tracing data through OTel automatic injection or SDK integration. The data is collected centrally and stored in Elasticsearch, and a customized UI presents it for multi-dimensional visual analysis. Users can run precise searches using conditions such as TraceID, service name, and tags.
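
End-to-end tracing across services depends on each hop forwarding the trace context with its outgoing requests. OTel SDKs and auto-instrumentation agents do this by default with the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`.

The following is a minimal stdlib-only sketch of how that header is created, parsed, and continued in a downstream call. The function names are hypothetical and used only for illustration. In practice the OTel SDK or the injected agent handles propagation, so application code does not need to do this.

```python
import re
import secrets

# W3C Trace Context header: "<version>-<trace-id>-<parent-id>-<trace-flags>"
# version: 2 hex, trace-id: 32 hex, parent-id (span ID): 16 hex, flags: 2 hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its fields; raise ValueError if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"invalid traceparent: {header!r}")
    _version, trace_id, parent_id, flags = match.groups()
    # The spec treats all-zero IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id")
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": (int(flags, 16) & 0x01) == 1,  # lowest bit is the sampled flag
    }


def child_traceparent(incoming: str) -> str:
    """Header for an outbound call: same trace ID, a new span ID for the calling span,
    and the caller's sampled flag carried forward."""
    ctx = parse_traceparent(incoming)
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"
```

Every hop keeps the same 32-hex-character trace ID. This lets the backend group spans from different services into a single trace, which is what a TraceID search retrieves.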

Advantages

The core advantages of tracing are as follows:

End-to-End Tracing Capability
Reconstructs complete traces across service, process, and container boundaries, accurately presenting the complex call relationships in microservice architectures.
Flexible Data Collection Methods
Provides two modes, automatic injection (no code changes) and SDK integration, and supports applications written in mainstream languages such as Java, Python, and Go.
High-Performance Storage Solutions
Uses Elasticsearch as the storage backend, supporting high-volume writes and fast retrieval of span data.
Flexible Querying and Analysis Capabilities
The self-developed UI integrates with the jaeger-query API and supports flexible queries on multiple conditions, such as TraceID, service, tags, and span type, helping users quickly pinpoint the root cause of issues.
Standardized Protocol Support
Built on the OpenTelemetry standard, the module can integrate tracing data generated by other OTel-based cloud-native components.
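
Queries from the UI ultimately go to jaeger-query. Upstream, the Jaeger UI calls an HTTP endpoint, `/api/traces`, in two ways:

- to fetch a single trace, using `/api/traces/{traceID}`;
- to search by service, operation, tags (a JSON-encoded map), time range (`start`/`end` in Unix epoch microseconds), and result limit.

This is Jaeger's internal UI API rather than a stable public contract. Whether and where the platform exposes it is an assumption, and the base URL `http://jaeger-query:16686` used below is hypothetical (16686 is Jaeger's default query port). The following minimal sketch only builds request URLs with the standard library and does not send them.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode


def trace_url(base_url: str, trace_id: str) -> str:
    """URL for fetching one trace by its TraceID."""
    return f"{base_url.rstrip('/')}/api/traces/{quote(trace_id)}"


def search_url(
    base_url: str,
    service: str,
    start_us: int,
    end_us: int,
    operation: Optional[str] = None,
    tags: Optional[dict] = None,
    limit: int = 20,
) -> str:
    """URL for a multi-condition trace search.

    start_us/end_us are Unix epoch timestamps in microseconds; tags is a
    {key: value} map that is sent JSON-encoded in the "tags" parameter.
    """
    params = {
        "service": service,
        "start": str(start_us),
        "end": str(end_us),
        "limit": str(limit),
    }
    if operation:
        params["operation"] = operation
    if tags:
        params["tags"] = json.dumps(tags, separators=(",", ":"))
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"
```

For example, to find slow or failing checkout requests you would combine `service="orders"` with `tags={"http.status_code": "500"}` and a one-hour window. The service name and tag values here are illustrative only.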

Application Scenarios

The main application scenarios of tracing are as follows:

Distributed System Fault Diagnosis
In microservice architectures, complete traces make it possible to quickly identify service faults and abnormal calls, reducing fault diagnosis time.
Performance Bottleneck Analysis
Examining the latency of calls between services reveals performance bottlenecks and guides system optimization and resource adjustment.
Service Dependency Analysis
A time-series waterfall diagram clearly shows the call paths and dependencies between services, assisting architects in system design and improvement.
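
The waterfall view, bottleneck analysis, and dependency analysis all derive from the parent-child relationships between spans. The following sketch illustrates two calculations:

- each span's self time: its duration minus the time covered by its direct children;
- dependency edges: counts of cross-service parent-to-child calls.

The `Span` type is a hypothetical, simplified model. It is not the schema the platform stores in Elasticsearch, and this is not how the platform UI is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str
    start_us: int
    duration_us: int


def self_times(spans: List[Span]) -> Dict[str, int]:
    """Per-span duration not covered by any direct child span."""
    children: Dict[str, List[Span]] = defaultdict(list)
    for s in spans:
        if s.parent_id is not None:
            children[s.parent_id].append(s)

    result: Dict[str, int] = {}
    for s in spans:
        end = s.start_us + s.duration_us
        # Clip each child to the parent's window, then merge overlapping intervals
        # so concurrent children are not double-counted.
        intervals = sorted(
            (max(c.start_us, s.start_us), min(c.start_us + c.duration_us, end))
            for c in children[s.span_id]
        )
        covered = 0
        cur_start: Optional[int] = None
        cur_end: Optional[int] = None
        for a, b in intervals:
            if b <= a:
                continue  # child lies entirely outside the parent window
            if cur_end is None or a > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = a, b
            else:
                cur_end = max(cur_end, b)
        if cur_end is not None:
            covered += cur_end - cur_start
        result[s.span_id] = s.duration_us - covered
    return result


def service_edges(spans: List[Span]) -> Dict[Tuple[str, str], int]:
    """Count calls from a parent span's service to a child span's service."""
    by_id = {s.span_id: s for s in spans}
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        if parent is not None and parent.service != s.service:
            edges[(parent.service, s.service)] += 1
    return dict(edges)
```

How to read the self-time figures:

- A span with a large self time is spending that time in its own work.
- A span whose time is mostly covered by its children points further down the call chain for the bottleneck.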

Usage Limitations

When using tracing, the following constraints should be noted:

Balancing Sampling Strategies and Performance
- In high-load scenarios, collecting tracing data may put pressure on Elasticsearch performance and storage. Configure the sampling rate according to your business requirements.
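
In OpenTelemetry SDKs, ratio-based head sampling is commonly configured with two standard environment variables. For example, `OTEL_TRACES_SAMPLER=parentbased_traceidratio` with `OTEL_TRACES_SAMPLER_ARG=0.1` keeps roughly 10% of new traces. Whether the platform's automatic injection sets or honors these variables is an assumption to verify in your environment.

The following minimal sketch illustrates the decision logic, similar in spirit to the SDK's `TraceIdRatioBased` sampler:

- The keep/drop decision is derived deterministically from the trace ID.
- A parent-based wrapper makes downstream services follow the upstream decision, so sampled traces are not broken into fragments.

The class and function names are hypothetical.

```python
import secrets
from typing import Optional

# Only the low 64 bits of the 128-bit trace ID are used for the decision.
TRACE_ID_LIMIT = (1 << 64) - 1


class TraceIdRatioSampler:
    """Keep roughly `ratio` of new traces, decided deterministically from the trace ID."""

    def __init__(self, ratio: float) -> None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0 and 1")
        self.ratio = ratio
        # Trace IDs whose low 64 bits fall below this bound are sampled.
        self.bound = round(ratio * (TRACE_ID_LIMIT + 1))

    def should_sample(self, trace_id: int) -> bool:
        return (trace_id & TRACE_ID_LIMIT) < self.bound


def parent_based_decision(
    parent_sampled: Optional[bool], trace_id: int, root: TraceIdRatioSampler
) -> bool:
    """Follow the caller's decision if there is one; otherwise apply the ratio sampler."""
    if parent_sampled is not None:
        return parent_sampled
    return root.should_sample(trace_id)


def observed_fraction(sampler: TraceIdRatioSampler, n: int = 10_000) -> float:
    """Estimate the kept fraction over n random 128-bit trace IDs."""
    kept = sum(sampler.should_sample(secrets.randbits(128)) for _ in range(n))
    return kept / n
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same answer even without a parent decision. Lowering the ratio reduces the span volume written to Elasticsearch roughly in proportion.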
