Unable to Query the Required Trace

Problem Description

When you query traces in a service mesh, the trace you are looking for may not be returned.

Root Cause Analysis

1. Tracing Sampling Rate Configured Too Low

When the tracing sampling rate is set too low, the system collects traces for only a proportional fraction of requests. When request volume is low, such as during off-peak periods, few or no traces may be sampled, so the request you are looking for may not have been recorded at all.
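As a rough illustration of this effect, the sketch below assumes simple probabilistic sampling in which each request is sampled independently at the configured rate. The function names are hypothetical and are not part of any tracing component.

```python
def expected_sampled(sampling_rate: float, request_count: int) -> float:
    """Expected number of sampled traces under independent probabilistic sampling."""
    _validate(sampling_rate, request_count)
    return sampling_rate * request_count


def prob_at_least_one_sampled(sampling_rate: float, request_count: int) -> float:
    """Probability that at least one of request_count requests is sampled."""
    _validate(sampling_rate, request_count)
    return 1.0 - (1.0 - sampling_rate) ** request_count


def _validate(sampling_rate: float, request_count: int) -> None:
    # Assumption: the rate is expressed as a fraction between 0 and 1.
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    if request_count < 0:
        raise ValueError("request_count must not be negative")


if __name__ == "__main__":
    # A 1% sampling rate during a quiet period with 50 requests.
    print(expected_sampled(0.01, 50))           # roughly 0.5 traces
    print(prob_at_least_one_sampled(0.01, 50))  # roughly 0.39
```

Under these assumptions, a 1% rate with only 50 requests is more likely than not to produce no trace at all, which matches the symptom described above.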

2. Elasticsearch Real-Time Limitations

By default, the Elasticsearch index that stores tracing data is configured with "refresh_interval": "10s", so newly written data can take up to 10 seconds to move from the in-memory buffer to a searchable state. When you query a trace that was generated very recently, the results may be missing because the data has not been refreshed yet and is therefore not searchable.

This setting reduces segment merge pressure on Elasticsearch and improves indexing speed and first-query speed, at the cost of some real-time visibility of the data.
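To confirm which refresh interval an index is actually using, you can inspect the response of the Elasticsearch index settings API (GET /<index>/_settings). The sketch below only parses such a response. The index name jaeger-span-2024-01-01 and the sample payload are assumptions for illustration, and the setting appears in the response only when it has been set explicitly.

```python
import json


def refresh_intervals(settings_response: dict) -> dict:
    """Map each index name to its configured refresh_interval (None if not set)."""
    result = {}
    for index_name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        result[index_name] = index_settings.get("refresh_interval")
    return result


# Hypothetical response body from GET /jaeger-span-2024-01-01/_settings.
sample = json.loads("""
{
  "jaeger-span-2024-01-01": {
    "settings": {
      "index": {
        "refresh_interval": "10s",
        "number_of_shards": "5"
      }
    }
  }
}
""")

if __name__ == "__main__":
    for name, interval in refresh_intervals(sample).items():
        print(name, interval)
```

A value of None from this helper means the index has no explicit refresh_interval, not that refreshing is disabled.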

Solution for Root Cause 1

  • Appropriately increase the sampling rate according to requirements.
  • Use a more flexible sampling strategy, such as tail sampling, which decides whether to keep a trace after it has completed.
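The idea behind tail sampling can be sketched as follows. This is a minimal illustration, not the API of any particular collector: the Span class, the keep_trace function, the 500 ms latency threshold, and the 1% baseline rate are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Span:
    duration_ms: float
    error: bool = False


def keep_trace(
    spans: List[Span],
    latency_threshold_ms: float = 500.0,    # assumed threshold
    baseline_rate: float = 0.01,            # assumed fallback sampling rate
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to keep a completed trace, looking at all of its spans."""
    # Always keep traces that contain an error.
    if any(span.error for span in spans):
        return True
    # Always keep traces that contain a slow span.
    slowest = max((span.duration_ms for span in spans), default=0.0)
    if slowest >= latency_threshold_ms:
        return True
    # Otherwise fall back to probabilistic sampling.
    return rng() < baseline_rate


if __name__ == "__main__":
    print(keep_trace([Span(12.0), Span(30.0, error=True)]))  # kept: has an error
    print(keep_trace([Span(800.0)]))                         # kept: slow span
```

Because the decision is made after the whole trace is available, failed or slow requests can always be retained even when the baseline rate for ordinary traffic stays low.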

Solution for Root Cause 2

Adjust the refresh interval with the --es.asm.index-refresh-interval startup parameter of jaeger-collector. The default value is 10s.

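If this parameter is set to "null", refresh_interval is not configured on the index.

Note: Setting it to "null" affects Elasticsearch performance and query speed. Without an explicit refresh_interval, the index falls back to the Elasticsearch default of 1s (unless an index template sets another value), so it refreshes far more often than with 10s.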
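The mapping between the parameter value and the index setting can be pictured with the sketch below. It is not the jaeger-collector implementation; the function name and the shape of the returned dictionary are assumptions used only to illustrate the behavior described above.

```python
def refresh_interval_setting(flag_value: str = "10s") -> dict:
    """Return the index settings fragment implied by the parameter value.

    Illustrative only: "null" means refresh_interval is left out entirely.
    """
    if flag_value == "null":
        return {}
    return {"refresh_interval": flag_value}


if __name__ == "__main__":
    print(refresh_interval_setting())        # default value of 10s
    print(refresh_interval_setting("5s"))    # shorter delay, more frequent refreshes
    print(refresh_interval_setting("null"))  # no refresh_interval configured
```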