Distributed Tracing and Service Mesh

Building applications to support trace context propagation

Although Istio proxies can automatically send spans, extra information is needed to join those spans into a single trace. Applications must propagate this information in HTTP headers, so that when proxies send spans, the backend can join them together into a single trace.

To do this, each application must collect the tracing headers from every incoming request and forward them on all outgoing requests triggered by that request. Which headers to forward depends on the configured trace backend.

All applications should forward the following headers:

  • x-request-id: an Envoy-specific header that is used to consistently sample logs and traces.
  • traceparent and tracestate: the W3C Trace Context standard headers.

For other observability tools, refer to their documentation.
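
As a minimal sketch of this forwarding step, the Python snippet below copies the headers listed above from an incoming request onto the headers of an outgoing request. The function name propagate_trace_headers and the plain-dictionary representation of headers are assumptions made for illustration; a real service would use the request and client objects of its own HTTP framework.

```python
# Headers named in this section. Lookup is case-insensitive because
# HTTP header names are case-insensitive.
TRACE_HEADERS = ("x-request-id", "traceparent", "tracestate")


def propagate_trace_headers(incoming, outgoing=None):
    """Return outgoing headers with the trace headers copied from incoming.

    Both arguments are plain dicts of header name -> value
    (a hypothetical, framework-neutral representation).
    """
    result = dict(outgoing or {})
    # Normalize incoming header names to lowercase for lookup.
    lowered = {name.lower(): value for name, value in incoming.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            result[name] = lowered[name]
    return result
```

The returned dictionary would then be passed as the headers of every outgoing call made while handling that incoming request. Headers that are not in the list, such as Accept, are not copied.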

Configuring a distributed tracing platform with Service Mesh

Alauda Service Mesh supports distributed tracing through integration with the following components:

  • Alauda Build of Jaeger: A customized distribution based on the open source Jaeger project. It provides end-to-end visibility into requests across complex distributed systems.

  • Alauda Build of OpenTelemetry: Based on the OpenTelemetry project, this component simplifies telemetry data collection across metrics, logs, and traces by managing the OpenTelemetry Collector and workload instrumentation.

The OpenTelemetry Collector acts as an intermediary for telemetry signals. It supports multiple data formats and provides a standardized pipeline for processing and exporting telemetry to backends such as Jaeger.

Configuring distributed tracing data collection with Service Mesh

You can integrate Alauda Service Mesh with OpenTelemetry to instrument, generate, collect, and export OpenTelemetry traces, metrics, and logs to analyze and understand your software's performance and behavior.

Prerequisites

  • .
  • .
  • An Istio instance is created.
  • An Istio CNI instance is created.

Procedure

Install a Jaeger instance in the istio-system namespace.

Use the installation script with the example command below to deploy a Jaeger instance dedicated to Istio:

INFO

The --jaeger-es-index-prefix parameter sets the index prefix in Elasticsearch where trace data is stored.

  • For a single-cluster service mesh, we recommend ending the prefix with the cluster name, for example istio-tracing-cluster-1.
  • For a multi-cluster service mesh, traces from all clusters must be stored in the same index; we recommend ending the prefix with the meshID, for example istio-tracing-mesh-1.
./install-jaeger.sh \
  --es-url='https://xxx' \
  --es-user-base64='xxx' \
  --es-pass-base64='xxx' \
  --target-namespace='istio-system' \
  --jaeger-basepath-suffix='/istio/jaeger' \
  --jaeger-es-index-prefix='istio-tracing-xxx'

After the installation completes successfully, you can access the Jaeger UI to query traces at <platform-url>/clusters/<cluster>/istio/jaeger.
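
The --es-user-base64 and --es-pass-base64 flag names suggest that the script expects Base64-encoded Elasticsearch credentials. Assuming that means standard Base64 of the UTF-8 string (an assumption; confirm against the installation script), the values could be produced with a small helper like the one below. The helper name to_base64 is hypothetical.

```python
import base64


def to_base64(value):
    """Encode a string as standard Base64 text.

    Assumption: the install script expects plain Base64 of the UTF-8
    bytes of the user name or password.
    """
    return base64.b64encode(value.encode("utf-8")).decode("ascii")
```

The result of to_base64 on the user name and on the password would then be supplied as the values of the two flags.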

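Create an OpenTelemetryCollector resource in the istio-system namespace that receives spans over OTLP and exports them to Jaeger:

Example OpenTelemetry Collector in istio-system namespace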

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: istio-system
spec:
  observability:
    metrics: {}
  deploymentUpdateStrategy: {}
  config:
    processors:
      batch: {}
    exporters:
      debug: {}
      otlp:
        endpoint: 'dns:///jaeger-prod-collector-headless.istio-system:4317'
        balancer_name: round_robin
        tls:
          insecure: true
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: '0.0.0.0:4317'
    service:
      pipelines:
        traces:
          exporters:
            - debug
            - otlp
          processors:
            - batch
          receivers:
            - otlp
  1. The exporters.otlp.endpoint field is the address of the Jaeger collector service in the istio-system namespace.

Update the Istio resource to enable tracing and define the OpenTelemetry tracing provider:

Example: Enabling tracing via meshConfig

apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: default
  # ...
spec:
  namespace: istio-system
  # ...
  values:
    meshConfig:
      enableTracing: true
      extensionProviders:
      - name: otel
        opentelemetry:
          port: 4317
          service: otel-collector.istio-system.svc.cluster.local
  1. The service field is the OpenTelemetry collector service in the istio-system namespace.

Update the Telemetry resource to enable the tracing provider defined in the meshConfig:

Example Istio Telemetry resource

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: asm-default
  namespace: istio-system
  # ...
spec:
  # ...
  tracing:
    - providers:
        - name: otel
      randomSamplingPercentage: 100
NOTE

Once you verify that you can see traces, lower the randomSamplingPercentage value to reduce the number of requests that are traced.
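
To get a rough feel for the effect of this setting, the sketch below estimates how many traces per second a given randomSamplingPercentage produces. The function name and the request rates are hypothetical, and the estimate assumes that each request is sampled independently at the configured percentage.

```python
def expected_traces_per_second(requests_per_second, sampling_percentage):
    """Estimate the average number of sampled traces per second.

    Assumption: every request is sampled independently with
    probability sampling_percentage / 100.
    """
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return requests_per_second * sampling_percentage / 100
```

Under these assumptions, a mesh handling 200 requests per second records about 200 traces per second at a value of 100, and about 2 per second at a value of 1.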