Configuring distributed tracing platform with Service Mesh

Alauda Service Mesh supports distributed tracing through integration with the following components:

  • Alauda Build of Jaeger v2: A customized distribution based on the open source Jaeger project. It provides end-to-end visibility into requests across complex distributed systems. In v2, the Jaeger instance is deployed as an OpenTelemetryCollector custom resource managed by the Alauda Build of OpenTelemetry v2 Operator.

  • Alauda Build of OpenTelemetry v2: Based on the OpenTelemetry project, this Operator manages the lifecycle of both the Jaeger v2 instance and the OpenTelemetry Collector that fronts it.

The OpenTelemetry Collector acts as an intermediary for telemetry signals. It supports multiple data formats and provides a standardized pipeline for processing and exporting telemetry to backends such as Jaeger.

Configuring distributed tracing data collection with Service Mesh

You can integrate Alauda Service Mesh with Alauda Distributed Tracing by deploying a Jaeger v2 instance and an OpenTelemetry Collector, then configuring Istio to export trace data through them.

The Jaeger v2 instance and the OpenTelemetry Collector are not service-mesh-specific: a single Jaeger v2 and OpenTelemetry Collector pair (deployed by default in the jaeger-system namespace) can serve both Service Mesh and other workloads in the cluster. This section only covers the Service Mesh-specific configuration; refer to the Alauda Distributed Tracing documentation for the underlying installation steps.

Prerequisites

  • The Alauda Build of OpenTelemetry v2 Operator is installed. See Installing the Alauda Build of OpenTelemetry v2 Operator.

  • A Jaeger v2 instance is deployed. See Deploying the Alauda Build of Jaeger v2.

    INFO

    The JAEGER_ES_INDEX_PREFIX variable referenced in the install procedure controls the prefix of the Elasticsearch indices that store trace data. The default value, acp-${CLUSTER_NAME}, works for a single-cluster deployment. For a service mesh deployment, choose the prefix according to the topology of the mesh:

    • For a single-cluster service mesh, we recommend ending the prefix with the cluster name, for example acp-cluster-1.
    • For a multi-cluster service mesh, traces from all clusters must be stored in the same index family; we recommend ending the prefix with the meshID, for example acp-mesh-1. Use the same JAEGER_ES_INDEX_PREFIX when running the install procedure on every cluster in the mesh so that the Jaeger UI can correlate spans across clusters.

  • An OpenTelemetry Collector is deployed. See Deploying the OpenTelemetry Collector.

  • An Istio instance is created.

  • An Istio CNI instance is created.
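
The prefix recommendation in the INFO note above can be sketched as a small shell helper. The function name jaeger_es_index_prefix and the sample values cluster-1, cluster-2, and mesh-1 are illustrative assumptions. Only the acp- pattern and the JAEGER_ES_INDEX_PREFIX variable come from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive JAEGER_ES_INDEX_PREFIX from the mesh topology.
#   $1 = cluster name
#   $2 = meshID (optional; pass it only for a multi-cluster mesh)
jaeger_es_index_prefix() {
  local cluster_name="$1"
  local mesh_id="${2:-}"
  if [ -n "$mesh_id" ]; then
    # Multi-cluster mesh: every cluster shares the meshID-based prefix.
    printf 'acp-%s\n' "$mesh_id"
  else
    # Single-cluster mesh: the prefix ends with the cluster name.
    printf 'acp-%s\n' "$cluster_name"
  fi
}

# Single-cluster mesh
JAEGER_ES_INDEX_PREFIX="$(jaeger_es_index_prefix cluster-1)"
echo "$JAEGER_ES_INDEX_PREFIX"   # prints acp-cluster-1

# Multi-cluster mesh: the same meshID is passed on every cluster
JAEGER_ES_INDEX_PREFIX="$(jaeger_es_index_prefix cluster-2 mesh-1)"
echo "$JAEGER_ES_INDEX_PREFIX"   # prints acp-mesh-1
```

Export the resulting value before running the install procedure so that every cluster in the same mesh uses an identical prefix.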

Procedure

Update the Istio resource to enable tracing and define the OpenTelemetry tracing provider:

Example: Enabling tracing via meshConfig

apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: default
  # ...
spec:
  namespace: istio-system
  # ...
  values:
    meshConfig:
      enableTracing: true
      extensionProviders:
      - name: otel
        opentelemetry:
          port: 4317
          service: otel-collector.jaeger-system.svc.cluster.local

The service field is the FQDN of the OpenTelemetry Collector Service. The value shown targets the Collector deployed in the jaeger-system namespace, as described in Deploying the OpenTelemetry Collector. Replace it with the actual Collector address if you deployed the Collector in a different namespace or with a different instance name.
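
The service value follows the usual Kubernetes Service DNS form, <service>.<namespace>.svc.<cluster-domain>. The sketch below builds that name for a Collector deployed elsewhere. The function name collector_fqdn and the sample names my-collector and observability are hypothetical, and the cluster.local default assumes the cluster keeps the default DNS domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the FQDN of a Collector Service.
#   $1 = Service name, $2 = namespace, $3 = cluster DNS domain (optional)
collector_fqdn() {
  local service="$1"
  local namespace="$2"
  local cluster_domain="${3:-cluster.local}"   # assumption: default domain
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$cluster_domain"
}

# Address used in this document
collector_fqdn otel-collector jaeger-system
# prints otel-collector.jaeger-system.svc.cluster.local

# Collector deployed under a different name and namespace (illustrative)
collector_fqdn my-collector observability
# prints my-collector.observability.svc.cluster.local
```

Before using a generated name, it may be worth confirming that the Service exists, for example with kubectl -n jaeger-system get svc otel-collector.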

To apply this configuration, patch the Istio resource:

kubectl patch istio default --type=merge -p '
spec:
  values:
    meshConfig:
      enableTracing: true
      extensionProviders:
      - name: otel
        opentelemetry:
          port: 4317
          service: otel-collector.jaeger-system.svc.cluster.local
'

WARNING

This command uses a JSON merge patch, which replaces the entire meshConfig.extensionProviders array. If the Istio resource already defines other extension providers, they will be overwritten. To preserve them, edit the resource with kubectl edit istio default and append the otel entry by hand, or use a JSON Patch (--type=json) that appends to /spec/values/meshConfig/extensionProviders/-.
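
The JSON Patch alternative mentioned in the warning could look like the sketch below. It assumes that spec.values.meshConfig.extensionProviders already exists as an array, because the /- append fails otherwise, and it sets enableTracing in a separate operation. The variable and function names are illustrative.

```shell
#!/usr/bin/env bash
# JSON Patch (RFC 6902) that appends the otel provider and leaves any
# existing extensionProviders entries untouched.
otel_provider_patch='[
  {"op": "add", "path": "/spec/values/meshConfig/enableTracing", "value": true},
  {"op": "add", "path": "/spec/values/meshConfig/extensionProviders/-",
   "value": {"name": "otel",
             "opentelemetry": {"port": 4317,
                               "service": "otel-collector.jaeger-system.svc.cluster.local"}}}
]'

# Wrapped in a function so the patch can be reviewed before it is applied.
apply_otel_provider_patch() {
  kubectl patch istio default --type=json -p "$otel_provider_patch"
}
```

Call apply_otel_provider_patch once. Running it a second time appends a duplicate otel entry, so check the current list first with kubectl get istio default -o jsonpath='{.spec.values.meshConfig.extensionProviders}'.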

Update the Telemetry resource to enable the tracing provider defined in the meshConfig:

Example Istio Telemetry resource

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: asm-default
  namespace: istio-system
  # ...
spec:
  # ...
  tracing:
    - providers:
        - name: otel
      randomSamplingPercentage: 100

To apply this configuration, patch the Telemetry resource:

kubectl -n istio-system patch telemetry asm-default --type=merge -p '
spec:
  tracing:
    - providers:
        - name: otel
      randomSamplingPercentage: 100
'

NOTE

After you verify that traces appear, lower the randomSamplingPercentage value so that only a fraction of requests is traced, which reduces tracing overhead and the volume of stored trace data.
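
One way to lower the sampling rate afterwards is a JSON Patch that touches only that field. This sketch assumes the Telemetry resource has the shape shown above, with the otel provider in the first tracing entry and randomSamplingPercentage already set. The function names are hypothetical, and the value 1 (one percent) is an arbitrary example, not a recommendation from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a JSON Patch that replaces the sampling
# percentage of the first spec.tracing entry.
sampling_patch() {
  local percentage="$1"
  printf '[{"op": "replace", "path": "/spec/tracing/0/randomSamplingPercentage", "value": %s}]\n' "$percentage"
}

# Apply the patch to the Telemetry resource used in this document.
apply_sampling() {
  kubectl -n istio-system patch telemetry asm-default --type=json \
    -p "$(sampling_patch "$1")"
}

# Review the patch before applying it
sampling_patch 1
```

Calling apply_sampling 1 would then set the rate to one percent. Unlike the merge patch above, this does not rewrite the rest of the tracing array.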

Uninstalling distributed tracing

If you no longer need the distributed tracing integration with Service Mesh, remove the configuration in the order below.

Removing the Service Mesh tracing configuration

Before removing the underlying components, detach the mesh from the OpenTelemetry Collector so that Istio stops sending spans.

  1. Edit the Telemetry resource and remove the tracing providers entry that references the otel provider:

    kubectl -n istio-system edit telemetry asm-default

    Alternatively, remove the entire spec.tracing block non-interactively with kubectl patch. This also removes any other tracing settings defined in the resource:

    kubectl -n istio-system patch telemetry asm-default --type=json -p='[{"op": "remove", "path": "/spec/tracing"}]'

  2. Edit the Istio resource and remove the meshConfig.extensionProviders entry named otel, or set meshConfig.enableTracing to false:

    kubectl edit istio default

    Alternatively, set meshConfig.enableTracing to false non-interactively with kubectl patch:

    kubectl patch istio default --type=merge -p='{"spec":{"values":{"meshConfig":{"enableTracing":false}}}}'
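
After both steps, a quick check can confirm that the mesh no longer exports spans. The sketch below only reads the two resources named in this document. The function name tracing_state is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print what is left of the tracing configuration.
tracing_state() {
  local telemetry_tracing enable_tracing
  # Empty output means spec.tracing has been removed from the Telemetry resource.
  telemetry_tracing="$(kubectl -n istio-system get telemetry asm-default \
    -o jsonpath='{.spec.tracing}')"
  # Expected to be false, or unset, once tracing is disabled in meshConfig.
  enable_tracing="$(kubectl get istio default \
    -o jsonpath='{.spec.values.meshConfig.enableTracing}')"
  echo "Telemetry spec.tracing: ${telemetry_tracing:-<not set>}"
  echo "meshConfig.enableTracing: ${enable_tracing:-<not set>}"
}
```

Run tracing_state against the cluster. If the first line still lists the otel provider, the Telemetry resource has not been updated.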

Uninstalling the OpenTelemetry Collector and Jaeger v2

Skip this step if other workloads in the cluster still rely on the OpenTelemetry Collector or the Jaeger v2 instance.

For instructions on deleting the OpenTelemetry Collector instance, the Jaeger v2 instance, and (optionally) the Alauda Build of OpenTelemetry v2 Operator, see Uninstalling Alauda Distributed Tracing.