Migrating from Alauda Build of OpenTelemetry to Alauda Build of OpenTelemetry v2
This document describes how to migrate an existing Alauda Build of OpenTelemetry (built on upstream OpenTelemetry Operator/Collector 0.108.0) deployment to Alauda Build of OpenTelemetry v2 (built on upstream 0.147.0).
The two distributions are delivered through different OLM packages (opentelemetry-operator and opentelemetry-operator2), but they own the same Custom Resource Definitions (OpenTelemetryCollector and Instrumentation). OLM does not allow two Operators to own the same CRDs simultaneously, so the migration must be performed as an uninstall of v1 followed by an install of v2, not as a side-by-side upgrade.
Overview
What changes between v1 and v2
- Because the v1 Operator (opentelemetry-operator) and the v2 Operator (opentelemetry-operator2) share CRD ownership, you cannot install v2 until v1 is fully uninstalled. The CRDs themselves are preserved across the migration; the v2 Operator adopts and upgrades them on installation.
Migration outage window
Telemetry collection is interrupted between the time the v1 OpenTelemetryCollector is deleted and the time the v2 OpenTelemetryCollector becomes ready. Application pods continue to run normally, but telemetry generated during the gap may be temporarily buffered and can be dropped if it cannot be exported in time. Plan the migration during a low-traffic window and notify telemetry consumers in advance.
Migration flow at a glance
Prerequisites
- An active ACP CLI (kubectl) session by a cluster administrator with the cluster-admin role.
- The jq command-line JSON processor is installed.
- Alauda Build of OpenTelemetry (v1) is currently installed in the cluster.
- If any service uses OTel Java auto-instrumentation, a Java auto-instrumentation image is available in a registry accessible from the cluster. See Preparing the Java agent image. Services that do not use OTel Java auto-injection do not need this image.
- Telemetry consumers (for example, Jaeger, Prometheus, the platform Tracing console) and application owners are notified about the planned outage window.
Pre-migration tasks
Inventory the existing deployment
Before making any changes, capture the current state so that you understand the migration scope and can produce backups for rollback.
- List the v1 Operator resources:
- List the existing OpenTelemetry custom resources:
- List the workloads that currently rely on Java auto-instrumentation:
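The three inventory steps above can be sketched as shell functions. This is a minimal sketch rather than the original commands: it assumes the v1 Operator was installed through OLM so that its Subscription and ClusterServiceVersion names contain `opentelemetry-operator`, and that workloads opt in through the upstream `instrumentation.opentelemetry.io/inject-java` pod-template annotation. The function names are hypothetical.

```shell
# List OLM Subscriptions and CSVs whose name mentions the Operator package
# (assumed naming; adjust the pattern if your cluster uses different names).
list_otel_operator_resources() {
  kubectl get subscriptions.operators.coreos.com,clusterserviceversions.operators.coreos.com \
    -A --no-headers | grep -i 'opentelemetry-operator' || true
}

# List the OpenTelemetry custom resources in every namespace.
list_otel_custom_resources() {
  kubectl get opentelemetrycollectors.opentelemetry.io,instrumentations.opentelemetry.io -A
}

# Print "namespace/name" for each Deployment whose pod template carries the
# Java injection annotation with a value other than "false".
list_java_instrumented_deployments() {
  kubectl get deployments -A -o json | jq -r '
    .items[]
    | select((.spec.template.metadata.annotations["instrumentation.opentelemetry.io/inject-java"] // "false") != "false")
    | "\(.metadata.namespace)/\(.metadata.name)"'
}
```

Calling the three functions in order and saving their output next to the backups gives a record of the migration scope. StatefulSets and DaemonSets would need the same annotation check if you instrument them.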
Back up v1 resources
Export the v1 resources so that you can rebuild them in v2 (and roll back if needed).
The backup files are only used as a configuration reference and as a rollback artifact. The cpaas-system ServiceMonitor, ServiceAccount, and ClusterRoleBinding backups are only needed if you later roll back integrations that depend on those v1 resources. When you rebuild resources on v2, follow the v2 conventions described in Installing Alauda Build of OpenTelemetry v2 and adjust as needed.
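As a hedged illustration of the export, the sketch below writes the two custom resource kinds to `./otel-v1-backup/collectors.yaml` and `./otel-v1-backup/instrumentations.yaml`, the file names used later in this procedure. The function name is hypothetical, and the cpaas-system ServiceMonitor, ServiceAccount, and ClusterRoleBinding are deliberately left out because their names depend on your deployment.

```shell
# Export v1 custom resources into a backup directory (default: ./otel-v1-backup).
backup_otel_v1() {
  backup_dir="${1:-./otel-v1-backup}"
  mkdir -p "$backup_dir" || return 1
  # One multi-document List per kind, covering all namespaces.
  kubectl get opentelemetrycollectors.opentelemetry.io -A -o yaml \
    > "$backup_dir/collectors.yaml" || return 1
  kubectl get instrumentations.opentelemetry.io -A -o yaml \
    > "$backup_dir/instrumentations.yaml" || return 1
  ls -l "$backup_dir"
}
```

Run `backup_otel_v1` before deleting anything, and keep a copy of the directory outside the cluster.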
Preparing the Java agent image
In v1, Alauda ships a customized Java auto-instrumentation image with the Operator and the Operator injects it automatically; users normally leave Instrumentation.spec.java.image unset. In v2, the Operator no longer ships a Java agent image, and you must set spec.java.image explicitly on every Instrumentation resource that targets Java workloads. See Java Auto-instrumentation for details.
The OpenTelemetry Java agent has moved from the 1.x series to the 2.x series. Some auto-generated metric names and attributes are different from what v1 produced. If your dashboards or alerts depend on specific metric names, review the changes in the upstream Java agent release notes and update them accordingly.
Check Collector configuration compatibility
Alauda Build of OpenTelemetry v2 supports the components listed in the v2.0.0 Release Notes. Review every receiver, processor, exporter, connector, and extension referenced in your existing OpenTelemetryCollector resources and confirm that:
- Each component is included in the v2 supported component lists.
- The configuration syntax matches the upstream 0.147.0 schema. Some fields have changed across the upstream release range. For example, the spec.config.service.telemetry.metrics configuration shape differs between the two versions.
If you have a staging environment, applying your v1 configuration to a freshly installed v2 Operator there is a good way to surface incompatibilities before the production migration.
Migration procedure
Delete the v1 Instrumentation resources
- List the existing Instrumentation resources:
- Delete each Instrumentation resource. Replace <namespace> and <name> with the values from the previous step:

TIP
Deleting Instrumentation does not affect application pods that have already been mutated by the webhook: the previously injected init container, environment variables, and JAVA_TOOL_OPTIONS remain in the running pods. The deletion only prevents the v1 Operator from injecting them into newly created pods.
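The list-then-delete pattern can be sketched as one loop. This is an assumption-laden sketch, not the original commands: `delete_all_of_kind` is a hypothetical helper that deletes every resource of a given kind across all namespaces, and the `DRY_RUN` variable maps to the `--dry-run` flag of `kubectl delete` so that you can preview the effect first.

```shell
# Delete every resource of the given kind, one namespace/name pair at a time.
# Set DRY_RUN=client to preview without deleting anything.
delete_all_of_kind() {
  kind="$1"
  kubectl get "$kind" -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
    | while read -r ns name; do
        [ -n "$name" ] || continue
        kubectl delete "$kind" "$name" -n "$ns" --dry-run="${DRY_RUN:-none}"
      done
}
```

For example, `DRY_RUN=client delete_all_of_kind instrumentations.opentelemetry.io` prints what would be removed, and the same helper can be pointed at `opentelemetrycollectors.opentelemetry.io` in the next step.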
Delete the v1 OpenTelemetryCollector resources
- List the existing OpenTelemetryCollector resources:
- Delete each OpenTelemetryCollector resource:
After this step, the OTLP, Jaeger, and Zipkin endpoints exposed by the v1 Collector are gone. Application pods that continue to export telemetry will see export errors until the v2 Collector is created in Step 5.
Uninstall the v1 Operator
- Delete the Subscription:
- Delete the RBAC and monitoring resources that the v1 deployment created in cpaas-system (skip this step if your v1 deployment did not create them). These resources are backed up in Back up v1 resources for rollback.
- Wait until no v1 Operator CSV remains in the cluster. This check is important: if any v1 CSV is still present, OLM rejects the v2 Operator installation in Step 4 because of a CRD owner conflict.
The expected output is empty.
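One way to automate this check is a small polling loop. The sketch below assumes that v1 CSV names follow the common OLM pattern `opentelemetry-operator.v<version>`, which is what distinguishes them from `opentelemetry-operator2.v<version>`; verify the actual names in your cluster before relying on the pattern. The function name and its retry parameters are hypothetical.

```shell
# Poll until no CSV line matches the pattern, or give up after N attempts.
# Usage: wait_for_csv_removal <extended-regex> [attempts] [delay-seconds]
wait_for_csv_removal() {
  pattern="$1"; attempts="${2:-60}"; delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    remaining="$(kubectl get clusterserviceversions.operators.coreos.com -A --no-headers 2>/dev/null \
      | grep -E "$pattern" || true)"
    if [ -z "$remaining" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Show what is still present so the operator can investigate.
  printf '%s\n' "$remaining" >&2
  return 1
}
```

For example, `wait_for_csv_removal 'opentelemetry-operator\.v'` returns success only once the list of matching CSVs is empty, which corresponds to the empty output expected above.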
Do not delete the opentelemetrycollectors.opentelemetry.io and instrumentations.opentelemetry.io CRDs. The v2 Operator adopts and upgrades these CRDs when it is installed. Keeping them also allows you to roll back to v1 from the backup files captured in Back up v1 resources.
Install the v2 Operator
Follow Installing the Alauda Build of OpenTelemetry v2 Operator. The condensed CLI flow is:
- Confirm the available v2 Operator versions:
- Create the Operator namespace:
- Create the Subscription. Replace startingCSV with the version returned in step 1.
- Approve the InstallPlan:
- Wait for the v2 CSV to reach Succeeded:
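For orientation, the Subscription and InstallPlan steps might look like the sketch below. It is not the manifest from the installation guide: the channel, catalog source, and catalog namespace are placeholders passed as arguments because their real values are not stated here, the Subscription name is assumed to equal the package name, and `installPlanApproval: Manual` is inferred from the explicit approval step. Both function names are hypothetical.

```shell
# Render a Subscription manifest for the v2 package.
# Usage: render_v2_subscription <channel> <catalog> <catalog-namespace> <startingCSV>
render_v2_subscription() {
  channel="$1"; catalog="$2"; catalog_ns="$3"; starting_csv="$4"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opentelemetry-operator2
  namespace: opentelemetry-operator2
spec:
  channel: ${channel}
  name: opentelemetry-operator2
  source: ${catalog}
  sourceNamespace: ${catalog_ns}
  installPlanApproval: Manual
  startingCSV: ${starting_csv}
EOF
}

# Approve a pending InstallPlan by name in the Operator namespace.
approve_install_plan() {
  kubectl patch installplans.operators.coreos.com "$1" -n opentelemetry-operator2 \
    --type merge -p '{"spec":{"approved":true}}'
}
```

Piping the rendered manifest into `kubectl apply -f -` creates the Subscription; the InstallPlan name to approve can then be read from `kubectl get installplans.operators.coreos.com -n opentelemetry-operator2`.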
Recreate the OpenTelemetryCollector resources
What this migration changes
When you rebuild the OpenTelemetryCollector manifests from your v1 backup, the following aspects must be adjusted before they can be applied on v2.
- Collector namespace. The Collector namespace must be different from the Operator namespace (opentelemetry-operator2). Choose the namespace based on your deployment scenario:
  - Standalone Collector: a dedicated namespace such as opentelemetry-collector.
  - Alauda Container Platform Tracing integration: keep using the same Collector namespace (typically cpaas-system) so that downstream services that reference the Collector service do not need to change.
  - Alauda Service Mesh v2 integration: keep the Collector in istio-system so the existing Istio meshConfig.extensionProviders[].opentelemetry.service remains valid.
- Component compatibility. Every component used in spec.config must be supported on v2. For the recommended Collector configuration when integrating with Alauda Distributed Tracing, see Deploying the OpenTelemetry Collector in the Alauda Distributed Tracing documentation. For other scenarios, follow Deploying the OpenTelemetry Collector and adapt the example to your environment.
- Feature gates. v1 Collectors often pass Collector feature gates via spec.args.feature-gates. Many of those gates were either stabilized (and therefore no longer toggleable) or removed entirely in newer Collector versions, so reusing the v1 list can prevent the v2 Collector pod from starting. Strip spec.args.feature-gates from the backup and reintroduce only the gates that the v2 Collector version in use explicitly documents.
Internal metrics Prometheus endpoint. The
service.telemetry.metrics.addressfield is no longer the supported way to expose the internal metrics Prometheus endpoint. Configure it underservice.telemetry.metrics.readers[].pull.exporter.prometheusinstead, as described in the OpenTelemetry Collector internal telemetry documentation. A typical v1 backup looks like: -
Internal metrics verbosity.
level: detailedenables histogram buckets and per-instance labels for the Collector's own internal metrics, which significantly inflates Prometheus cardinality and storage cost — especially in Gateway-mode deployments with many receiver/exporter instances. The defaultlevel: normalis recommended for production: it still exposes process resource usage and per-component sent/received/refused/dropped counters, which is sufficient for most SRE alerting and capacity needs. Switch back todetailedonly temporarily when investigating exporter latency distributions or batch sizing. -
Server-managed metadata. Fields written by the API server (
metadata.creationTimestamp,metadata.resourceVersion,metadata.uid,metadata.generation,metadata.managedFields,metadata.finalizers, thekubectl.kubernetes.io/last-applied-configurationannotation, andstatus) cannot be reused on create and must be stripped from the backup. -
Operator-managed RBAC and Prometheus scraping. The v2 Operator automatically creates the
ServiceAccountandClusterRoleBindingresources required by the Collector. Drop the v1spec.serviceAccountfield from the backup so the Operator can provision a fresh ServiceAccount with the correct permissions; you generally do not have to recreate the v1 RBAC resources by hand. To have the Operator also create aServiceMonitorfor the internal Prometheus endpoint, setspec.observability.metrics.enableMetrics: trueand add a discovery label (such asprometheus: kube-prometheus) tometadata.labelsso that your Prometheus Operator instance picks the resource up. If a Collector component requires additional cluster-level RBAC (for example, thek8sattributesprocessor or thek8sobjectsreceiver), follow Creating the Required RBAC Resources Automatically.
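Several of the adjustments in the list above can be expressed as a single jq filter applied to one OpenTelemetryCollector at a time. The following is a sketch under stated assumptions, not the procedure's own script: it assumes the resource is already JSON and that spec.config is a structured object (as in the v1beta1 API) rather than a string, and it assumes the internal metrics endpoint should listen on 0.0.0.0:8888, the conventional Collector default. The function name is hypothetical, and the namespace and component changes still have to be made by hand.

```shell
# Read one OpenTelemetryCollector as JSON on stdin and emit a copy that is
# safe to create on v2: server-managed fields, spec.serviceAccount and the
# feature-gates argument are removed, and the telemetry block is rewritten.
sanitize_collector_json() {
  jq '
    del(.metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid,
        .metadata.generation, .metadata.managedFields, .metadata.finalizers,
        .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"],
        .status, .spec.serviceAccount, .spec.args["feature-gates"])
    | .spec.config.service.telemetry.metrics |= (
        del(.address)
        | .level = "normal"
        | .readers = [{pull: {exporter: {prometheus: {host: "0.0.0.0", port: 8888}}}}]
      )
    | .spec.observability.metrics.enableMetrics = true
    | .metadata.labels.prometheus = "kube-prometheus"
  '
}
```

Review the output before applying it; if the v1 backup exposed the internal metrics on a different port, carry that port into the `readers` entry instead of 8888.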
Migration procedure
- Recreate the OpenTelemetryCollector resource from the backup. The following example copies ./otel-v1-backup/collectors.yaml into a new working directory and then:
  - strips server-managed metadata;
  - removes the v1 spec.serviceAccount and spec.args.feature-gates fields;
  - downgrades level: detailed to the default level: normal;
  - replaces the deprecated address field with the new readers configuration;
  - enables Operator-managed metrics scraping by setting spec.observability.metrics.enableMetrics: true and adding the prometheus: kube-prometheus label, so that the kube-prometheus stack picks up the auto-created ServiceMonitor;
  - applies the result.

  jq does not read YAML directly, so the example uses kubectl patch --local -p='[]' -o json only as a local YAML-to-JSON decoder before passing the resources to jq.
- Wait for the Collector pods to become ready:
Recreate the Instrumentation resources
For each Instrumentation resource that you backed up, recreate it on v2 with the new spec.java.image field set. The exporter endpoint and other environment variables follow the same shape used in v1, but Java auto-instrumentation now uses the autoinstrumentation-java 2.x image. In this version, the default OTLP exporter protocol is http/protobuf, so endpoints that previously pointed to the Collector gRPC port 4317 must be changed to the Collector HTTP port 4318 unless you explicitly configure OTEL_EXPORTER_OTLP_PROTOCOL=grpc. Update the host value as well if the Collector namespace or service name has changed.
Use the same working directory created in the previous step. The following example copies ./otel-v1-backup/instrumentations.yaml, sets spec.java.image from the JAVA_AUTO_INSTRUMENTATION_IMAGE variable, changes the backed-up OTEL_EXPORTER_OTLP_ENDPOINT value from port 4317 to 4318, and creates the Instrumentation resources:
- Set JAVA_AUTO_INSTRUMENTATION_IMAGE to the image you prepared in Preparing the Java agent image. The command writes this value to spec.java.image. Without this field, no Java agent is injected and Java workloads will not be instrumented.
- autoinstrumentation-java 2.x exports with http/protobuf by default, so the endpoint must use the Collector OTLP HTTP receiver, typically port 4318. If you intentionally keep the gRPC receiver on port 4317, add OTEL_EXPORTER_OTLP_PROTOCOL=grpc to the Java environment configuration.
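The per-resource transformation described above can be sketched as a jq filter. This is an illustration under assumptions rather than the original command: it expects one Instrumentation as JSON on stdin, looks for OTEL_EXPORTER_OTLP_ENDPOINT in both spec.env and spec.java.env, and only rewrites values that end in `:4317`. The function name is hypothetical. If your backup sets the endpoint in spec.exporter.endpoint instead, that field needs the same port change.

```shell
# Read one Instrumentation as JSON on stdin, set spec.java.image from
# JAVA_AUTO_INSTRUMENTATION_IMAGE, and move the OTLP endpoint from 4317 to 4318.
migrate_instrumentation_json() {
  if [ -z "${JAVA_AUTO_INSTRUMENTATION_IMAGE:-}" ]; then
    echo "JAVA_AUTO_INSTRUMENTATION_IMAGE is not set" >&2
    return 1
  fi
  jq --arg image "$JAVA_AUTO_INSTRUMENTATION_IMAGE" '
    del(.metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid,
        .metadata.generation, .metadata.managedFields, .status)
    | .spec.java.image = $image
    | ((.spec.env[]?, .spec.java.env[]?)
       | select(.name == "OTEL_EXPORTER_OTLP_ENDPOINT" and (.value | type) == "string")
       | .value) |= sub(":4317$"; ":4318")
  '
}
```

The guard at the top makes a missing image an explicit error, which matches the note above that an Instrumentation without spec.java.image injects no Java agent.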
Roll out the application pods
Application pods that were previously instrumented by the v1 Operator still carry the v1 init container, agent path, and JAVA_TOOL_OPTIONS. Because the Collector backing those pods has been replaced, telemetry export from those pods is no longer functional. Roll out the affected workloads so that the v2 mutating webhook injects the new Java agent image and environment variables.
- List the deployments that opt in to Java auto-instrumentation:
- Restart each instrumented deployment and wait for the rollout to complete. Pick one of the two approaches below based on how cautiously you need to validate.
Option A — One deployment at a time. Run the rollout against each Deployment individually. For large fleets, restart deployments in waves ordered by criticality so you can pause and validate after each wave.
Option B — All instrumented deployments in one command. Iterate over every Deployment that opts in to Java auto-instrumentation. This is faster but offers no built-in pause point, so prefer it for small fleets or after you have validated the change on a canary.
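Both options can be built from the same two helpers. The sketch below is an assumption-based illustration: the function names are hypothetical, the input format `namespace/name` is a convention chosen for this example, and `ROLLOUT_TIMEOUT` is an illustrative variable that maps to the `--timeout` flag of `kubectl rollout status`.

```shell
# Option A: restart one Deployment and block until the rollout finishes.
restart_and_wait() {
  ns="$1"; name="$2"
  kubectl rollout restart "deployment/$name" -n "$ns" || return 1
  kubectl rollout status "deployment/$name" -n "$ns" --timeout="${ROLLOUT_TIMEOUT:-10m}"
}

# Option B: read "namespace/name" lines on stdin and restart each in turn,
# stopping at the first rollout that fails or times out.
restart_all_from_stdin() {
  while IFS=/ read -r ns name; do
    [ -n "$name" ] || continue
    restart_and_wait "$ns" "$name" || return 1
  done
}
```

Stopping at the first failed rollout gives Option B a crude safety net: the remaining Deployments keep their old pods until you have investigated.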
Verify the migration
- Verify that only the v2 Operator CSV exists and has reached the Succeeded phase:
- Verify the v2 Operator workloads are running:
- Verify the OpenTelemetryCollector resources are healthy and report the v2 version:
- Verify the Instrumentation resources have spec.java.image configured:
- Verify that an instrumented application pod uses the new Java agent init container:
  The output must include the image configured in Instrumentation.spec.java.image.
- Verify that OpenTelemetry environment variables are present in the pod spec. This checks the Kubernetes object directly and does not require the application container to support kubectl exec or include the env command.
- Send a test request to an instrumented application and confirm that the resulting traces and metrics appear in your tracing backend (for example, Jaeger UI or the platform Tracing console) and Prometheus.
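The two pod-level checks in the list above can be performed with jsonpath queries against the pod object. The helpers below are a sketch with hypothetical names; they assume the injected variables are plain `value` entries, so variables defined through `valueFrom` show up with an empty value.

```shell
# Print the init container images of a pod; the Java agent image should be listed.
# Usage: pod_init_images <namespace> <pod>
pod_init_images() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.initContainers[*].image}'
}

# Print NAME=VALUE for the OTEL_* and JAVA_TOOL_OPTIONS variables in the pod spec.
# Usage: pod_otel_env <namespace> <pod>
pod_otel_env() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' \
    | grep -E '^(OTEL_|JAVA_TOOL_OPTIONS=)'
}
```

An empty result from `pod_otel_env` on a pod that should be instrumented usually means the pod was created before the v2 Instrumentation existed and still needs a restart.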
Rollback
If a problem is discovered during or shortly after the migration, you can roll back to the v1 deployment. The same OLM CRD-ownership constraint applies in reverse: you must fully uninstall the v2 Operator before reinstalling v1.
Delete the v2 resources
Wait for the v2 Operator to be fully removed
The expected output is empty.
Reinstall the v1 Operator
Follow the v1 installation procedure documented in Installing the OpenTelemetry Operator.
Recreate the v1 resources from the backup
Recreate the OpenTelemetryCollector, Instrumentation, ServiceAccount, ClusterRoleBinding, and ServiceMonitor resources from the YAML files captured in Back up v1 resources.
Roll out the application pods again
Restart the workloads with kubectl rollout restart so that the v1 mutating webhook re-injects the v1 Java agent.
Troubleshooting
For deeper troubleshooting of the auto-instrumentation flow, see Troubleshooting the instrumentation.