Harbor Image Synchronization to Cluster Registry

This guide explains how to configure Harbor replication rules to automatically synchronize images from Harbor to registries in clusters when new images are pushed.


Prerequisites

  • A running Harbor service
  • podman client or other image management tools installed locally for pushing images to Harbor
  • A Kubernetes cluster with a registry deployed
  • Kubectl installed locally and configured to access the Kubernetes cluster

Overview

Key Configurations in Harbor

  • Registry Endpoint: Add target registry information to Harbor
  • Replication Rule: Define which images to sync and the synchronization path

Process Overview

  1. Configure Registry Endpoint: Add target cluster registry endpoint to Harbor
  2. Configure Replication Rule: Define image synchronization policies
  3. Verify Image Synchronization: Test the synchronization workflow

Configuration Steps

Step 1: Configure Registry Endpoint in Harbor

First, configure the target registry information in Harbor using the Registry Endpoint feature.

  1. Log into Harbor and navigate to Administration > Registries
  2. Click NEW ENDPOINT and configure the following:
  • Provider: Select your registry type. For this example, choose Harbor
  • Name: Provide a descriptive name for your target registry. For easy identification, this example uses the format <cluster-name>-registry
  • Endpoint URL: Enter your registry URL (e.g., https://cluster1-registry.example.com)
  • Access ID: Username with Push/Delete permissions for the target registry
  • Access Secret: Password for the above user
  • Verify Remote Cert: Determines whether to verify the remote registry's certificate. Uncheck for self-signed or untrusted certificates
  3. Click TEST CONNECTION to verify the configuration and network connectivity
  4. Click OK to save the configuration
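For automation, the same endpoint can be created through Harbor's REST API (`POST /api/v2.0/registries`). The sketch below is an assumption-laden example: the admin credentials, Harbor address, and target registry values are placeholders to replace with your own.

```shell
# Create a registry endpoint via Harbor's API (all values are placeholders)
curl -u "admin:<harbor-admin-password>" \
  -H "Content-Type: application/json" \
  -X POST "https://<your-harbor-address>/api/v2.0/registries" \
  -d '{
        "name": "cluster1-registry",
        "type": "harbor",
        "url": "https://cluster1-registry.example.com",
        "credential": {
          "type": "basic",
          "access_key": "<push-user>",
          "access_secret": "<push-password>"
        },
        "insecure": false
      }'
```

Setting "insecure" to true corresponds to unchecking Verify Remote Cert in the UI.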

Step 2: Configure Replication Rule in Harbor

Next, configure a replication rule to define how Harbor synchronizes images to the target registry.

  1. Navigate to Administration > Replications
  2. Click NEW REPLICATION RULE and configure:

Basic Settings

  • Name: Use a descriptive name like <cluster-name>-registry-replication
  • Replication mode: Select Push-based to trigger sync when images are pushed to Harbor

Source Resource Filter

Configure filters to identify which artifacts to replicate:

  • Name: Filter by resource name. Leave empty or use ** to match all. For this example, leave empty to sync all repositories
  • Tag: Filter by tag/version. Leave empty or use ** to match all tags. For this example, leave empty to sync all tags
  • Label: Filter by artifact labels. Leave empty to match all artifacts. For this example, leave empty to sync all artifacts
  • Resource: Filter by resource type. For this example, select ALL to sync all artifact types

Destination

Define the sync target and path structure:

  • Destination registry: Select the registry endpoint configured in Step 1
  • Namespace: The namespace into which resources are replicated. If empty, resources are placed in the same namespace as the source. For this example, leave empty
  • Flattening: Reduce the nested repository structure when copying images. Leave empty to preserve the original image hierarchy

Additional Settings

  • Trigger Mode: Decide how the sync is triggered. For this example, select Event Based so synchronization runs on Harbor push events
  • Check "Delete remote resources when locally deleted" if you want deletions in Harbor to propagate to the target registry
  • Bandwidth: Set maximum network bandwidth per replication worker (use -1 for unlimited)
  • Options:
  • Check Override to overwrite existing resources at the destination
  • Check Enable rule to activate the replication rule
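The rule above can likewise be created via Harbor's API (`POST /api/v2.0/replication/policies`). This sketch assumes the endpoint from Step 1 was assigned registry ID 1 (you can confirm the ID with `GET /api/v2.0/registries`); credentials and the Harbor address are placeholders.

```shell
# Create a push-based, event-triggered replication rule (assumed IDs and values)
curl -u "admin:<harbor-admin-password>" \
  -H "Content-Type: application/json" \
  -X POST "https://<your-harbor-address>/api/v2.0/replication/policies" \
  -d '{
        "name": "cluster1-registry-replication",
        "dest_registry": { "id": 1 },
        "trigger": { "type": "event_based" },
        "filters": [],
        "deletion": false,
        "override": true,
        "enabled": true,
        "speed": -1
      }'
```

Empty "filters" matches all repositories, tags, and artifact types, mirroring the UI settings above; "speed": -1 leaves bandwidth unlimited.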

Step 3: Verify Image Synchronization

Push an Image to Harbor

Push an image to Harbor to trigger the sync:

podman pull alpine:latest
podman tag alpine:latest <your-harbor-address>/library/alpine:latest
podman push <your-harbor-address>/library/alpine:latest
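If you have not yet authenticated against Harbor, log in before pushing. The address is a placeholder; the `--tls-verify=false` variant is only needed when Harbor uses a self-signed certificate.

```shell
# Authenticate to Harbor before pushing (address is a placeholder)
podman login <your-harbor-address>

# For a Harbor instance with a self-signed certificate:
# podman login --tls-verify=false <your-harbor-address>
```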

Monitor Synchronization Job in Harbor

  1. In Harbor, navigate to Administration > Replications
  2. Select your newly created replication rule to see the automatically triggered sync job
  3. Click on the execution record to view detailed information
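The same execution records can be queried from the command line via `GET /api/v2.0/replication/executions`. This sketch assumes the replication policy was assigned ID 1; credentials and the Harbor address are placeholders.

```shell
# List recent executions for a replication policy (assumed policy ID)
curl -u "admin:<harbor-admin-password>" \
  "https://<your-harbor-address>/api/v2.0/replication/executions?policy_id=1"
```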

Verify Synchronization Results

Create a pod in the target cluster to test that the image was successfully synchronized:

kubectl run alpine-test --image=<your-cluster-registry-address>/library/alpine:latest -- sleep 3600

If the pod starts successfully, the synchronization is working correctly.
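To confirm the result from the command line, wait for the test pod to become Ready and inspect its events for the image pull, then clean up:

```shell
# Wait for the test pod to become Ready, then inspect its events
kubectl wait --for=condition=Ready pod/alpine-test --timeout=120s
kubectl describe pod alpine-test

# Remove the test pod afterwards
kubectl delete pod alpine-test
```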

Summary

This configuration establishes automatic image synchronization from Harbor to cluster registries, maintaining consistent image paths and enabling seamless deployment workflows across your infrastructure. The push-based replication ensures that images are available in target registries immediately after being pushed to Harbor.

FAQ

Can the Replication Function Be Used for Harbor Disaster Recovery?

Harbor's Replication function can only provide a preliminary disaster recovery (DR) solution and comes with certain limitations.

  • First, a failover mechanism is required that automatically switches domain name resolution from the primary instance to the secondary instance when the primary fails, so that users can continue pulling already synchronized images.
  • Second, this function relies on event-driven image synchronization and is designed primarily for replication between registries, not as a mature solution tailored for disaster recovery.

The following sections analyze the problems this solution can address and those it cannot.

Problems Addressed

  • Automatic Synchronization of Image Data: Container images and Helm charts from the primary instance can be automatically synchronized to the secondary instance. When the "synchronize deletions" option is enabled, images deleted from the primary instance are automatically removed from the secondary instance.
  • Automatic Failover: In the event of a primary instance failure, the external domain name resolution can automatically switch to the secondary instance, enabling users to pull already synchronized images (Recovery Time Objective, or RTO, depends primarily on the implemented failover mechanism).

Problems Not Addressed

  1. Automatic Synchronization of Non-Image Data (Critical!):
  • Issue Description: Harbor's Replication function does not synchronize metadata such as user accounts, permission configurations (e.g., OIDC settings), project member groups, or audit logs.
  • Impact: Users whose metadata is not synchronized will be unable to access Harbor after failover, significantly undermining the disaster recovery objectives.
  2. Recovery Point Objective (RPO):
  • Issue Description: The Replication function is not designed specifically for disaster recovery. Image replication depends on background task execution, which cannot guarantee that all images in the primary registry are synchronized to the secondary registry at the time of a failure.
  • Impact: During a primary registry failure, users will be unable to pull images that have not yet been synchronized.
  3. Fallback to the Primary Instance:
  • Issue Description: The current solution only supports automatic failover to the secondary instance. Fallback to the primary instance requires manual intervention.
  • Impact: Before fallback, incremental data from the secondary instance must be synchronized back to the primary instance, or this data will be lost.

Recommendations for a Comprehensive DR Solution

To achieve a robust disaster recovery mechanism, focus on synchronization at the PostgreSQL database (PG) and storage layers. For instance, implement real-time database replication for PostgreSQL and enable storage-layer synchronization (e.g., S3 cross-region replication) to ensure comprehensive consistency of both metadata and all data.
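As an illustration of storage-layer synchronization, S3 cross-region replication can be enabled on the bucket backing Harbor's registry storage. Everything in this sketch is an assumption: bucket names, the IAM role ARN, and the rule ID are placeholders, and versioning must already be enabled on both buckets.

```shell
# Enable S3 cross-region replication for the Harbor storage bucket
# (bucket names and role ARN are placeholders; both buckets need versioning)
aws s3api put-bucket-replication \
  --bucket harbor-storage-primary \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789012:role/harbor-replication-role",
    "Rules": [{
      "ID": "harbor-dr",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Enabled" },
      "Destination": { "Bucket": "arn:aws:s3:::harbor-storage-secondary" }
    }]
  }'
```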
