Deploy Etcd Cluster
Overview
This guide demonstrates how to deploy a highly available etcd cluster using the etcd operator. Etcd is a distributed key-value store that provides reliable data storage for distributed systems, commonly used as the backing store for Kubernetes control plane data.
To deploy an etcd cluster, you must first install the cluster plugin **Alauda Build of etcd Operator**; see Install for instructions.
Prerequisites
Before deploying an etcd cluster, ensure the following requirements are met:
- A running Kubernetes cluster with the etcd operator installed
- Cert-manager installed and configured in your cluster
- Sufficient resources available for the etcd cluster (CPU, memory, and storage)
- A storage class configured for persistent volumes (e.g., `sc-topolvm`)
- `kubectl` CLI tool configured to access your cluster
Certificate Management
Etcd requires TLS certificates for secure communication between cluster members. You can either generate new certificates using cert-manager or use existing certificates.
Parameters used in this section:
- `{ROOT_CA_NAME}`: The name of the root CA certificate and secret (e.g., `etcd-root-ca`)
- `{CA_ISSUER_NAME}`: The name of the cert-manager Issuer (e.g., `etcd-ca-issuer`)
Option 1: Generate CA with cert-manager
If you don't have an existing CA, you can create one using cert-manager.
Step 1: Generate CA Secret
Create a Certificate resource that will generate the root CA:
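A minimal sketch, following the standard cert-manager CA bootstrap pattern: a temporary self-signed Issuer signs the root CA, and the resulting key pair is stored in the `{ROOT_CA_NAME}` secret. The `{NAMESPACE}` placeholder and the bootstrap issuer name are illustrative:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-bootstrap     # temporary issuer, used only to sign the root CA
  namespace: {NAMESPACE}
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {ROOT_CA_NAME}
  namespace: {NAMESPACE}
spec:
  isCA: true
  commonName: {ROOT_CA_NAME}
  secretName: {ROOT_CA_NAME}     # cert-manager stores the CA key pair in this secret
  duration: 87600h               # 10 years; adjust to your policy
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: Issuer
    group: cert-manager.io
```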
Apply this configuration:
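```bash
# Assumes the manifest above was saved as etcd-root-ca.yaml (illustrative file name)
kubectl apply -f etcd-root-ca.yaml
```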
Step 2: Create Root CA Issuer
Once the CA secret is created, create an Issuer that references it:
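A sketch of the CA Issuer; it signs leaf certificates with the key pair stored in the secret created above:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: {CA_ISSUER_NAME}
  namespace: {NAMESPACE}
spec:
  ca:
    secretName: {ROOT_CA_NAME}   # secret produced by the Certificate above
```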
Apply the issuer configuration:
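```bash
# Assumes the Issuer manifest was saved as etcd-ca-issuer.yaml (illustrative file name)
kubectl apply -f etcd-ca-issuer.yaml
```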
Verify the issuer is ready:
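```bash
kubectl get issuer {CA_ISSUER_NAME} -n {NAMESPACE}
```

The `READY` column should show `True` before you proceed.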
Option 2: Use Existing CA
If you already have a CA certificate stored as a Kubernetes secret, you can create an issuer directly:
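Assuming your CA key pair is stored as a TLS-type secret (with `tls.crt` and `tls.key` entries) named `{ROOT_CA_NAME}`, the Issuer is the same shape as in Option 1:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: {CA_ISSUER_NAME}
  namespace: {NAMESPACE}
spec:
  ca:
    secretName: {ROOT_CA_NAME}   # your pre-existing CA secret
```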
Apply the configuration:
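```bash
kubectl apply -f etcd-ca-issuer.yaml   # illustrative file name
```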
Deployment Procedure
Step 1: Determine Etcd Version
First, determine the appropriate etcd version to use. You can check the version used by your global cluster:
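One way is to inspect the image used by the control-plane etcd pods. The `kube-system` namespace and the `component=etcd` label are assumptions that hold for kubeadm-style clusters and may differ in your distribution:

```bash
kubectl get pods -n kube-system -l component=etcd \
  -o jsonpath='{.items[0].spec.containers[0].image}'
```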
The image tag in the output indicates the etcd version, for example: `v3.5.21`.
Step 2: Create EtcdCluster Resource
Create an EtcdCluster custom resource with your desired configuration:
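The exact schema depends on the operator build you installed; the sketch below follows the upstream etcd-operator CRD (`operator.etcd.io/v1alpha1`) and the fields described under Configuration Options. Verify the field names against your installed CRD (for example, with `kubectl explain etcdcluster.spec`) before applying:

```yaml
apiVersion: operator.etcd.io/v1alpha1   # assumption — check the group/version of your installed CRD
kind: EtcdCluster
metadata:
  name: {ETCD_CLUSTER_NAME}
  namespace: {NAMESPACE}
spec:
  size: 3                       # number of members; keep this odd (see Configuration Options)
  version: {ETCD_VERSION}       # e.g., v3.5.21, determined in Step 1
  storageSpec:
    storageClassName: {SC_NAME}
    volumeSizeRequest: 10Gi     # initial persistent volume size
    volumeSizeLimit: 20Gi       # growth ceiling, if the storage class supports expansion
  tls:                          # assumption — cert-manager integration fields may differ per build
    provider: cert-manager
    providerCfg:
      issuerName: {CA_ISSUER_NAME}
      issuerKind: Issuer
      validityDuration: 8760h   # 1 year; cert-manager renews before expiry
```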
Replace the placeholders:
- `{ETCD_CLUSTER_NAME}`: The name of your etcd cluster
- `{SC_NAME}`: The name of your storage class
- `{CA_ISSUER_NAME}`: The name of your cert-manager Issuer
- `{ETCD_VERSION}`: The etcd version (e.g., `v3.5.21`)
Apply the configuration:
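```bash
# Assumes the EtcdCluster manifest was saved as etcd-cluster.yaml (illustrative file name)
kubectl apply -f etcd-cluster.yaml
```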
Step 3: Monitor Deployment Status
Monitor the StatefulSet creation and rollout:
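Assuming the operator names the StatefulSet after the EtcdCluster resource:

```bash
kubectl -n {NAMESPACE} rollout status statefulset/{ETCD_CLUSTER_NAME}
kubectl -n {NAMESPACE} get pods -w
```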
Wait until all pods are in the Running state and ready (this may take a few minutes).
Verification
After deployment, verify that the etcd cluster is healthy and operational:
Set the following variables:
- `{ETCD_CLUSTER_NAME}`: The name of your etcd cluster (same as specified in the EtcdCluster resource)
- `{NAMESPACE}`: The namespace where etcd is deployed (e.g., `cpaas-system`)
Check Cluster Health
Access one of the etcd pods and check the cluster health:
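A sketch; the certificate mount paths inside the pod are assumptions, so check the pod spec (`kubectl -n {NAMESPACE} get pod {ETCD_CLUSTER_NAME}-0 -o yaml`) for the actual volume mounts:

```bash
kubectl -n {NAMESPACE} exec -it {ETCD_CLUSTER_NAME}-0 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.crt \
  --cert=/etc/etcd/pki/tls.crt \
  --key=/etc/etcd/pki/tls.key \
  endpoint health
```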
Check Cluster Members
List all cluster members:
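Using the same connection flags as in the health check:

```bash
kubectl -n {NAMESPACE} exec -it {ETCD_CLUSTER_NAME}-0 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.crt \
  --cert=/etc/etcd/pki/tls.crt \
  --key=/etc/etcd/pki/tls.key \
  member list -w table
```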
Test Data Operations
Perform a simple write and read test:
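A sketch using `etcdctl`'s environment variables to avoid repeating the TLS flags (the certificate paths remain assumptions):

```bash
kubectl -n {NAMESPACE} exec -it {ETCD_CLUSTER_NAME}-0 -- sh -c '
  export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
  export ETCDCTL_CACERT=/etc/etcd/pki/ca.crt   # assumed mount path
  export ETCDCTL_CERT=/etc/etcd/pki/tls.crt
  export ETCDCTL_KEY=/etc/etcd/pki/tls.key
  etcdctl put /test/key "hello"                # write
  etcdctl get /test/key                        # read back
  etcdctl del /test/key                        # clean up
'
```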
Configuration Options
Cluster Size
The size field determines the number of etcd members. For production use:
- 3 members: Standard configuration, tolerates 1 failure
- 5 members: High availability, tolerates 2 failures
- 7 members: Maximum recommended, tolerates 3 failures
Always use an odd number of members to maintain quorum during failures.
Storage Configuration
- storageClassName: Specify your storage class that supports dynamic provisioning
- volumeSizeRequest: Initial size of the persistent volume
- volumeSizeLimit: Maximum size the volume can grow to (if supported by storage class)
TLS Configuration
- validityDuration: Certificate validity period (recommended: 8760h, i.e., one year)
- Certificates are automatically renewed by cert-manager before expiration
Troubleshooting
Parameters used in this section:
- `{ETCD_CLUSTER_NAME}`: The name of your etcd cluster
- `{POD_NAME}`: The name of a specific pod (get from `kubectl get pods`)
- `{OTHER_POD_IP}`: The IP address of another pod (get from `kubectl get pods -o wide`)
- `{ROOT_CA_NAME}`: The name of the root CA certificate
- `{CA_ISSUER_NAME}`: The name of the cert-manager Issuer
Pods Not Starting
If pods fail to start, check the following:
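Typical starting points:

```bash
kubectl -n {NAMESPACE} describe pod {POD_NAME}            # see Events at the bottom
kubectl -n {NAMESPACE} logs {POD_NAME}
kubectl -n {NAMESPACE} get pvc                            # check volume binding
kubectl -n {NAMESPACE} get events --sort-by=.lastTimestamp
```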
Common issues:
- Insufficient storage capacity
- Storage class not found
- Certificate issuer not ready
- Resource constraints (CPU/memory)
Cluster Not Forming Quorum
If the cluster fails to form quorum:
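Some starting points; 2380 is etcd's default peer port, and the connectivity check assumes the image ships a shell with `nc`, which may not be the case:

```bash
# Look for leader-election and quorum messages in member logs
kubectl -n {NAMESPACE} logs {POD_NAME} | grep -iE 'leader|election|quorum'

# Test peer-to-peer connectivity on the peer port
kubectl -n {NAMESPACE} exec {POD_NAME} -- nc -zv {OTHER_POD_IP} 2380
```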
Certificate Issues
Verify certificates are properly issued:
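For example:

```bash
kubectl -n {NAMESPACE} get certificate {ROOT_CA_NAME}    # READY should be True
kubectl -n {NAMESPACE} describe issuer {CA_ISSUER_NAME}
kubectl -n {NAMESPACE} get secret {ROOT_CA_NAME}
kubectl -n {NAMESPACE} get certificaterequests           # inspect any failed requests
```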
Best Practices
- Backup Strategy: Implement regular etcd backups for disaster recovery
- Monitoring: Set up monitoring and alerting for cluster health
- Resource Limits: Configure appropriate resource requests and limits
- Network Policies: Implement network policies to secure etcd communication
- Regular Maintenance: Keep etcd version updated and monitor storage usage