Endpoint Health Checker

TOC

Overview

The Endpoint Health Checker is a cluster plugin designed to monitor and manage the health status of service endpoints. It automatically removes unhealthy endpoints from load balancers to ensure traffic is only routed to healthy instances, improving overall service reliability and availability.

Key Features

  • Automatic Health Monitoring: Continuously monitors the health status of service endpoints
  • Load Balancer Integration: Automatically removes unhealthy endpoints from load balancer rotation
  • Service Availability: Ensures traffic is only directed to healthy, available endpoints

Installation

Install via Marketplace

  1. Navigate to Administrator > Marketplace > Cluster Plugins.

  2. Search for "Alauda Container Platform Endpoint Health Checker" in the plugin list.

  3. Click Install to open the installation configuration page.

  4. Wait for the plugin status to change to "Ready".

How It Works

Health Check Mechanism

The Endpoint Health Checker is a dedicated health monitoring component that ensures only healthy endpoints receive traffic. It operates by monitoring service endpoints and automatically managing their availability status.

Core Functionality

The Endpoint Health Checker works by:

  1. Service Discovery: Identifies services and pods configured for health monitoring
  2. Pod Health Monitoring: Monitors the readiness and liveness probe status of pods backing the service endpoints
  3. Active Health Checks: Performs active health assessments using configurable criteria:
    • TCP connectivity checks: Establishes TCP connections to verify port accessibility
    • HTTP/HTTPS response validation: Sends HTTP requests and validates response codes and content
  4. Endpoint Management: Automatically removes unhealthy endpoints from service endpoint lists to prevent traffic routing to failed instances

Health Check Process

The health checking process involves:

  • Probe Integration: Leverages Kubernetes readiness and liveness probe results as initial health indicators
  • Network Connectivity: Sends TCP or HTTP packets to target endpoint ports to verify accessibility
  • Response Validation: Evaluates response status, timing, and content to determine endpoint health
  • Automatic Failover: Removes unresponsive or failed endpoints from load balancer rotation

Activation Methods

Health checking can be activated through two methods:

  1. Pod-level annotation (Recommended):

    Add the annotation to the pod template in your Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-demo
    spec:
      replicas: 10
      selector:
        matchLabels:
          app: nginx-demo
      template:
        metadata:
          labels:
            app: nginx-demo
          annotations:
            endpoint-health-checker.io/enabled: "true"
        spec:
          containers:
            - name: nginx
              image: nginx:alpine
              ports:
                - containerPort: 80
              livenessProbe:
                tcpSocket:
                  port: 80
                initialDelaySeconds: 15
                periodSeconds: 10
              readinessProbe:
                tcpSocket:
                  port: 80
                initialDelaySeconds: 5
                periodSeconds: 5
  2. Pod-level readinessGates (Legacy):

    Configure readinessGates in the pod spec for older versions:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-demo-legacy
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx-demo-legacy
      template:
        metadata:
          labels:
            app: nginx-demo-legacy
        spec:
          readinessGates:
            - conditionType: "endpointHealthCheckSuccess"
          containers:
            - name: nginx
              image: nginx:alpine
              ports:
                - containerPort: 80
              livenessProbe:
                tcpSocket:
                  port: 80
                initialDelaySeconds: 15
                periodSeconds: 10
              readinessProbe:
                tcpSocket:
                  port: 80
                initialDelaySeconds: 5
                periodSeconds: 5

    Note: The readinessGates configuration is from an older version. It's recommended to use the pod annotation endpoint-health-checker.io/enabled: "true" for new deployments.

Uninstallation

To uninstall the Endpoint Health Checker:

  1. Navigate to Administrator > Marketplace > Cluster Plugins.

  2. Find the installed "Endpoint Health Checker" plugin.

  3. Click the options menu and select Uninstall.

  4. Confirm the uninstallation when prompted.