Configuring VerticalPodAutoscaler (VPA)

For both stateless and stateful applications, VerticalPodAutoscaler (VPA) automatically recommends and optionally applies more appropriate CPU and memory resource limits based on your business needs, ensuring that pods have sufficient resources while improving cluster resource utilization.

TOC

Understanding VerticalPodAutoscalers

You can create a VerticalPodAutoscaler to recommend or automatically update the CPU and memory resource requests and limits for your pods based on their historical usage patterns.

After you create a VerticalPodAutoscaler, the platform begins to monitor the CPU and memory resource usage of the pods. When sufficient data is available, the VerticalPodAutoscaler calculates recommended resource values based on the observed usage patterns. Depending on the configured update mode, VPA can either automatically apply these recommendations or simply make them available for manual application.

The VPA works by analyzing the resource usage of your pods over time and making recommendations based on this analysis. It can help ensure that your pods have the resources they need without over-provisioning, which can lead to more efficient resource utilization across your cluster.

How Does the VPA Work?

The VerticalPodAutoscaler (VPA) extends the concept of pod resource optimization. The VPA monitors the resource usage of your pods and provides recommendations for CPU and memory requests based on the observed usage patterns.

The VPA works by continuously monitoring the resource usage of your pods and updating its recommendations as new data becomes available. The VPA can operate in the following modes:

  • Off: VPA only provides recommendations without automatically applying them.
  • Manual Adjustment: You can manually adjust resource configurations based on VPA recommendations.

Important: Elastic scaling can achieve horizontal or vertical scaling of Pods. When sufficient resources are available, elastic scaling can bring good results, but when cluster resources are insufficient, it may cause Pods to be in a Pending state. Therefore, please ensure that the cluster has sufficient resources or reasonable quotas, or you can configure alerts to monitor scaling conditions.

Supported Features

The VerticalPodAutoscaler provides resource recommendations based on historical usage patterns, allowing you to optimize your pod's CPU and memory configurations.

Important: When manually applying VPA recommendations, pod recreation will occur, which can cause temporary disruption to your application. Consider applying recommendations during maintenance windows for production workloads.

Prerequisites

  • Please ensure that the monitoring components are deployed in the current cluster and are functioning properly. You can check the deployment and health status of the monitoring components by clicking on the top right corner of the platform expand > Platform Health Status..
  • The Alauda Container Platform Vertical Pod Autoscaler cluster plugin must be installed in your cluster.

Installing the Vertical Pod Autoscaler Plugin

Before using VPA, you need to install the Vertical Pod Autoscaler cluster plugin:

  1. Log in and navigate to the Administrators page.

  2. Click Marketplace > Cluster Plugins to access the Cluster Plugins list page.

  3. Locate the Alauda Container Platform Vertical Pod Autoscaler cluster plugin, click Install, then proceed to the installation page.

Creating a VerticalPodAutoscaler

Using the CLI

You can create a VerticalPodAutoscaler using the command line interface by defining a YAML file and using the kubectl create command. The following example shows vertical pod autoscaling for a Deployment object:

  1. Create a YAML file named vpa.yaml with the following content:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-deployment-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      mode: "Auto"
  1. Use the autoscaling.k8s.io/v1 API.
  2. The name of the VPA
  3. Specify the target workload object. VPA uses the workload's selector to find pods that need resource adjustment. Supported workload types include DaemonSet, Deployment, ReplicaSet, StatefulSet, ReplicationController, Job, and CronJob.
  4. Specify the API version of the object to scale.
  5. Specify the type of object.
  6. The target resource to which the VPA applies
  7. Update policy that defines how VPA applies recommendations. The updateMode can be:
    • Auto: Automatically sets resource requests when creating pods and updates current pods to recommended resource requests. Currently equivalent to "Recreate". This mode may cause application downtime. Once in-place pod resource updates are supported, "Auto" mode will adopt this update mechanism.
    • Recreate: Automatically sets resource requests when creating pods and evicts current pods to update to recommended resource requests. Will not use in-place updates.
    • Initial: Only sets resource requests when creating pods, no modifications afterward.
    • Off: Does not automatically modify pod resource requests, only provides recommendations in the VPA object.
  8. Resource policy that can set specific strategies for different containers. For example, setting a container's mode to "Auto" means it will calculate recommendations for that container, while "Off" means it won't calculate recommendations.
  9. Apply policy to all containers in the pod.
  10. Set the mode to Auto or Off. Auto means recommendations will be generated for this container, Off means no recommendations will be generated.
  1. Apply the YAML file to create the VPA:
$ kubectl create -f vpa.yaml

Example output:

verticalpodautoscaler.autoscaling.k8s.io/my-deployment-vpa created
  1. After you create the VPA, you can view the recommendations by running the following command:
$ kubectl describe vpa my-deployment-vpa

Example output (partial):

Status:
  Recommendation:
    Container Recommendations:
      Container Name:  my-container
      Lower Bound:
        Cpu:     100m
        Memory:  262144k
      Target:
        Cpu:     200m
        Memory:  524288k
      Upper Bound:
        Cpu:     300m
        Memory:  786432k

Using the Web Console

  1. Enter Container Platform.

  2. In the left navigation bar, click Workloads > Deployments.

  3. Click on Deployment Name.

  4. Scroll down to the Elastic Scaling area and click Update on the right.

  5. Select Vertical Scaling and configure the scaling rules.

    ParameterDescription
    Scaling ModeCurrently supports Manual Scaling mode, which provides recommended resource configurations by analyzing past resource usage. You can manually adjust according to the recommended values. Adjustments will cause pods to be recreated and restarted, so please choose an appropriate time to avoid impacting running applications.
    Typically, after pods have been running for more than 8 days, the recommended values will become accurate.
    Note that when cluster resources are insufficient, scaling may cause Pods to be in a Pending state. Please ensure that the cluster has sufficient resources or reasonable quotas, or configure alerts to monitor scaling conditions.
    Target ContainerDefaults to the first container of the workload. You can choose to enable resource limit recommendations for one or more containers as needed.
  6. Click Update.

Advanced VPA Configuration

Update Policy Options

  • updateMode: "Off" - VPA only provides recommendations without automatically applying them. You can manually apply these recommendations as needed.
  • updateMode: "Auto" - Automatically sets resource requests when creating pods and updates current pods to recommended values. Currently equivalent to "Recreate".
  • updateMode: "Recreate" - Automatically sets resource requests when creating pods and evicts current pods to update to recommended values.
  • updateMode: "Initial" - Only sets resource requests when creating pods, no modifications afterward.
  • minReplicas: <number> - Minimum number of replicas. Ensures this minimum number of pods remain available when the Updater evicts pods. Must be greater than 0.

Container Policy Options

  • containerName: "*" - Apply policy to all containers in the pod.
  • mode: "Auto" - Automatically generate recommendations for the container.
  • mode: "Off" - Do not generate recommendations for the container.

Notes:

  • VPA recommendations are based on historical usage data, so it may take several days of pod operation before recommendations become accurate.
  • Pod recreation will occur when VPA recommendations are applied in Auto mode, which can cause temporary disruption to your application.

Follow-Up Actions

After configuring VPA, the recommended values for CPU and memory resource limits of the target container can be viewed in the Elastic Scaling area. In the Containers area, select the target container tab and click the icon on the right side of Resource Limits to update the resource limits according to the recommended values.