This guide describes how to prevent Alauda Container Platform nodes from running out of memory (OOM) or disk space. Stable node operation is critical, especially for non-compressible resources like memory and disk. Resource exhaustion can lead to node instability.
Administrators can configure eviction policies to monitor nodes and reclaim resources before stability is compromised.
This document covers how Alauda Container Platform handles out-of-resource scenarios, including resource reclamation, pod eviction, pod scheduling, and the Out of Memory Killer. Example configurations and best practices are also provided.
If swap memory is enabled on a node, memory pressure cannot be detected. Disable swap to enable memory-based evictions.
Eviction policies allow a node to terminate pods when resources are low, reclaiming needed resources. A policy combines an eviction signal with a threshold value, set in the node configuration or via the command line. Evictions can be hard (the node acts immediately when the threshold is met) or soft (the node acts only after the threshold has been exceeded for a grace period).
Properly configured eviction policies help nodes proactively prevent resource exhaustion.
When a pod is evicted, all containers in the pod are terminated, and the PodPhase transitions to Failed.
For disk pressure, nodes monitor both the nodefs and imagefs filesystems:

- `nodefs`: the node's root filesystem (for example, `/var/lib/kubelet`).
- `imagefs`: the filesystem the container runtime uses for storing images (for example, `/var/lib/docker/overlay2` for the Docker overlay2 driver, or `/var/lib/containers/storage` for CRI-O).

Without local storage isolation (ephemeral storage) or XFS quota (volumeConfig), pod disk usage cannot be limited.
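As an illustration of local storage isolation, a pod can declare ephemeral-storage requests and limits so the node can account for and cap its disk usage. This is a minimal sketch; the pod name, image, and values are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo                        # hypothetical pod name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    resources:
      requests:
        ephemeral-storage: "1Gi"   # disk usage the scheduler accounts for
      limits:
        ephemeral-storage: "2Gi"   # pod is evicted if it exceeds this amount
```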
To set eviction thresholds, edit the node configuration map under `eviction-hard` or `eviction-soft`.
Hard Eviction Example:

Use `eviction-hard` for hard eviction thresholds. Each threshold takes the form `<eviction_signal><operator><quantity>`, such as `memory.available<500Mi` or `nodefs.available<10%`. Use percentage values for `inodesFree`; other parameters accept percentages or numeric values.
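A minimal sketch of hard eviction thresholds, assuming the node configuration map places kubelet settings under a `kubeletArguments` key; the exact layout may differ in your cluster:

```yaml
kubeletArguments:
  eviction-hard:
  - memory.available<500Mi
  - nodefs.available<10%
  - nodefs.inodesFree<5%
  - imagefs.available<15%
  - imagefs.inodesFree<10%
```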
Soft Eviction Example:

Use `eviction-soft` for soft eviction thresholds. Each threshold takes the form `<eviction_signal><operator><quantity>`, such as `memory.available<500Mi` or `nodefs.available<10%`.
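A matching sketch for soft eviction thresholds, again assuming a `kubeletArguments`-style node configuration; each soft signal is paired with a grace period:

```yaml
kubeletArguments:
  eviction-soft:
  - memory.available<500Mi
  - nodefs.available<10%
  eviction-soft-grace-period:
  - memory.available=1m30s
  - nodefs.available=1m30s
```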
Restart the kubelet service for changes to take effect:
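A sketch, assuming the kubelet runs as a systemd service on the node (the unit name may differ):

```shell
systemctl restart kubelet
```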
Nodes can trigger evictions based on the following signals:
| Node Condition | Eviction Signal | Description |
|---|---|---|
| MemoryPressure | memory.available | Available memory below threshold |
| DiskPressure | nodefs.available | Node root filesystem space below threshold |
| DiskPressure | nodefs.inodesFree | Free inodes on node root filesystem below threshold |
| DiskPressure | imagefs.available | Image filesystem space below threshold |
| DiskPressure | imagefs.inodesFree | Free inodes in imagefs below threshold |
`inodesFree` must be specified as a percentage. Do not rely on `free -m` to check available memory; it does not work in containers.

Nodes monitor these filesystems every 10 seconds. Dedicated filesystems for volumes and logs are not monitored.
Before evicting pods due to disk pressure, nodes perform container and image garbage collection.
Eviction thresholds trigger resource reclamation. When a threshold is met, the node reports a pressure condition, preventing new pods from being scheduled until resources are reclaimed.
Thresholds are configured as `<eviction_signal><operator><quantity>`, where the quantity is either an absolute value or a percentage.

Example:

- `memory.available<1Gi`
- `memory.available<10%`

Nodes evaluate thresholds every 10 seconds.
Hard eviction thresholds have no grace period; when one is met, the node takes immediate action.
Example:
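A sketch using kubelet command-line flags; equivalent values can be placed in the node configuration:

```shell
--eviction-hard=memory.available<1Gi,nodefs.available<10%
```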
Soft thresholds require a corresponding grace period; the node acts only after the threshold has been exceeded for that duration. Optionally, set a maximum pod termination grace period to use during eviction (`eviction-max-pod-grace-period`).
Example:
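A sketch using kubelet command-line flags; the grace period value and signals are illustrative:

```shell
--eviction-soft=memory.available<1Gi \
--eviction-soft-grace-period=memory.available=1m30s \
--eviction-max-pod-grace-period=30
```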
Control how much node resource is available for scheduling by setting `system-reserved` to reserve resources for system daemons. Evictions occur only if pods exceed their requested resources.
Example:
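A sketch using the kubelet command line; the reserved amounts are illustrative:

```shell
--system-reserved=cpu=500m,memory=1.5Gi
```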
Determine appropriate values using the node summary API.
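For example, the kubelet summary API can be queried through the API server proxy (replace `<node_name>` with the node's name):

```shell
kubectl get --raw /api/v1/nodes/<node_name>/proxy/stats/summary
```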
Restart the kubelet for the changes to take effect.
To avoid oscillating above and below soft eviction thresholds, set `eviction-pressure-transition-period`, which controls how long the node waits before transitioning out of a pressure condition.
Example:
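A sketch using the kubelet command line:

```shell
--eviction-pressure-transition-period=5m
```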
The default is 5 minutes. Restart the kubelet for the changes to take effect.
When eviction criteria are met, nodes reclaim resources before evicting user pods.

With a dedicated `imagefs`:

- If the `nodefs` threshold is met: delete dead pods and containers.
- If the `imagefs` threshold is met: delete unused images.

Without a dedicated `imagefs`, if the `nodefs` threshold is met: delete dead pods and containers, then unused images.

If a threshold and grace period are met, pods are evicted until the signal is below the threshold.
Pods are ranked for eviction by quality of service (QoS) and resource consumption.
| QoS Level | Description |
|---|---|
| Guaranteed | Highest resource consumers are evicted first. |
| Burstable | Highest resource consumers relative to their requests are evicted first. |
| BestEffort | Highest resource consumers are evicted first. |
Guaranteed pods are evicted only if system daemons exceed their reserved resources or if only Guaranteed pods remain on the node.
Disk is a best-effort resource; pods are evicted one at a time to reclaim disk space, ranked by QoS and disk usage.
If a system OOM event occurs before memory can be reclaimed, the OOM killer responds.
OOM scores are set based on QoS:
| QoS Level | oom_score_adj Value |
|---|---|
| Guaranteed | -998 |
| Burstable | min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999) |
| BestEffort | 1000 |
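For example, a Burstable container that requests 2Gi of memory on a node with 10Gi of memory capacity gets oom_score_adj = min(max(2, 1000 - (1000 * 2Gi) / 10Gi), 999) = min(max(2, 800), 999) = 800.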
The OOM killer terminates the container with the highest score; containers with the lowest QoS and highest memory usage are terminated first. Terminated containers may be restarted according to the node's restart policy.
The scheduler considers node conditions when placing pods.
| Node Condition | Scheduler Behavior |
|---|---|
| MemoryPressure | BestEffort pods are not scheduled to the node. |
| DiskPressure | No additional pods are scheduled to the node. |
Operator wants:

- Node memory capacity of 10Gi
- Reserve 1Gi of memory for system daemons
- Evict pods when available memory falls below 10% for 30 seconds, or immediately when it falls below 5%

Calculation:

Reserving only for system daemons gives:

- capacity = 10Gi
- system-reserved = 1Gi
- allocatable = 9Gi

To trigger eviction when available memory falls below 10% for 30 seconds, or immediately when it falls below 5%, the scheduler must not treat the eviction threshold (10% of 10Gi = 1Gi) as allocatable, so that amount is added to system-reserved:

- system-reserved = 2Gi
- allocatable = 8Gi

Configuration:
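A sketch of the settings implied by this scenario, again assuming a `kubeletArguments`-style node configuration; adapt the keys to your cluster's actual format:

```yaml
kubeletArguments:
  system-reserved:
  - memory=2Gi
  eviction-soft:
  - memory.available<10%
  eviction-soft-grace-period:
  - memory.available=30s
  eviction-hard:
  - memory.available<5%
```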
This prevents the scheduler from placing pods that immediately cause memory pressure and trigger eviction.
Pods created by daemon sets are immediately recreated if evicted, so evicting them provides little benefit. Daemon sets should therefore avoid creating BestEffort pods and should use Guaranteed QoS to reduce eviction risk.
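For example, a DaemonSet container whose resource requests equal its limits is assigned Guaranteed QoS. This is a minimal sketch; the names, image, and values are hypothetical:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                                   # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: registry.example.com/node-agent:latest   # placeholder image
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 100m              # requests == limits -> Guaranteed QoS
            memory: 128Mi
```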