Mount the following partitions on dedicated disks or on LVM-provisioned logical volumes so they can be expanded later.
Partition | Minimum size | Recommended size | Notes |
---|---|---|---|
/var/lib/etcd | 10GB | 20GB | A dedicated high-IO disk is recommended for hosting etcd data. |
/var/lib/containerd/ | 100GB | 150GB | |
/cpaas/ | For control plane nodes of the global cluster, at least 100GB; for other nodes, at least 40GB | 200GB | Plan for additional space if the node will host infra components, which require more space on /cpaas/. |
/ | 50GB | 100GB (higher is better) | Ensure there is enough free disk space to keep utilization below 80%. If usage rises above this threshold, pods on the node may be evicted. |
Arbitrary location for downloading and unpacking the installer packages, extensions, and so on | 20GB | 250GB | Actual storage needs will vary depending on which extensions you plan to install. Plan for additional space if you expect to add more components or enable extra features later. |
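As an illustration of the LVM approach mentioned above, the following sketch shows how a logical volume could be created for etcd data and grown later without reprovisioning the node. The volume group name (`vg_data`), logical volume name (`lv_etcd`), and sizes are placeholders; substitute values that match your environment.

```bash
# Create a 20GB logical volume for etcd data in an existing volume group
# (vg_data is a placeholder name).
lvcreate --name lv_etcd --size 20G vg_data

# Format the volume and mount it at the etcd data directory.
mkfs.xfs /dev/vg_data/lv_etcd
mkdir -p /var/lib/etcd
mount /dev/vg_data/lv_etcd /var/lib/etcd

# Later, if more space is needed, extend the volume and grow the
# filesystem in one step.
lvextend --resizefs --size +10G /dev/vg_data/lv_etcd
```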
Fast storage is essential for etcd to perform reliably. etcd depends on durable, low-latency disk operations to persist proposals to its write-ahead log (WAL).
If disk writes take too long, fsync delays can cause the member to miss heartbeats, fail to commit proposals promptly, and experience request timeouts or temporary leader changes. These issues can also slow the Kubernetes API and degrade overall cluster responsiveness.
For these reasons, HDDs are a poor choice and are not recommended. If you must use HDDs for etcd, choose the fastest available (for example, 15,000 RPM).
The following hard drive practices provide optimal etcd performance:
- Prefer SSDs or NVMe as etcd drives. When write endurance and stability are priorities, consider server-grade single-level cell (SLC) SSDs. Avoid NAS, SAN, and HDDs.
- Avoid distributed block storage systems such as Ceph RADOS Block Device (RBD), Network File System (NFS), and other network-attached backends, because they introduce unpredictable latency.
- Keep etcd data on a dedicated drive or a dedicated logical volume.
- Continuously benchmark with tools like fio and use the results to track performance as the cluster grows. Refer to the disk benchmarking guide for more information.
Specification | Minimum Requirement | Recommended | Notes |
---|---|---|---|
Sequential write IOPS | 50 | 500 (higher is better) | Most cloud providers publish concurrent IOPS rather than sequential IOPS. The concurrent IOPS values are typically about 10× higher than sequential ones. |
Disk bandwidth | 10 MB/s | 100 MB/s (higher is better) | Higher disk bandwidth allows faster data recovery when a failed member needs to catch up with the cluster. |
Throughput (sequential 8 kB write with fdatasync) | 50 writes per 10 ms | 500 writes per 2 ms | Reflects sustained write throughput when data is flushed to disk after each write operation. |
To measure actual sequential IOPS and throughput, we suggest using the disk benchmarking tool fio. You may refer to the following instructions:
Do not run these tests on any node of an existing cluster. Instead, run them against a dedicated VM that has the same setup as the control plane nodes.
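As a minimal sketch, a test similar to the following approximates etcd's write pattern: sequential writes with a sync after each write. The target directory, file size, and block size below are illustrative; adjust them to match the disk you intend to use for /var/lib/etcd.

```bash
# Sequential 8 kB writes, syncing after every write, against a scratch
# directory on the candidate etcd disk (the path is a placeholder).
fio --name=etcd-disk-check \
    --directory=/mnt/etcd-bench \
    --rw=write --bs=8k --size=2g \
    --ioengine=sync --fdatasync=1
```

Review the fsync/fdatasync latency percentiles and the write IOPS reported in the output to judge whether the disk meets the figures in the table above.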