Configuring OSD WAL and DB Partitions
Introduction
This topic describes how to configure a fast metadata partition for host-based Rook-Ceph Object Storage Daemons (OSDs). In this configuration, the OSD stores user data on a data device and places BlueStore metadata, such as the RocksDB database and write-ahead log (WAL), on a faster device or partition.
Use this procedure when HDD-backed OSDs need lower latency for metadata operations and the storage node has an SSD or NVMe device reserved for OSD metadata.
Scenarios
- Use an SSD or NVMe device as the shared metadata device for all selected OSD data devices on a node.
- Use one metadata partition for one specific OSD data device.
- Plan metadata partition capacity for newly created or re-created host-based OSDs.
Prerequisites
Before you begin, ensure the following conditions are met:
- You have cluster-admin access to the cluster.
- You have shell access to each storage node where you need to inspect or prepare local devices.
- The OSD data device and metadata device or partition do not contain mounted file systems or application data.
- The rook-ceph-tools deployment is available in the rook-ceph namespace, or you are allowed to start it temporarily.
Constraints and Limitations
Changing metadataDevice, databaseSizeMB, or walSizeMB does not move the WAL or DB of an existing OSD in place. To change the WAL/DB layout of an existing OSD, remove and re-create that OSD after confirming the cluster has enough capacity and healthy placement groups.
- Use stable device paths such as /dev/disk/by-id/... when possible. Linux names such as /dev/sdb can change after a reboot or hardware replacement.
- When metadataDevice is configured at spec.storage.config or spec.storage.nodes[].config, Alauda Build of Rook-Ceph shares that metadata device across OSDs on the same node and initializes OSDs with lvm batch. In this mode, do not use a partition path as metadataDevice.
- When metadataDevice is configured under a specific devices[].config, Alauda Build of Rook-Ceph initializes the OSD with lvm prepare. In this mode, you can use a partition path as the metadata device for that specific OSD.
Plan WAL and DB Capacity
BlueStore can use a separate block.db device for RocksDB metadata. If a DB device is configured and no explicit WAL device is configured, BlueStore colocates the WAL on the DB device. For this reason, when you assign one dedicated metadata partition to one HDD OSD, size the partition as the DB and WAL capacity for that OSD. You usually do not need to set databaseSizeMB or walSizeMB in the CephCluster CR for this pattern.
Use the following guidelines to size each metadata partition:
Examples:
Treat the RGW values in the table as minimum planning baselines, not as upper limits. If the fast device has enough capacity, allocate more than 4% to reduce the chance that BlueStore metadata spills back to the HDD.
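As a rough worked example of the 4% baseline mentioned above, the following shell sketch computes a per-OSD metadata partition size from the data device capacity. The helper name db_size_gib, the 8000 GiB HDD, and the choice to round up are illustrative assumptions, not values taken from the product.

```shell
# Hypothetical helper: metadata (DB + WAL) size in GiB for one OSD,
# given the data device size in GiB and a percentage. Rounds up.
db_size_gib() {
  local data_gib=$1 percent=$2
  echo $(( (data_gib * percent + 99) / 100 ))
}

db_size_gib 8000 4   # 4% of an 8000 GiB HDD, prints 320
```

Pass a larger percentage when the fast device has spare capacity.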
When a whole fast device is shared by multiple HDD OSDs at the node level, calculate the total required fast-device capacity as the per-OSD metadata capacity multiplied by the number of HDD OSDs that share the device.
If the shared metadata device is smaller than the calculated capacity, reduce the number of HDD OSDs that share the fast device or use a smaller per-OSD target based on the workload. Avoid relying on a very small DB device, because BlueStore metadata can spill back to the HDD when the DB device is full.
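The check described above can be sketched with shell arithmetic. All three input values are hypothetical; replace them with the per-OSD target, OSD count, and fast-device size of the node being planned.

```shell
# Hypothetical inputs, all in GiB except the OSD count.
per_osd_gib=320        # metadata target for each HDD OSD
osd_count=6            # HDD OSDs that would share the fast device
fast_device_gib=1788   # usable size of the shared fast device

required_gib=$(( per_osd_gib * osd_count ))
max_osds=$(( fast_device_gib / per_osd_gib ))

echo "required: ${required_gib} GiB, available: ${fast_device_gib} GiB"
if [ "$required_gib" -gt "$fast_device_gib" ]; then
  echo "too small: at most ${max_osds} OSDs fit at ${per_osd_gib} GiB each"
fi
```

With these sample numbers the device is too small for six OSDs, so either five OSDs share it or the per-OSD target is lowered.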
Procedure
Check host device paths
Log in to each storage node and list the available block devices.
List stable device links.
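A minimal sketch of the two listing steps, wrapped in small helper functions so they can be pasted into a shell session on the node. The function names and the column selection are assumptions; lsblk and the /dev/disk/by-id/ directory are standard on most Linux distributions.

```shell
# List block devices with size, rotational flag (1 = spinning disk),
# and any existing file system or mount point.
list_disks() {
  lsblk -o NAME,SIZE,ROTA,TYPE,FSTYPE,MOUNTPOINT,MODEL
}

# List the stable by-id links and the kernel names they point to.
list_stable_links() {
  ls -l /dev/disk/by-id/
}

# Usage on a storage node:
#   list_disks
#   list_stable_links
```

Devices that show a value in the FSTYPE or MOUNTPOINT column do not meet the prerequisite that OSD devices are free of mounted file systems.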
Record the data device and metadata device paths for each OSD. For example:
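One way to capture this mapping is to print each stable link next to the kernel device it currently resolves to. The helper name resolve_links and the paths in the usage comment are hypothetical placeholders.

```shell
# Print "stable-link -> kernel-device" for every path given as an argument.
resolve_links() {
  for link in "$@"; do
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
  done
}

# Usage on a storage node (placeholder paths):
#   resolve_links /dev/disk/by-id/ata-EXAMPLE_HDD_A \
#                 /dev/disk/by-id/nvme-EXAMPLE_NVME-part1
```

Keep the stable link for the CephCluster configuration and use the resolved kernel name only to cross-check against the block device listing.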
Configure one metadata partition for one OSD
Use device-level metadataDevice when the metadata target is a partition. This pattern maps each OSD data device to its own WAL/DB partition. The partition capacity is the metadata capacity for that OSD, so databaseSizeMB and walSizeMB are not required in this configuration.
Update spec.storage with a configuration similar to the following example.
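The fragment below is a sketch of such a configuration, written to a scratch file so it can be reviewed before it is merged into the CephCluster resource by hand. The file name, the node name, and every /dev/disk/by-id/ path are placeholders; the field layout follows the devices[].config placement described above.

```shell
# Write an illustrative spec.storage fragment to a scratch file for review.
# Node name and device paths are placeholders, not real devices.
cat > osd-per-device-metadata.yaml <<'EOF'
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "storage-node-1"
        devices:
          - name: "/dev/disk/by-id/ata-EXAMPLE_HDD_A"
            config:
              metadataDevice: "/dev/disk/by-id/nvme-EXAMPLE_NVME-part1"
          - name: "/dev/disk/by-id/ata-EXAMPLE_HDD_B"
            config:
              metadataDevice: "/dev/disk/by-id/nvme-EXAMPLE_NVME-part2"
EOF
```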
In this example, each HDD data device uses a different NVMe partition for its BlueStore DB. The WAL is colocated with the DB on the same metadata partition.
Configure a shared metadata device for OSDs on a node
Use node-level metadataDevice only when the metadata target is a whole device or logical volume. Rook shares the metadata target across the selected OSD data devices on that node. Set databaseSizeMB only when you need to cap the DB size that Rook allocates per OSD from the shared metadata device. In most deployments, do not set walSizeMB; the WAL can be colocated with the DB.
Use this pattern when the whole fast device is reserved for OSD metadata on the node.
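A sketch of the node-level pattern follows, again written to a scratch file for review before it is merged into the CephCluster resource. The file name, node name, device paths, and the databaseSizeMB value are placeholders; omit databaseSizeMB if no per-OSD cap is needed.

```shell
# Write an illustrative node-level spec.storage fragment for review.
# All names and paths are placeholders, not real devices.
cat > osd-shared-metadata.yaml <<'EOF'
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "storage-node-1"
        config:
          metadataDevice: "/dev/disk/by-id/nvme-EXAMPLE_NVME"  # whole device, not a partition
          databaseSizeMB: "327680"                              # optional per-OSD cap (320 * 1024)
        devices:
          - name: "/dev/disk/by-id/ata-EXAMPLE_HDD_A"
          - name: "/dev/disk/by-id/ata-EXAMPLE_HDD_B"
EOF
```

The config values are quoted because the config map takes string values.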
Wait for OSD prepare jobs to finish
Watch the OSD prepare jobs and OSD pods.
If an OSD prepare job fails, inspect the job log.
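The two steps above can be sketched as small kubectl helpers. The function names are hypothetical, and the app=rook-ceph-osd-prepare and app=rook-ceph-osd labels are the ones used by upstream Rook; they are assumed to apply to this distribution as well.

```shell
NS=rook-ceph

# List OSD prepare pods and OSD pods with the node each one runs on.
# Extra arguments such as --watch are passed through to kubectl.
osd_pods() {
  kubectl -n "$NS" get pods -l 'app in (rook-ceph-osd-prepare,rook-ceph-osd)' -o wide "$@"
}

# Print the log of one prepare job. Take the job name from
# `kubectl -n rook-ceph get jobs`.
prepare_job_log() {
  kubectl -n "$NS" logs "job/$1"
}

# Usage:
#   osd_pods --watch
#   prepare_job_log <job-name>
```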
Verification
Start the tools pod if it is not running.
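A minimal sketch, assuming the rook-ceph-tools deployment already exists and has only been scaled down to zero replicas. If the deployment does not exist in your cluster, create it by the method your platform provides instead. The function name start_tools is hypothetical.

```shell
NS=rook-ceph

# Scale the tools deployment to one replica and wait until it is ready.
start_tools() {
  kubectl -n "$NS" scale deployment rook-ceph-tools --replicas=1 &&
    kubectl -n "$NS" rollout status deployment/rook-ceph-tools
}

# Usage:
#   start_tools
```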
Check the cluster health.
Confirm that the OSDs are up and assigned to the expected hosts.
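Both checks run the Ceph CLI inside the tools pod. The wrapper below is a sketch with a hypothetical name, ceph_tools; ceph status and ceph osd tree are standard Ceph commands.

```shell
NS=rook-ceph

# Run a command inside the rook-ceph-tools pod.
ceph_tools() {
  kubectl -n "$NS" exec deploy/rook-ceph-tools -- "$@"
}

# Usage:
#   ceph_tools ceph status      # overall cluster health
#   ceph_tools ceph osd tree    # OSDs grouped by host, with up/down state
```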
To inspect the local BlueStore metadata layout, run ceph-volume from the OSD pod or on the storage node where the OSD was prepared.
In the output, confirm that the OSD has block.db and, when configured separately by Ceph, block.wal entries that point to the expected metadata device or partition.
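As a sketch of the OSD pod variant, the helper below runs ceph-volume lvm list in the deployment of one OSD. The function name is hypothetical, and the rook-ceph-osd-<id> deployment naming follows upstream Rook, which is assumed to apply here.

```shell
NS=rook-ceph

# Show the ceph-volume LVM inventory from inside one OSD pod.
# $1 is the numeric OSD ID, for example 0 for deployment rook-ceph-osd-0.
osd_lvm_list() {
  kubectl -n "$NS" exec "deploy/rook-ceph-osd-$1" -- ceph-volume lvm list
}

# Usage:
#   osd_lvm_list 0
```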