Configure a Dedicated Cluster for Distributed Storage

Dedicated cluster deployment refers to using an independent cluster to deploy the platform's distributed storage; other business clusters in the platform integrate with this cluster to access the storage services it provides.
To ensure the performance and stability of the platform's distributed storage, only the platform's core components and the distributed storage components are deployed in the dedicated storage cluster; other business workloads are not co-located with them. This separated deployment is the recommended best practice for the platform's distributed storage.

Architecture

Storage-Compute Separation Architecture

Infrastructure requirements

Platform requirements

Supported in version 3.18 and later.

Cluster requirements

It is recommended to use bare-metal clusters as dedicated storage clusters.

Resource requirements

Refer to Core Concepts for an overview of the components deployed as part of distributed storage.

Each component has distinct CPU and memory requirements. The recommended configurations are as follows:

Process   CPU   Memory
MON       2c    3Gi
MGR       3c    4Gi
MDS       3c    8Gi
RGW       2c    4Gi
OSD       4c    8Gi

A cluster typically runs:

  • 3 MON
  • 2 MGR
  • multiple OSD
  • 2 MDS (if using CephFS)
  • 2 RGW (if using CephObjectStorage)

Based on the component distribution, the following per-node resource recommendations apply:

CPU                          Memory
16c + (4c * OSDs per node)   20Gi + (8Gi * OSDs per node)
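For example, a storage node running 4 OSDs would need roughly 16c + (4c * 4) = 32c of CPU and 20Gi + (8Gi * 4) = 52Gi of memory for the storage components alone.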

Storage device requirements

It is recommended to deploy 12 or fewer storage devices per node. This helps restrict the recovery time following a node failure.

Storage device type requirements

It is recommended to use enterprise SSDs with a capacity of 10TiB or smaller per device, and ensure all disks are identical in size and type.

Capacity planning

Before deployment, storage capacity must be planned according to specific business requirements. By default, the distributed storage system employs a 3-replica redundancy strategy. Therefore, the usable capacity is calculated by dividing the total raw storage capacity (from all storage devices) by 3.

Example for a cluster of N = 30 nodes with a replica count of 3; the usable capacity works out as follows:

Storage device size (D)   Storage devices per node (M)   Total capacity (D*M*N)   Usable capacity (D*M*N/3)
0.5 TiB                   3                              45 TiB                   15 TiB
2 TiB                     6                              360 TiB                  120 TiB
4 TiB                     9                              1080 TiB                 360 TiB
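Taking the first row as a worked example: 0.5 TiB * 3 devices per node * 30 nodes = 45 TiB of raw capacity, and 45 TiB / 3 replicas = 15 TiB of usable capacity.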

Capacity monitoring and expansion

  1. Proactive Capacity Planning

    Always ensure usable storage capacity exceeds consumption. If storage is fully exhausted, recovery requires manual intervention and cannot be resolved by simply deleting or migrating data.

  2. Capacity Alerts

    The cluster triggers alerts at two thresholds:

    • 80% utilization ("near full"): Proactively free up space or scale out the cluster.
    • 95% utilization ("full"): Storage is fully exhausted, and standard commands cannot free space. Contact platform support immediately.

    Always address alerts promptly and monitor storage usage regularly to avoid outages; see the example after this list for checking utilization from the command line.

  3. Scaling Recommendations

    • Avoid: Adding storage devices to existing nodes.
    • Recommended: Scale out by adding new storage nodes.
    • Requirement: New nodes must use storage devices identical in size, type, and quantity to existing nodes.
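You can check current utilization against these thresholds directly from Ceph. The commands below are a minimal sketch and assume the Rook toolbox deployment (rook-ceph-tools) is installed in the rook-ceph namespace; if the toolbox is not available, use the platform's storage monitoring views instead.

# Overall raw and per-pool usage
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph df
# Per-OSD utilization, useful for spotting unbalanced nodes
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df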

Network requirements

Distributed storage must utilize HostNetwork.

Network Isolation

The network is categorized into two types:

  • Public Network: Used for client-to-storage component interactions (e.g., I/O requests).
  • Cluster Network: Dedicated to data replication between replicas and data rebalancing (e.g., recovery).

To ensure service quality and performance stability:

  1. For Dedicated Storage Clusters:
    Reserve two network interfaces on each host:
    • Public Network: For client and component communication.
    • Cluster Network: For internal replication and rebalancing traffic.
  2. For Business Clusters:
    Reserve one network interface on each host to access the storage Public Network.

Example Network Isolation Configuration
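As an illustration, the interface layout could look like the following; the interface names and CIDRs are examples only and must be adapted to your environment. The two CIDRs correspond to the public and cluster entries under network.addressRanges in the CephCluster resource created later in this procedure.

Storage cluster node:
  • eth0 (10.0.1.0/24): Public Network, carries client I/O and storage component communication
  • eth1 (10.0.2.0/24): Cluster Network, carries replica synchronization and recovery traffic
Business cluster node:
  • eth0 (10.0.1.0/24): access to the storage Public Network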

Network interface speed requirements

  1. Storage Nodes

    • Public Network and Cluster Network require 10GbE or higher network interfaces.
  2. Business Cluster Nodes

    • The network interface used to access the storage Public Network must be 10GbE or higher.

Procedure

Deploy Operator

  1. Access Platform Management.

  2. In the left sidebar, click Storage Management > Distributed Storage.

  3. Click Create Now.

  4. On the Deploy Operator wizard page, click the Deploy Operator button at the bottom right.

    • When the page automatically advances to the next step, the Operator has been deployed successfully.
    • If the deployment fails, follow the on-screen prompt to Clean Up Deployed Information and Retry, then redeploy the Operator. If you want to return to the distributed storage selection page instead, go to Application Store, first uninstall the resources created by the deployed rook-operator, and then uninstall rook-operator itself.
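Optionally, verify from the command line that the Operator is running before proceeding. This is a minimal sketch that assumes the Operator uses the standard Rook labels and runs in the rook-ceph namespace (the same namespace used by the resources below); the exact labels and namespace may differ on your platform.

kubectl -n rook-ceph get pods -l app=rook-ceph-operator

The Operator pod should be in the Running state.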

Create Ceph cluster

Execute commands on the control node of the storage cluster.

cat << EOF | kubectl create -f -
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ceph-cluster
  namespace: rook-ceph
spec:
  cephConfig:
    global:
      mon_memory_target: "3221225472"       # 3Gi, matching the MON memory recommendation
      mds_cache_memory_limit: "8589934592"  # 8Gi, matching the MDS memory recommendation
      osd_memory_target: "8589934592"       # 8Gi, matching the OSD memory recommendation
      bluefs_buffered_io: "false"
    mon:
      auth_allow_insecure_global_id_reclaim: "true"
      mon_warn_on_insecure_global_id_reclaim: "false"
      mon_warn_on_insecure_global_id_reclaim_allowed: "false"
  cephVersion:
    image: build-harbor.alauda.cn/3rdparty/ceph/ceph:v18.2.4-0
  dashboard:
    enabled: true
  dataDirHostPath: /var/lib/rook
  mgr:
    count: 2
    modules:
    - enabled: true
      name: pg_autoscaler
  mon:
    count: 3
  monitoring:
    enabled: true
  network:
    ipFamily: IPv4
    addressRanges:
      public:
      - <public network cidr>
      cluster:
      - <cluster network cidr>
    provider: host
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/cpaas-system"
        operator: "Exists"
        effect: "NoSchedule"
    mgr:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-mgr
          topologyKey: kubernetes.io/hostname
  priorityClassNames:
    all: system-node-critical
  resources:
    crashcollector:
      limits:
        cpu: 200m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 64Mi
    mgr:
      requests:
        cpu: "3"
        memory: 4Gi
    mon:
      requests:
        cpu: "2"
        memory: 3Gi
    osd:
      requests:
        cpu: "4"
        memory: 8Gi
  storage:
    <storage devices>
EOF

Parameters:

  • public network cidr: CIDR of the storage Public Network (e.g., - 10.0.1.0/24).
  • cluster network cidr: CIDR of the storage Cluster Network (e.g., - 10.0.2.0/24).
  • storage devices: Specify the storage devices to be utilized by the distributed storage.
    Example Formatting:
    nodes:
    - name: storage-node-01
      devices:
      - name: /dev/disk/by-id/wwn-0x5000cca01dd27d60
      useAllDevices: false
    - name: storage-node-02
      devices:
      - name: sdb
      - name: sdc
      useAllDevices: false
    - name: storage-node-03
      devices:
      - name: sdb
      - name: sdc
      useAllDevices: false
    Tip

    The first node in this example uses the disk's World Wide Name (WWN) path for stable naming, which avoids reliance on volatile device names like sdb that may change after a reboot.
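After the CephCluster resource is created, you can watch its status until the cluster is fully provisioned. The commands below are a minimal sketch based on the standard Rook status columns; provisioning usually takes several minutes.

# Wait for PHASE to become Ready and HEALTH to report HEALTH_OK
kubectl -n rook-ceph get cephcluster ceph-cluster -w
# Confirm that the mon, mgr, and osd pods are all Running
kubectl -n rook-ceph get pods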

Create storage pools

Three storage pool types are available. Select and create the appropriate ones based on your business requirements.

Create file pool

Execute commands on the control node of the storage cluster.

cat << EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: cephfs
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      requireSafeReplicaSize: true
      size: 3
  dataPools:
  - failureDomain: host
    replicated:
      requireSafeReplicaSize: true
      size: 3
  preserveFilesystemOnDelete: false
  metadataServer:
    activeCount: 1
    activeStandby: true
    placement:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-mds
          topologyKey: kubernetes.io/hostname
      tolerations:
      - effect: NoSchedule
        operator: Exists
    resources:
      requests:
        cpu: "3"
        memory: 8Gi
EOF
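Optionally verify that the filesystem was created and the MDS daemons are running; this is a minimal sketch assuming the resource name and labels used above.

kubectl -n rook-ceph get cephfilesystem cephfs
kubectl -n rook-ceph get pods -l app=rook-ceph-mds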

Create block pool

Execute commands on the control node of the storage cluster.

cat << EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: block
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
EOF
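Optionally verify that the block pool was created; with recent Rook versions, the PHASE column should eventually report Ready.

kubectl -n rook-ceph get cephblockpool block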

Create object pool

Execute commands on the control node of the storage cluster.

cat << EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: object
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      requireSafeReplicaSize: true
      size: 3
  dataPool:
    failureDomain: host
    replicated:
      requireSafeReplicaSize: true
      size: 3
  preservePoolsOnDelete: false
  gateway:
    instances: 2
    placement:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-rgw
          topologyKey: kubernetes.io/hostname
      tolerations:
      - effect: NoSchedule
        operator: Exists
    port: 7480
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
EOF
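Optionally verify the object store and its RGW gateways; this is a minimal sketch assuming the resource name and labels used above.

kubectl -n rook-ceph get cephobjectstore object
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw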

Follow-up Actions

When other clusters need to utilize the distributed storage service, refer to the following guidelines.
Accessing Storage Services