RabbitMQ Mnesia Database Exception Handling

Common Mnesia Database Failures

RabbitMQ uses the Mnesia database to store metadata such as queues, exchanges, and bindings. Common Mnesia failures fall into two categories:

  • Permission issues: the user running RabbitMQ lacks write permission on the MNESIA_BASE directory.
  • Table read failures: Mnesia cannot read its tables.

Permission Issues

When you encounter permission issues with the Mnesia database, grant the user that runs RabbitMQ write permission on the MNESIA_BASE directory to resolve the problem.
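For example, on a typical Linux installation where the data directory is /var/lib/rabbitmq/mnesia and the broker runs as the rabbitmq user (both are assumptions about the environment; check the RABBITMQ_MNESIA_BASE environment variable for the actual path), the permissions could be fixed as follows:

# Assumed defaults: data dir /var/lib/rabbitmq/mnesia, service user rabbitmq
sudo chown -R rabbitmq:rabbitmq /var/lib/rabbitmq/mnesia
sudo chmod -R u+rwX /var/lib/rabbitmq/mnesia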

Mnesia Table Read Failure

Mnesia creates the corresponding database schema based on the machine's hostname. Therefore, when the hostname changes, Mnesia cannot load the old schema. Similarly, if the rabbit@hostname directory is renamed, Mnesia will also be unable to locate the old database files, prompting it to create a new rabbit@hostname folder and start the database anew.
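As a quick check, you can compare the machine's hostname with the schema directories that already exist under the Mnesia base directory (the path below is the common default and may differ in your installation):

# The hostname determines the rabbit@hostname node name
hostname
# Existing schema directories; if none matches the current node name, RabbitMQ creates a new one
ls /var/lib/rabbitmq/mnesia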

To fix Mnesia read failures, you can follow these steps:

  1. First, check whether the node's hostname or the rabbit@hostname directory name has changed. If it has, rename the directory back to the “rabbit@hostname” convention so that it matches the current node name; after renaming, Mnesia can see the old database files again (see the example after this list).

  2. If nothing has changed and the node runs in cluster mode, startup may fail because the node cannot connect to its peers. Check the logs for inter-node communication errors.

  3. If the above steps still do not restore the data, you can back up the current cluster's configuration, redeploy new nodes to join the original cluster, and then import the backed-up configuration (a sketch follows this list).
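For step 1, a minimal sketch assuming the rabbit@hostname directory was renamed by mistake and the node name is rabbit@myhost (hypothetical name, default data directory):

# With the RabbitMQ node stopped, restore the directory name Mnesia expects
mv /var/lib/rabbitmq/mnesia/rabbit@myhost-old /var/lib/rabbitmq/mnesia/rabbit@myhost
# Start the node again; the old queues, exchanges, and bindings should be visible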
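For step 3, one way to back up and restore the configuration is to export the broker definitions (users, vhosts, queues, exchanges, bindings, policies) and re-import them after the new nodes have joined; note that definitions do not include message contents. The commands below are a sketch: export_definitions and import_definitions require a reasonably recent RabbitMQ version, and rabbit@node1 is a hypothetical peer name.

# On a node that still holds the configuration: export definitions to a JSON file
rabbitmqctl export_definitions /tmp/definitions.json
# On each freshly deployed node: reset it and join it to the original cluster
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app
# Finally, import the backed-up definitions on any cluster member
rabbitmqctl import_definitions /tmp/definitions.json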

If RabbitMQ fails to form a new cluster properly after a restart, you may encounter errors in the log like the following:

[warning] <0.273.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@e2e-rabbitmq-server-2.e2e-rabbitmq-nodes.local-midautons','rabbit@e2e-rabbitmq-server-1.e2e-rabbitmq-nodes.local-midautons','rabbit@e2e-rabbitmq-server-0.e2e-rabbitmq-nodes.local-midautons'],[rabbit_durable_queue]}
[info] <0.273.0> Waiting for Mnesia tables for 30000 ms, 7 retries left

This issue occurs because cluster membership information is stored in the Mnesia database and is not cleared when RabbitMQ restarts, so each node waits for the peers it last knew about. Because the Pods are started sequentially, the first Pod to come up may wait indefinitely for Pods that have not started yet, creating a deadlock. The recommended workaround is to force the stuck node to boot without waiting for its cluster peers. You can run the following command with kubectl:

kubectl -n {namespace name} exec {instance name} -- rabbitmqctl force_boot
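For illustration, using the names that appear in the log above (which suggest Pod e2e-rabbitmq-server-0 in namespace local-midautons; substitute your own namespace and Pod name):

kubectl -n local-midautons exec e2e-rabbitmq-server-0 -- rabbitmqctl force_boot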