Production Checklist
MicroRaft is a library, so production readiness depends on the system you build around it. Use this as a rollout checklist, not as marketing copy.
Persistence
- Decide how Raft log entries, snapshots, and metadata will be stored durably.
- Verify crash recovery behavior with realistic restart tests.
- Ensure your snapshot policy matches state size and recovery time targets.
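A rough way to check a snapshot policy against a recovery-time target is to estimate restart time as the time to load the latest snapshot plus the time to replay the log entries committed since that snapshot. The sketch below is a hypothetical helper, not part of MicroRaft. The class name, the parameters, and the linear cost model are all assumptions, and the inputs should come from measurements on your own hardware.

```java
// Hypothetical helper (not a MicroRaft API): estimates worst-case restart time
// under the assumption that recovery cost is snapshot load time plus log replay time.
final class RecoveryEstimate {

    private RecoveryEstimate() {
    }

    /**
     * @param snapshotBytes            size of the largest snapshot you expect to restore
     * @param loadBytesPerSecond       measured snapshot load throughput
     * @param entriesBetweenSnapshots  maximum number of log entries replayed after the snapshot
     * @param replayEntriesPerSecond   measured replay rate of your state machine
     * @return estimated worst-case recovery time in seconds
     */
    static double worstCaseSeconds(long snapshotBytes,
                                   double loadBytesPerSecond,
                                   long entriesBetweenSnapshots,
                                   double replayEntriesPerSecond) {
        if (loadBytesPerSecond <= 0 || replayEntriesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput values must be positive");
        }
        double snapshotLoadSeconds = snapshotBytes / loadBytesPerSecond;
        double replaySeconds = entriesBetweenSnapshots / replayEntriesPerSecond;
        return snapshotLoadSeconds + replaySeconds;
    }
}
```

If the estimate exceeds your recovery target, the usual levers are snapshotting more often (fewer entries to replay) or shrinking the state that goes into each snapshot. Treat the result as a starting point for the restart tests above, not as a substitute for them.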
Transport and threading
- Define backpressure and queueing boundaries between the Raft layer and your networking stack.
- Validate timeout behavior under partial node stalls, not only clean failures.
- Load-test your chosen threading model with realistic request bursts.
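A queueing boundary of the kind described above can be as simple as a bounded, non-blocking inbox between the networking threads and the thread that feeds the Raft node. The following is a minimal sketch using only the JDK. `BoundedInbox` and its methods are hypothetical names, not MicroRaft types, and rejecting on overflow is just one possible policy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not a MicroRaft API): a fixed-capacity inbox that
// rejects new messages when full instead of blocking the caller.
final class BoundedInbox<T> {

    private final BlockingQueue<T> queue;
    private final AtomicLong rejected = new AtomicLong();

    BoundedInbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by the networking side. Returns false when the inbox is full,
    // so the caller can shed load or signal the sender to slow down.
    boolean tryEnqueue(T message) {
        boolean accepted = queue.offer(message);
        if (!accepted) {
            rejected.incrementAndGet();
        }
        return accepted;
    }

    // Called by the consuming side. Returns null when the inbox is empty.
    T poll() {
        return queue.poll();
    }

    // Number of messages rejected so far; a useful metric to export.
    long rejectedCount() {
        return rejected.get();
    }

    int size() {
        return queue.size();
    }
}
```

Because `tryEnqueue` never blocks, a stalled consumer shows up as a rising rejection count rather than as unbounded memory growth or blocked I/O threads. Whether dropping, blocking, or replying with an explicit error is the right overflow policy depends on the message type and on how senders retry.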
Monitoring and diagnostics
- Export RaftNodeReport data or use microraft-metrics with Micrometer.
- Alert on leader changes, quorum loss, replication lag, and snapshot churn.
- Keep enough logs to reconstruct membership and leadership transitions.
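As an illustration of alerting on leader changes, the sketch below counts leadership transitions inside a sliding time window and signals when they reach a threshold. It is deliberately independent of MicroRaft's reporting types: `LeaderChangeTracker` is a hypothetical class, and how you obtain the current leader identifier and timestamp (for example from periodically published reports) is left as an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not a MicroRaft API): flags leader flapping by counting
// leader changes observed within a sliding time window.
final class LeaderChangeTracker {

    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> changeTimes = new ArrayDeque<>();
    private String lastLeaderId;

    LeaderChangeTracker(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Feed one observation. leaderId may be null when no leader is known;
    // such observations only expire old entries and are not counted as changes.
    // Returns true when the number of changes in the window reaches the threshold.
    boolean observe(String leaderId, long nowMillis) {
        if (leaderId != null) {
            if (lastLeaderId != null && !lastLeaderId.equals(leaderId)) {
                changeTimes.addLast(nowMillis);
            }
            lastLeaderId = leaderId;
        }
        // Drop changes that have fallen out of the window.
        while (!changeTimes.isEmpty() && nowMillis - changeTimes.peekFirst() > windowMillis) {
            changeTimes.removeFirst();
        }
        return changeTimes.size() >= threshold;
    }

    int changesInWindow() {
        return changeTimes.size();
    }
}
```

The class is not thread-safe, so it should be driven from a single reporting thread or guarded externally. A single leader change is normal after a restart or deployment; the window and threshold exist so that alerts fire on repeated elections rather than on every transition.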
Failure scenarios
- Test node restart, node replacement, and network partition scenarios.
- Exercise quorum-loss behavior and leader demotion handling.
- Rehearse membership changes instead of treating them as a theoretical feature.
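When planning partition and quorum-loss tests, it helps to write down how many nodes each cluster size can lose before progress stops. Raft needs a majority of voting members, so the arithmetic is fixed; the helper below is only a hypothetical convenience for building a test matrix and is not part of MicroRaft.

```java
// Hypothetical helper (not a MicroRaft API): majority-quorum arithmetic
// for deciding which failure combinations a test plan should cover.
final class QuorumMath {

    private QuorumMath() {
    }

    // Smallest number of voting members that forms a majority.
    static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be at least 1");
        }
        return clusterSize / 2 + 1;
    }

    // Number of members that can fail while a majority remains.
    static int tolerableFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    // reachableMembers counts the local node plus every member it can reach.
    static boolean hasQuorum(int clusterSize, int reachableMembers) {
        return reachableMembers >= majority(clusterSize);
    }
}
```

For example, a 5-node group has a majority of 3 and tolerates 2 failures, while a 4-node group also needs 3 and tolerates only 1. A useful test plan covers both sides of that boundary: the largest failure set that keeps the group available and the smallest one that does not.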
Release and upgrade process
- Document the upgrade order for your service.
- Verify compatibility assumptions across persisted state and snapshots.
- Keep benchmark and soak-test results for the exact build you deploy.
Before going live
- Run the tutorial flow and your own state machine integration tests.
- Validate observability wiring against the Monitoring guide.
- Review the relevant use-case recipes.