Operating Cassandra
Preview | Unofficial | For review only
This section is for the people who deploy, configure, secure, monitor, and maintain Apache Cassandra clusters in production. Whether you are standing up your first cluster or upgrading a fleet to Cassandra 6, start here.
Find Your Phase
Use this guide to jump directly to the work that matches where you are in the cluster lifecycle.
- Day 0: Plan and provision
  Choose Your Deployment Model | Production Recommendations | Cost and Right-Sizing
- Day 1: Install and configure
- Day 2: Operate in production
  Repair Orchestration | Golden Signals | Backup Strategy | Config as Code | Integrations
- Incident response
  Collect Artifacts | Diagnose Latency | Diagnose Compaction | AI-Assisted Ops
What’s New for Operators in Cassandra 6
- Transactional Cluster Metadata (TCM): Cluster topology and schema changes are now committed to a linearized, transactional metadata log, replacing gossip-driven propagation of those changes
- Automated Repair: Cassandra can now schedule and run repairs automatically without external orchestration
- Guardrails: Configurable limits that warn on or reject queries and schema changes that violate operational policy
- Unified Compaction Strategy: A single compaction strategy that adapts to different workloads, removing the need to choose between STCS, LCS, and TWCS
- Password Validation and Role Name Generation: New security primitives for credential management
- JDK 21 with Generational ZGC: Recommended runtime for best GC performance
Recommended Reading Path
- Quickstart: Get Cassandra running in minutes
- Install and Production Recommendations: Prepare for production
- Configure: Tune cassandra.yaml, JVM options, and cluster topology
- Secure: Authentication, authorization, and TLS
- Operate: Compaction, repair, hints, and bulk loading
- Observe: Metrics, audit logging, and troubleshooting
- Upgrade: Migrate to TCM and Accord
Popular Runbooks
- Upgrade Runbook: rolling upgrade procedure with validation gates and rollback
- Restore Validation Runbook: step-by-step restore with post-restore checks
- Node Replacement Runbook: decommission, replace, and validate
- Disk Pressure Runbook: immediate, short-term, and long-term actions
- Disaster Recovery Drills: drill scenarios with checklists and review templates
- Repair Orchestration: auto repair, Reaper, and systems thinking