Upgrade Runbook
Preview | Unofficial | For review only
This runbook covers the general rolling upgrade procedure for Apache Cassandra 6.0. Cassandra 6.0 introduces architectural changes — most notably Transactional Cluster Metadata (TCM) and a hard JDK 21 requirement — that make this upgrade materially different from prior minor-version upgrades.
Read this runbook in full before you begin. Then follow each section in order without skipping steps.
For the TCM-specific initialization sequence that follows a successful cluster upgrade, see the TCM Pre-Upgrade Prerequisites and TCM Upgrade Procedure pages. The steps on those pages are not repeated here.
Pre-Upgrade Checklist
Complete every item before upgrading any node. This checklist is a gate, not a suggestion.
- Supported upgrade path verified. Confirm every node is running Cassandra 4.0, 4.1, or 5.0. Direct upgrades from 3.x to 6.0 are not supported. Upgrade to 4.0 first if any node is on 3.x.
- All nodes are up and in UN state. Run nodetool status across all nodes. Every non-decommissioned node must show UN (Up/Normal). Do not proceed with any node in a non-UN state.
- No pending schema migrations. Run nodetool describecluster and confirm that the Schema versions: entry lists a single schema version hash covering all nodes. Mixed schema versions indicate an incomplete prior operation; resolve it before upgrading.
- Repairs are not in progress. Confirm via nodetool compactionstats and nodetool tpstats that no repair sessions are running. Cancel any in-progress repairs and allow compactions to stabilize.
- JDK 21 is installed on every node. Cassandra 6.0 requires JDK 21; the node will not start on an earlier JDK. Verify with java -version on each host before upgrading any Cassandra binary.
- Heap and GC configuration is reviewed. The move from G1GC to ZGC as the recommended garbage collector requires updating JVM options. Review jvm-server.options and jvm11-server.options, then move to the jvm21-server.options settings provided in the 6.0 distribution.
- cassandra.yaml reviewed for removed or renamed options. Several configuration keys changed in 6.0. Run a diff between your existing cassandra.yaml and the bundled template. Pay particular attention to guardrails settings, which are now first-class configuration rather than commented-out stubs.
- Backup taken and verified. Take a snapshot of every keyspace with nodetool snapshot on every node. Confirm snapshots are present in the data directory or copied off-host. Cassandra 6.0 uses SSTable format version oa by default, which is not readable by prior versions. A pre-upgrade snapshot is your only reliable rollback path.
- Rollback plan documented and tested. Confirm the team has read the Rollback Procedure section below and that the snapshot location is known to everyone involved in the upgrade.
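The JDK item in the checklist can be scripted per host. The following is a minimal sketch, not an official tool: the function names java_major and require_jdk21 are hypothetical, and it assumes that "requires JDK 21" means a major version of exactly 21.

```shell
# Extract the major Java version from the text printed by `java -version`.
# Handles both modern ("21.0.2") and legacy ("1.8.0_292") version strings.
java_major() {
  # $1: text printed by `java -version` (the JVM writes it to stderr)
  printf '%s\n' "$1" | awk -F'"' '/version "/ {
    split($2, parts, ".")
    if (parts[1] == "1") print parts[2]; else print parts[1]
    exit
  }'
}

# Succeeds only if the JDK on this host reports major version 21.
# Assumption: the requirement is exactly 21, not "21 or later".
require_jdk21() {
  major=$(java_major "$(java -version 2>&1)")
  [ "$major" = "21" ]
}
```

A possible use is to run require_jdk21 on each host and stop the checklist if it returns a non-zero status.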
Rolling Upgrade Procedure
Upgrade one node at a time, completing all steps for each node before moving to the next. Do not upgrade more than one node simultaneously.
Repeat these eight steps for every node in the cluster. Start with a non-seed node in the first datacenter.
Per-Node Steps
- Drain the node.
  nodetool drain
  Drain flushes all memtables to disk and stops the node from accepting new writes. Wait for the command to return before proceeding.
- Stop the Cassandra service.
  sudo systemctl stop cassandra
  # or
  sudo service cassandra stop
- Replace the Cassandra binaries.
  Install the 6.0 package using your distribution method (tarball, deb, rpm). Do not overwrite cassandra.yaml, jvm*.options, or logback.xml automatically; merge them by hand or with a configuration management tool.
  If installing from a tarball, set CASSANDRA_HOME to the new directory and update any symlinks before proceeding.
- Update JVM options.
  Replace or merge your existing JVM options files to use the 6.0 defaults. At minimum, ensure jvm21-server.options is present and active. Remove any explicit -XX:+UseG1GC flags; 6.0 defaults to ZGC.
- Verify cassandra.yaml compatibility.
  Confirm no removed keys are present. Pay attention to:
  - commitlog_sync and commitlog_sync_period_in_ms (renamed in 6.0)
  - num_tokens (must match existing ring state; do not change during upgrade)
  - guardrails.* keys (new namespace; review defaults against your cluster’s workload profile)
- Start the node.
  sudo systemctl start cassandra
- Wait for the node to rejoin the ring.
  Monitor system.log for the JOINING → NORMAL state transition. The node is ready when nodetool status shows UN for it.
  nodetool status
  If the node does not return to UN, stop the rollout and investigate the node before moving to the next host. Check system.log for streaming or join failures and confirm the service is still healthy.
- Run the validation gate before moving to the next node.
  See Validation Gates below. Do not upgrade the next node until all gates pass.
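The "wait for the node to rejoin the ring" step can be automated with a polling loop. This is a sketch under assumptions: wait_for_un is a hypothetical helper, the default timeout and interval are illustrative, and it assumes the usual nodetool status layout in which the first column is the two-letter state and the second is the node address.

```shell
# Poll `nodetool status` until the given address reports UN, or give up.
# $1: node address as shown in the Address column
# $2: timeout in seconds (default 600, an illustrative value)
# $3: poll interval in seconds (default 10, an illustrative value)
wait_for_un() {
  addr=$1
  timeout=${2:-600}
  interval=${3:-10}
  waited=0
  while :; do
    # Pick the state column of the row whose address column matches.
    state=$(nodetool status 2>/dev/null | awk -v a="$addr" '$2 == a { print $1; exit }')
    if [ "$state" = "UN" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "node $addr is ${state:-absent}, not UN, after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A non-zero return is the signal to stop the rollout and inspect system.log, as the step describes.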
Validation Gates
Run these checks after each node upgrade and before touching the next node. A failed gate means you stop, diagnose, and resolve before continuing.
Gate 1: Ring State
nodetool status
Expected: the newly upgraded node shows UN.
All other nodes must also show UN.
Any DN (Down/Normal) or UL (Up/Leaving) state is a blocker.
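Gate 1 can be checked mechanically by filtering the status output. The sketch below assumes the standard nodetool status row layout (state in column one, address in column two); non_un_nodes is a hypothetical name.

```shell
# Read `nodetool status` output on stdin and print "<state> <address>" for
# every node row whose two-letter state is not UN.
# Exit status is 0 only when every node row is UN.
non_un_nodes() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2; bad = 1 }
       END { exit (bad ? 1 : 0) }'
}
```

Usage: nodetool status | non_un_nodes. Any printed line is a blocker for this gate.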
Gate 2: Schema Agreement
nodetool describecluster
Expected: Schema versions: lists exactly one version hash.
If two hashes appear, the upgraded node has diverged from the rest of the cluster.
Do not proceed.
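Gate 2 reduces to counting distinct schema version hashes. This sketch assumes the describecluster layout used by recent 4.x and 5.0 releases, where each schema version appears as a UUID followed by a bracketed endpoint list; the 6.0 output may differ, so treat the pattern as an assumption. schema_version_count is a hypothetical name.

```shell
# Read `nodetool describecluster` output on stdin and print the number of
# distinct schema version UUIDs. Entries such as "UNREACHABLE: [...]" are
# not UUIDs and are therefore not counted.
schema_version_count() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}: \[' \
    | sort -u | wc -l | tr -d ' '
}
```

Usage: nodetool describecluster | schema_version_count. Any result other than 1 fails the gate.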
Gate 3: Gossip Health
nodetool gossipinfo
Confirm the upgraded node appears in the gossip table with the correct datacenter,
rack, and STATUS:NORMAL.
Verify that the upgraded node can see all other nodes and vice versa.
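Part of Gate 3 can be scripted by scanning the gossip table for endpoints whose STATUS is not NORMAL. This is a sketch that assumes the gossipinfo layout of recent releases (an endpoint line containing "/", followed by indented KEY:version:value lines); the 6.0 output may differ. gossip_not_normal is a hypothetical name. It does not check datacenter or rack, and it does not flag endpoints with no STATUS line, so those still need a manual look.

```shell
# Read `nodetool gossipinfo` output on stdin and print
# "<endpoint> <STATUS line>" for every endpoint whose STATUS is not NORMAL.
# Exit status is 0 only when nothing is printed.
gossip_not_normal() {
  awk '
    # Endpoint header, for example "/10.0.0.1" or "host/10.0.0.1"
    /^[^[:space:]]*\// { ep = $1; next }
    # STATUS line for the current endpoint that does not mention NORMAL
    /^[[:space:]]*STATUS:/ && $0 !~ /NORMAL/ { print ep, $1; bad = 1 }
    END { exit (bad ? 1 : 0) }'
}
```

Usage: nodetool gossipinfo | gossip_not_normal.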
Rollback Procedure
Cassandra 6.0 writes SSTables in format version oa by default, which is not readable by prior versions.
Use the rollback procedure only when a node cannot be stabilized on 6.0 and must be returned to its prior version.
- Stop the node.
  sudo systemctl stop cassandra
- Remove or archive the 6.0 binaries.
- Restore the prior Cassandra version binaries.
  Reinstall the previous version package. Restore your backed-up cassandra.yaml, jvm*.options, and logback.xml.
- Restore data from pre-upgrade snapshot.
  If the node wrote any data after being upgraded to 6.0, you must restore its data directory from the pre-upgrade snapshot taken in the checklist. With the node still stopped, copy the snapshot files back into each affected table's data directory; the node loads them when it starts in the next step. nodetool refresh requires a running node, so run it against each affected keyspace and table only if you copy files in after the node is back up.
- Start the node on the prior version.
  sudo systemctl start cassandra
- Verify the node rejoins the ring and passes all validation gates.
- Assess the rollback scope.
  If any nodes successfully upgraded to 6.0 wrote oa-format SSTables, those nodes must also be rolled back from snapshot. Coordinate the rollback across all affected nodes before resuming reads and writes.
- Raise a cluster-wide incident review.
  Diagnose the root cause before re-attempting the upgrade. Do not re-attempt the upgrade while unresolved issues from the previous attempt remain open.
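Before restoring, it helps to confirm where the pre-upgrade snapshot actually lives on each node. The sketch below lists snapshot directories for one tag. It assumes the standard on-disk layout of data directory, keyspace, table directory, snapshots, tag; the helper name snapshot_dirs and the tag in the usage line are hypothetical.

```shell
# Print every snapshot directory with the given tag under a data directory.
# Assumed layout: <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>
# $1: Cassandra data directory, $2: snapshot tag
snapshot_dirs() {
  data_dir=$1
  tag=$2
  find "$data_dir" -mindepth 4 -maxdepth 4 -type d -path "*/snapshots/$tag" | sort
}
```

Usage, with an illustrative path and tag: snapshot_dirs /var/lib/cassandra/data pre_upgrade. An empty result means the tag was not found under that data directory, and the rollback plan should be revisited before any data is removed.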
Post-Upgrade Verification
After every node in the cluster is running 6.0 and all validation gates have passed, run the following verification steps before declaring the upgrade complete.
Upgrade SSTables
Cassandra 6.0 can read SSTables written by prior versions, but those files will not
benefit from the new format improvements until they are rewritten.
Schedule upgradesstables during a maintenance window after the full cluster is on 6.0.
If you skip this step, the cluster continues to carry legacy SSTables and the old disk,
compaction, and table layout costs remain until normal compaction rewrites them.
nodetool upgradesstables
Run this on every node.
Monitor compaction progress with nodetool compactionstats.
If upgradesstables stalls, investigate the compaction queue and logs before retrying.
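To see whether a node still carries legacy SSTables after the rewrite, one option is to look at the version prefix in the data file names. This is a sketch, not an official check: legacy_sstables is a hypothetical helper, and it assumes file names of the form version-generation-format-Data.db with oa as the current version, as stated in this runbook.

```shell
# List live SSTable data files whose name does not start with the current
# format version prefix. Snapshot and backup directories are skipped.
# $1: Cassandra data directory, $2: current version prefix (default "oa")
legacy_sstables() {
  data_dir=$1
  current=${2:-oa}
  find "$data_dir" -type f -name '*-Data.db' \
    ! -path '*/snapshots/*' ! -path '*/backups/*' \
    ! -name "${current}-*" | sort
}
```

An empty result on every node suggests the rewrite is complete for that data directory.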
Confirm Metrics Pipeline
Verify that your existing metrics collection (JMX, Prometheus via cassandra-exporter,
Datadog, or other) is receiving data from all upgraded nodes.
Key metrics to verify:
- org.apache.cassandra.metrics.ClientRequest.Read.Latency
- org.apache.cassandra.metrics.ClientRequest.Write.Latency
- org.apache.cassandra.metrics.Storage.Load
- org.apache.cassandra.metrics.ThreadPools.*.Dropped
Cassandra 6.0 introduces additional guardrails-related metrics under
org.apache.cassandra.metrics.Guardrails.*.
Confirm these are visible and that thresholds are appropriate for your workload.
Run Full Repair
Run a full repair after upgradesstables completes.
This ensures all replicas are consistent on the new SSTable format.
Repair does not replace the SSTable rewrite step; it validates replica agreement after the rewrite.
nodetool repair -full
Run repair on each node sequentially to avoid saturating the cluster.
Use nodetool repairconfig (6.0) to tune parallelism if needed.
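Running repair node by node can be wrapped in a loop that stops at the first failure. The sketch below assumes nodetool can reach each host over JMX with the -h option from wherever the script runs; repair_sequentially and the host names in the usage line are hypothetical.

```shell
# Run a full repair on each host in turn, stopping at the first failure.
# Arguments: one or more host names or addresses reachable by nodetool -h.
repair_sequentially() {
  for host in "$@"; do
    echo "starting full repair on $host"
    if ! nodetool -h "$host" repair -full; then
      echo "repair failed on $host; stopping" >&2
      return 1
    fi
  done
}
```

Usage, with illustrative host names: repair_sequentially cass-01 cass-02 cass-03. Each repair finishes before the next one starts, which matches the sequential approach described above.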
Cassandra 6.0-Specific Considerations
Transactional Cluster Metadata (TCM)
TCM is the most significant operational change in 6.0. After a full cluster upgrade is complete, TCM must be explicitly initialized; it does not activate automatically. The upgrade runbook above gets you to a fully upgraded cluster. The TCM initialization sequence is a separate operation described in the TCM Pre-Upgrade Prerequisites and TCM Upgrade Procedure pages.
Do not initialize TCM until every node in every datacenter is running 6.0 and all post-upgrade verification steps above have completed.
JDK 21 Requirement
JDK 21 is mandatory.
The jvm21-server.options file ships with the distribution.
Review heap sizing carefully: ZGC performs best with larger heap allocations and
is less sensitive to heap fragmentation than G1GC.
A starting point for heap sizing under ZGC is the same maximum heap you used
under G1GC; monitor GC pause times and adjust.
Avoid setting -Xmx above 50% of available system RAM, so that enough memory remains for off-heap structures and
the OS page cache, which Cassandra relies on heavily for read performance.
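The 50% ceiling is simple arithmetic on the host's total memory. A minimal sketch for Linux hosts follows; the helper names are hypothetical, and the result is an upper bound to compare your -Xmx against, not a recommended heap size.

```shell
# Print half of the given memory total (in KiB) as whole MiB.
half_ram_mib() {
  echo $(( $1 / 2 / 1024 ))
}

# On Linux, read MemTotal (reported in kB) and print the matching -Xmx cap.
xmx_cap_flag() {
  total_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "-Xmx$(half_ram_mib "$total_kib")m"
}
```

For example, a host reporting 16777216 kB of total memory yields a cap of 8192 MiB.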
Guardrails
Guardrails in 6.0 are enabled by default with conservative, production-appropriate
thresholds.
Review the guardrails: section of cassandra.yaml and the
Guardrails Reference before going live.
Guardrails can reject writes that exceed configured thresholds; confirm that
your workload’s partition sizes, column counts, and collection sizes fall within
defaults or adjust thresholds deliberately.
Key guardrails to review before production traffic:
| Guardrail | Default in 6.0 |
|---|---|
| | 100 MiB |
| | 1 GiB |
| | 50 |
| | 3 |
| | 150 |
Accord Consensus Protocol
Cassandra 6.0 ships with Accord, a new transactional consensus protocol, available as an opt-in replacement for LWT (Lightweight Transactions). Accord is disabled by default. Do not enable it during the upgrade window. Evaluate Accord in a non-production environment before enabling on live workloads. See Onboarding to Accord for the enablement guide.