SSTable Lifecycle and Maintenance
SSTable files are immutable but not static: they are validated, scrubbed, compacted, upgraded, and streamed between nodes. This chapter outlines common lifecycle operations and the invariants that keep multi-file components consistent, with a focus on TOC.txt. It closes with a brief look at how repair and streaming relate to these invariants and an anti-corruption checklist.
In this chapter you will learn
- Common lifecycle operations and when they apply
- How TOC and component invariants are validated
- How to spot orphaned or mismatched components
- Where repair and streaming fit (overview)
Lifecycle Operations
Key tools and operations that act on SSTables:
- sstablescrub: Scans SSTables to detect, and attempt to recover from, certain corruptions; rewrites a safe copy when possible (offline).
- sstablemetadata: Prints Statistics.db contents and derived summaries; useful for verifying timestamp ranges, row counts, compression, and droppable tombstones.
- sstabledump: Dumps partition and row content for inspection and triage (read-only).
- Compaction: Background process that rewrites files to control read amplification and reclaim space; see Chapter 15.
A trimmed, illustrative example of running sstablemetadata on a single SSTable:
```console
$ sstablemetadata nb-1-big-Data.db   # trimmed output
SSTable: nb-1-big-Data.db
min_timestamp: ...
max_timestamp: ...
total_rows: ...
estimated_droppable_tombstones: ...
compression: algorithm=LZ4, ratio=...
```
Tip: Check a few SSTables per table (newest, oldest, largest) to establish baseline ranges and spot anomalies.
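The tip above is easy to script. A minimal sketch, assuming "newest" and "oldest" mean filesystem modification time and "largest" means the on-disk size of Data.db; pick_sample_sstables is a hypothetical helper, not part of Cassandra's tooling. It only selects files; you would then run sstablemetadata on each path it returns.

```python
from pathlib import Path


def pick_sample_sstables(table_dir: str) -> dict[str, Path]:
    """Return the newest, oldest, and largest *-Data.db files in table_dir.

    Hypothetical helper. Assumes "newest"/"oldest" use filesystem mtime and
    "largest" uses the on-disk size of Data.db only.
    """
    data_files = sorted(Path(table_dir).glob("*-Data.db"))
    if not data_files:
        return {}
    # Stat each file once so all three choices see the same snapshot.
    stats = {p: p.stat() for p in data_files}
    return {
        "newest": max(data_files, key=lambda p: stats[p].st_mtime),
        "oldest": min(data_files, key=lambda p: stats[p].st_mtime),
        "largest": max(data_files, key=lambda p: stats[p].st_size),
    }
```

On a small table the three roles may point at the same file; that is expected and still gives a useful baseline.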
TOC and Component Invariants
TOC.txt enumerates the components present for a given generation and is authoritative for lifecycle checks. Tools validate it against the directory listing and component headers.
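A minimal sketch of the directory-listing side of that validation, assuming TOC.txt holds one component name per line and that every file of a generation shares a prefix such as nb-1-big (as in nb-1-big-Data.db). reconcile_toc is a hypothetical helper that works on a listing you have already taken; it reports presence failures, cross-listing failures, and duplicate TOC entries.

```python
from collections import Counter


def reconcile_toc(toc_text: str, dir_listing: list[str], prefix: str) -> dict[str, set[str]]:
    """Compare one generation's TOC.txt with the files present on disk.

    Hypothetical helper (not a Cassandra API). prefix is the shared filename
    prefix of the generation, e.g. "nb-1-big".
    """
    # One component name per line; blank lines are ignored.
    entries = [line.strip() for line in toc_text.splitlines() if line.strip()]
    counts = Counter(entries)
    listed = set(counts)
    # Strip "<prefix>-" to get component names such as "Data.db".
    on_disk = {
        name[len(prefix) + 1:]
        for name in dir_listing
        if name.startswith(prefix + "-")
    }
    return {
        # Presence: listed in TOC.txt but absent from the directory.
        "missing": listed - on_disk,
        # Cross-listing: on disk but not listed; TOC.txt itself is exempt.
        "unlisted": on_disk - listed - {"TOC.txt"},
        # Dedup: components named more than once in TOC.txt.
        "duplicates": {name for name, n in counts.items() if n > 1},
    }
```

Because the comparison uses hash sets, it stays linear in the number of components, and the trailing "-" in the prefix test keeps nb-1-big from matching nb-10-big files.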
Don’t mix component families in a single generation: BIG components must not be combined with BTI components under the same {generation}. During upgrades you may have both families in the directory, but each generation must remain internally consistent.
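A rough check for this rule, assuming component names of the form {version}-{generation}-{format}-{component} with a format token of big or bti and no hyphens inside the generation token. It treats Index.db and Summary.db as BIG-only and Partitions.db and Rows.db as BTI-only, following the components named in the invariant list below. mixed_family_generations is a hypothetical helper.

```python
from collections import defaultdict

# Family-specific components (assumed from the invariant list in this chapter).
BIG_ONLY = {"Index.db", "Summary.db"}
BTI_ONLY = {"Partitions.db", "Rows.db"}


def mixed_family_generations(filenames: list[str]) -> set[str]:
    """Return generation ids whose components mix the BIG and BTI families.

    Hypothetical helper. Assumes "{version}-{generation}-{format}-{component}"
    names (e.g. "nb-1-big-Data.db"); anything else is ignored.
    """
    formats: dict[str, set[str]] = defaultdict(set)
    components: dict[str, set[str]] = defaultdict(set)
    for name in filenames:
        parts = name.split("-", 3)
        if len(parts) != 4 or parts[2] not in ("big", "bti"):
            continue  # not an SSTable component name
        _version, generation, fmt, component = parts
        formats[generation].add(fmt)
        components[generation].add(component)
    mixed = set()
    for generation, fmts in formats.items():
        comps = components[generation]
        # A BIG generation must not carry BTI-only components, and vice versa.
        wrong_component = ("big" in fmts and bool(comps & BTI_ONLY)) or (
            "bti" in fmts and bool(comps & BIG_ONLY)
        )
        if len(fmts) > 1 or wrong_component:
            mixed.add(generation)
    return mixed
```

A directory holding both nb-*-big-* and da-*-bti-* files passes as long as no single generation straddles the two families.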
Core invariants (Cassandra 5.0 multi-file BIG/BTI formats):
- Presence: All components listed in TOC.txt exist on disk, and no unexpected files exist beyond that set and TOC.txt itself.
- Cross-listing: All component files present (except TOC.txt) are listed in TOC.txt.
- Header consistency: Generation and table identity are consistent across Data.db, Index.db/Partitions.db, Rows.db, Statistics.db, and CompressionInfo.db.
- Summary/Index alignment (BIG only): Summary.db samples are sorted and correspond to valid Index.db positions.
- Compression alignment: CompressionInfo.db chunk count and offsets are plausible for the Data.db size.
- Digest/integrity: Optional per-chunk CRCs and Digest.crc32 (when present) validate payloads.
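The two alignment invariants reduce to checks on offset sequences. A minimal sketch, assuming the values were already extracted by separate Summary.db and CompressionInfo.db parsers (not shown): summary_problems takes the Index.db positions sampled by the summary, and compression_problems takes the chunk offsets, chunk length, and uncompressed length from CompressionInfo.db plus the on-disk Data.db size. Both names are hypothetical, and the rules test plausibility, not correctness.

```python
def summary_problems(index_positions: list[int], index_file_size: int) -> list[str]:
    """Summary/Index alignment on pre-parsed positions (hypothetical helper)."""
    problems = []
    if any(b <= a for a, b in zip(index_positions, index_positions[1:])):
        problems.append("summary positions are not strictly increasing")
    if any(not 0 <= p < index_file_size for p in index_positions):
        problems.append("summary position outside Index.db")
    return problems


def compression_problems(chunk_offsets: list[int], chunk_length: int,
                         uncompressed_length: int, data_file_size: int) -> list[str]:
    """Compression alignment on pre-parsed CompressionInfo.db values (hypothetical helper)."""
    if chunk_length <= 0:
        return ["non-positive chunk length"]
    problems = []
    # Every chunk holds chunk_length uncompressed bytes except possibly the last,
    # so the expected count is a ceiling division (integer math avoids float error).
    expected = (uncompressed_length + chunk_length - 1) // chunk_length
    if len(chunk_offsets) != expected:
        problems.append(f"chunk count {len(chunk_offsets)} != expected {expected}")
    if chunk_offsets and chunk_offsets[0] != 0:
        problems.append("first chunk does not start at offset 0")
    if any(b <= a for a, b in zip(chunk_offsets, chunk_offsets[1:])):
        problems.append("chunk offsets are not strictly increasing")
    if chunk_offsets and chunk_offsets[-1] >= data_file_size:
        problems.append("last chunk starts at or beyond the end of Data.db")
    return problems
```

Each function makes one linear pass per rule, matching the O(E) and O(K) costs listed under the complexity notes.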
For implementation examples of directory validation, statistics parsing, and compression metadata checks, see Appendix C.
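The Digest.crc32 invariant is the one you can check without parsing any binary format. A minimal sketch, assuming Digest.crc32 stores the CRC32 of the on-disk Data.db bytes as an ASCII decimal string (worth confirming for your Cassandra version); digest_matches is a hypothetical helper. Python's zlib.crc32 computes the same standard CRC-32 as java.util.zip.CRC32.

```python
import zlib
from pathlib import Path


def digest_matches(data_path: str, digest_path: str, block_size: int = 1 << 20) -> bool:
    """Recompute CRC32 over Data.db and compare it with Digest.crc32 (hypothetical helper).

    Assumes Digest.crc32 holds the CRC32 of the on-disk Data.db bytes as ASCII decimal.
    """
    expected = int(Path(digest_path).read_text().strip())
    crc = 0
    with open(data_path, "rb") as data:
        # Stream in blocks so large Data.db files are never fully loaded into memory.
        while block := data.read(block_size):
            crc = zlib.crc32(block, crc)
    return crc == expected
```

Feeding the running value back into zlib.crc32 makes the block-wise result identical to hashing the whole file at once.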
Concurrency During Active Use
Validation often runs while flush, compaction, or streaming may be writing new generations:
- Treat TOC.txt as the publication barrier. If TOC.txt is missing for a set of components, consider that generation in flight and skip it or re-scan later.
- Prefer snapshot-based scans (filesystem snapshots or stable directory listings) to avoid racing with file creation and deletion.
- Use read-only handles and avoid file locks. If a component disappears mid-validation, record a transient warning and retry.
- For memory-mapped components, open them only after existence checks, and avoid holding descriptors across rescans.
Implementation guidance mirrors this: directory scans should be resilient to transient access failures and record TOC/header inconsistencies as warnings unless clear corruption is detected. See Appendix C for a concrete walkthrough.
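A minimal sketch of one scan pass that follows these rules, assuming component names of the form {version}-{generation}-{format}-{component} with a big or bti format token. It takes a single directory listing, defers generations that have no TOC.txt yet, and turns files that vanish mid-scan into warnings instead of failures. ScanReport and scan_table_dir are hypothetical names, and the sketch does not take a filesystem snapshot.

```python
import os
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ScanReport:
    ready: dict[str, list[str]] = field(default_factory=dict)  # prefix -> files
    in_flight: list[str] = field(default_factory=list)         # no TOC.txt yet
    warnings: list[str] = field(default_factory=list)          # transient problems


def scan_table_dir(table_dir: str) -> ScanReport:
    """One scan pass that treats TOC.txt as the publication barrier (hypothetical)."""
    report = ScanReport()
    try:
        names = os.listdir(table_dir)  # a single listing, reused for the whole pass
    except OSError as exc:
        report.warnings.append(f"cannot list {table_dir}: {exc}")
        return report
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        parts = name.split("-", 3)
        if len(parts) == 4 and parts[2] in ("big", "bti"):
            groups["-".join(parts[:3])].append(name)  # e.g. "nb-1-big"
    for prefix, files in sorted(groups.items()):
        toc_name = f"{prefix}-TOC.txt"
        if toc_name not in files:
            report.in_flight.append(prefix)  # not published yet: re-scan later
            continue
        try:
            with open(os.path.join(table_dir, toc_name)) as toc:  # read-only, no locks
                listed = [line.strip() for line in toc if line.strip()]
            for component in listed:
                os.stat(os.path.join(table_dir, f"{prefix}-{component}"))
        except FileNotFoundError as exc:
            # Possibly removed by compaction mid-scan: warn and retry on a later pass.
            report.warnings.append(f"{prefix}: {exc.filename} not found; retry later")
            continue
        report.ready[prefix] = sorted(files)
    return report
```

Only generations in report.ready would go on to deeper checks; the other two buckets are simply revisited on the next pass.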
Complexity and Performance Notes
- Directory scan: O(F) over files in the table directory.
- Per-generation validation: O(C) over components; header checks are O(1) per file.
- TOC reconciliation: O(C) to compare sets; includes string parsing and dedup checks.
- Summary/Index alignment: O(E) over sampled entries, with sortedness checked in a single linear pass.
- Compression map plausibility: O(K) over chunk offsets.
Practical guidance:
- Bound work per cycle (e.g., N generations) and back off when compaction is busy.
- Cache prior results (mtime/size) to skip unchanged components.
- Emit compact reports for CI/ops; reserve deep dumps (such as sstabledump output) for failures.
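The first two points combine naturally into an incremental validator. A minimal sketch, assuming a caller-supplied validate(path) -> bool check and using (mtime_ns, size) as the change fingerprint; IncrementalValidator and its parameters are hypothetical. Back-off is left to the caller, for example by lowering max_per_cycle while compaction is busy.

```python
import os
from typing import Callable

# Change fingerprint per path: (mtime in ns, size in bytes).
Fingerprint = tuple[int, int]


class IncrementalValidator:
    """Cache results by fingerprint and cap fresh validations per cycle (hypothetical)."""

    def __init__(self, validate: Callable[[str], bool], max_per_cycle: int = 50):
        self.validate = validate  # caller-supplied check for one path
        self.max_per_cycle = max_per_cycle
        self.cache: dict[str, tuple[Fingerprint, bool]] = {}

    def run_cycle(self, paths: list[str]) -> dict[str, bool]:
        """Return results for this cycle; over-budget paths are left for a later cycle."""
        results: dict[str, bool] = {}
        budget = self.max_per_cycle
        for path in paths:
            try:
                st = os.stat(path)
            except FileNotFoundError:
                self.cache.pop(path, None)  # gone, e.g. compacted away
                continue
            fingerprint = (st.st_mtime_ns, st.st_size)
            cached = self.cache.get(path)
            if cached is not None and cached[0] == fingerprint:
                results[path] = cached[1]  # unchanged since last check: reuse
            elif budget > 0:
                budget -= 1
                ok = self.validate(path)
                self.cache[path] = (fingerprint, ok)
                results[path] = ok
        return results
```

Because SSTable components are immutable once published, a matching fingerprint is a cheap and usually reliable signal that an earlier result still holds.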
Repair and Streaming Linkage
Streaming moves SSTables between nodes, and repair uses it to reconcile divergent replica histories; both depend on the same file and component invariants described above. See Chapter 18 (18-repair-streaming-bootstrap.md) for the process overview and when each occurs.
Integration Details
- Publication: senders ship complete component sets; receivers validate TOC.txt and headers before marking SSTables available.
- Tombstone policy: only_purge_repaired_tombstones defers tombstone purging until the data has been repaired, which changes compaction outcomes after repair.
- Level/metadata: level and repaired markers in Statistics.db inform LCS placement and post-repair compaction.
- Streaming integrity: digests are verified per stream; on failure, the received SSTables are discarded and the transfer is retried.
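On the receiving side, the publication rule can be enforced by staging components and moving TOC.txt into place last. This is a hypothetical convention for external tooling, not how Cassandra's streaming code is implemented; publish_generation is an illustrative name, and the sketch checks only that every component listed in TOC.txt was staged (header validation is out of scope). It assumes staging and live directories share one filesystem, so that on POSIX systems os.replace is an atomic rename.

```python
import os
from pathlib import Path


def publish_generation(staging: str, live: str, prefix: str) -> None:
    """Move one fully staged generation into the live directory, TOC.txt last.

    Hypothetical receiver-side convention, not Cassandra's streaming code.
    Assumes staging and live are on the same filesystem (atomic os.replace).
    """
    staging_dir, live_dir = Path(staging), Path(live)
    toc = staging_dir / f"{prefix}-TOC.txt"
    listed = [c.strip() for c in toc.read_text().splitlines() if c.strip()]
    components = [c for c in listed if c != "TOC.txt"]
    missing = [c for c in components if not (staging_dir / f"{prefix}-{c}").exists()]
    if missing:
        # Refuse to publish a partial set; the TOC stays in staging.
        raise FileNotFoundError(f"{prefix}: incomplete component set, missing {missing}")
    for c in components:
        os.replace(staging_dir / f"{prefix}-{c}", live_dir / f"{prefix}-{c}")
    # TOC.txt is the publication barrier, so it becomes visible only after the rest.
    os.replace(toc, live_dir / toc.name)
```

A scanner that skips generations without TOC.txt never observes a half-moved set under this ordering.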
Related Cassandra 5.0.8 code (pinned) for further study:
- Streaming session: https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/streaming/StreamSession.java
- Active repair service: https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/service/ActiveRepairService.java
Key Takeaways
- TOC.txt is authoritative; validate both presence and cross-listing of components.
- Use sstablemetadata and sstabledump for quick health and content checks.
- Header, index/summary, and compression metadata must agree across files.
- Compaction and tombstone purging rewrite files over time, but they do not replace validation.
- Repair and streaming rely on the same invariants; a broken invariant on one node can propagate to others.
References
Cassandra 5.0.8 (pinned):
- SSTableMetadataViewer (tool): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
- SSTableExport (tool, CLI alias: sstabledump): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/tools/SSTableExport.java
- Descriptor (component paths/TOC context): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/Descriptor.java
- SSTableWriter (emits TOC.txt): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/format/SSTableWriter.java
For implementation details, see Appendix C.