What Are SSTables?
SSTables (sorted string tables) are immutable files that persist Cassandra’s in-memory data structures to disk. They pair with memtables and the write-ahead log (WAL) in an LSM-tree design: writes land in the memtable and the log, then flush to disk as SSTables. Immutability enables concurrent readers, simple compaction, and predictable IO.
In this chapter you will learn
- The relationship between LSM-trees, memtables, WAL, and SSTables
- How SSTables participate in Cassandra’s read and write paths
- How formats evolved from big → mc/mm → BTI, and why
- Directory layout and naming conventions, including TOC.txt
Overview
At a high level, Cassandra batches updates in a memtable and appends them to a WAL for durability. When a memtable fills or a flush is triggered, its data is written out as an immutable SSTable on disk. Because SSTables are write-once artifacts with sorted partitions, reads can navigate quickly using auxiliary components (Bloom filter, index, summary), and nothing is ever rewritten in place.
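The flow described above can be illustrated with a toy model. This is a minimal sketch, not Cassandra's implementation: the class name TinyLsm, the flush threshold, and the use of Python lists and tuples in place of files are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TinyLsm:
    """Toy LSM store: a WAL list, a memtable dict, and immutable sorted runs."""

    flush_threshold: int = 3                       # hypothetical trigger, in entries
    wal: list = field(default_factory=list)        # stand-in for the on-disk log
    memtable: dict = field(default_factory=dict)   # mutable, in memory
    sstables: list = field(default_factory=list)   # each run: sorted tuple of (key, value)

    def write(self, key, value):
        self.wal.append((key, value))   # log the mutation for durability
        self.memtable[key] = value      # apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Sort once and freeze as a tuple: later writes never touch this run.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.wal.clear()  # flushed data no longer needs log replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None


store = TinyLsm()
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 9)]:
    store.write(k, v)

print(store.sstables)   # one flushed, sorted run
print(store.read("a"))  # the newer value still lives in the memtable
```

The fourth write overwrites "a" in the memtable only; the flushed run keeps its older value, which is why reads consult the newest data first.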
Role in Cassandra Read/Write Path
- Write path (implementation view):
  - Client mutation → append to WAL → update memtable (default: SkipListMemtable; TrieMemtable is an opt-in alternative: a byte-ordered prefix trie with shared prefixes and CPU-core sharding)
  - Flush triggers SSTableWriter to build components:
    - Serialize rows into Data.db (optionally compressed in fixed-size chunks)
    - Emit partition digests and offsets to Index.db; build Summary.db samples
    - Construct Filter.db (Bloom) and accumulate Statistics.db
    - Write CompressionInfo.db, Digest.crc32, and TOC.txt
- Read path (point read, implementation view):
  - Check min/max key bounds of the SSTable; outside range → skip entirely (no Bloom check needed)
  - Compute partition key digest and check Bloom (Filter.db); negative → stop
  - Use Summary.db to narrow a region of Index.db; seek to exact index entry
  - Translate to Data.db position; read aligned to compression chunk boundaries and decompress only the chunks that cover the requested bytes
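The point-read steps above can be sketched as a lookup over one toy SSTable. Everything here is a simplification under stated assumptions: ToySSTable and its fields are hypothetical, keys are compared as plain strings rather than by partitioner token, the Bloom filter is a small integer bitset fed by SHA-256 rather than Cassandra's own hashing, and compression is left out.

```python
import bisect
import hashlib


def _bloom_positions(key, bits, hashes=3):
    """Derive a few bit positions from a key digest (illustrative hashing only)."""
    digest = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % bits for i in range(hashes)]


class ToySSTable:
    """In-memory stand-in for one SSTable and its auxiliary components."""

    def __init__(self, rows, summary_interval=2, bloom_bits=128):
        items = sorted(rows.items())
        self.keys = [k for k, _ in items]        # Index.db stand-in: sorted keys
        self.offsets = list(range(len(items)))   # Index.db stand-in: key -> data position
        self.data = [v for _, v in items]        # Data.db stand-in
        self.min_key, self.max_key = self.keys[0], self.keys[-1]
        self.summary_interval = summary_interval
        # Summary.db stand-in: every Nth key with its position in the index.
        self.summary = [(self.keys[i], i) for i in range(0, len(self.keys), summary_interval)]
        self.bloom_bits = bloom_bits
        self.bloom = 0                           # Filter.db stand-in: integer used as bitset
        for k in self.keys:
            for p in _bloom_positions(k, bloom_bits):
                self.bloom |= 1 << p

    def get(self, key):
        """Return (value_or_None, trace) following bounds -> Bloom -> Summary -> Index -> Data."""
        if not (self.min_key <= key <= self.max_key):
            return None, ["bounds:skip"]         # outside range, no Bloom check needed
        trace = ["bounds:ok"]
        if any(not (self.bloom >> p) & 1 for p in _bloom_positions(key, self.bloom_bits)):
            return None, trace + ["bloom:negative"]
        trace.append("bloom:maybe")
        # The summary narrows the search to one small region of the index.
        sample_keys = [k for k, _ in self.summary]
        start = self.summary[bisect.bisect_right(sample_keys, key) - 1][1]
        end = min(start + self.summary_interval, len(self.keys))
        i = bisect.bisect_left(self.keys, key, start, end)
        if i == end or self.keys[i] != key:
            return None, trace + ["index:miss"]  # Bloom false positive
        trace.append("index:hit")
        return self.data[self.offsets[i]], trace


table = ToySSTable({"apple": 1, "cherry": 3, "banana": 2, "grape": 4, "fig": 5})
print(table.get("fig"))
print(table.get("zebra"))
```

Returning the trace alongside the value makes it easy to see which component ended the lookup, which mirrors the early exits in the list above.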
See also Chapter 4 (flush pipeline) and Chapter 10 (read path decision tree). For a quick visual of component relationships, see the diagram referenced in Chapter 2 (/cqlite/format-guide/diagrams/sstable-components; the Mermaid source is committed alongside).
Evolution of Formats
- big (3.x/4.x): classic multi-file layout; partition index stores digests → data offsets; promoted index used for wide partitions
- mc/mm (4.x): iterative improvements on big; header/version flags and metadata evolve; tooling and defaults shift
- BTI (5.0): Big Trie-Indexed family; improves lookup characteristics and index layout, reducing amplification for certain patterns while preserving the multi-component model
The on-disk component set remains recognizable across versions, but metadata and index structures evolve. Chapter 17 covers BTI in detail.
Directory Layout and Naming
SSTable file names follow {prefix}-{generation}-{format}-{Component}.db, with the components enumerated in TOC.txt (a few components, such as TOC.txt and Digest.crc32, use a different extension). Below is a tiny, real TOC.txt from test_basic/simple_table:
    Data.db
    Statistics.db
    Digest.crc32
    TOC.txt
    CompressionInfo.db
    Filter.db
    Index.db
    Summary.db
Sidebar: Version Differences (3.x/4.x)
- Component set is stable (Data/Index/Summary/Filter/Stats/CompressionInfo/TOC/Digest)
- Naming differs by format tag (big, mc/mm, bti); reader/writer internals improved in 5.0
- Some 3.x/4.x tools and flags changed defaults; this guide assumes 5.0 behavior unless noted
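To make the naming convention and the role of TOC.txt concrete, here is a minimal Python sketch that splits a component file name into its parts and checks a TOC.txt against the files actually present. The helper names parse_component_name and missing_components are hypothetical, the nb-1-big-* file names are illustrative, and the sketch assumes that none of the first three name fields contains a hyphen. It is not a port of Cassandra's Descriptor class.

```python
import re
import tempfile
from pathlib import Path

# {prefix}-{generation}-{format}-{Component}; the component keeps its own
# extension (.db, .txt, .crc32).
NAME_RE = re.compile(
    r"^(?P<prefix>[^-]+)-(?P<generation>[^-]+)-(?P<format>[^-]+)-(?P<component>.+)$"
)


def parse_component_name(filename):
    """Split an SSTable component file name into its four fields."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not an SSTable component name: {filename!r}")
    return match.groupdict()


def missing_components(toc_path):
    """Return components listed in TOC.txt that have no file next to it."""
    toc_path = Path(toc_path)
    stem = toc_path.name[: -len("TOC.txt")]  # e.g. "nb-1-big-"
    listed = [line.strip() for line in toc_path.read_text().splitlines() if line.strip()]
    return [name for name in listed if not (toc_path.parent / (stem + name)).exists()]


parsed = parse_component_name("nb-1-big-Data.db")
print(parsed)

# Build a throwaway directory where Filter.db is listed but absent.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for name in ["Data.db", "Index.db"]:
        (directory / f"nb-1-big-{name}").write_text("")
    toc = directory / "nb-1-big-TOC.txt"
    toc.write_text("Data.db\nIndex.db\nFilter.db\nTOC.txt\n")
    demo_missing = missing_components(toc)

print(demo_missing)
```

A check like this is a cheap first validation step before opening any component, since a listed but absent file usually indicates an incomplete copy.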
Key Takeaways
- SSTables are immutable, sorted disk artifacts produced by memtable flushes
- Reads follow Bloom → Summary → Index → Data (min/max key bounds checked before Bloom for range exclusion); writes never mutate existing SSTables
- The component set is consistent across versions; 5.0 advances internal layout with BTI
- TOC.txt is the single source of truth for which component files exist
References
- Cassandra 5.0.8 (pinned):
  - SSTableReader: https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/SSTableReader.java
  - SSTableWriter: https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java
  - Descriptor: https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/Descriptor.java
  - MemtableParams (default factory L99): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/schema/MemtableParams.java#L99-L100
  - BigTableReader (min/max bounds + Bloom order L220–L278): https://github.com/apache/cassandra/blob/cassandra-5.0.8/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java#L220-L278
- See also: Chapter 2 (components), Chapter 10 (read flow). For an implementation walkthrough, see Appendix C.