Data Modeling for Cassandra 6
Cassandra data modeling differs fundamentally from relational modeling. In a relational database you design tables around entities and relationships, then derive queries from that structure. In Cassandra you design tables around your queries: each query pattern typically needs its own table optimized for that access path.
This is called query-driven design, and it is the central discipline of Cassandra data modeling.
For example, a user profile service often needs two different reads:
- look up a user by user_id
- find the same user by email
In a relational database, that usually starts with one users table and a join-friendly model:
CREATE TABLE users (
user_id uuid PRIMARY KEY,
email text UNIQUE,
full_name text
);
SELECT user_id, email, full_name
FROM users
WHERE email = 'alice@example.com';
In Cassandra, each access pattern needs its own table so the read is a direct partition lookup:
CREATE TABLE users_by_id (
user_id uuid PRIMARY KEY,
email text,
full_name text
);
CREATE TABLE users_by_email (
email text PRIMARY KEY,
user_id uuid,
full_name text
);
SELECT user_id, email, full_name
FROM users_by_email
WHERE email = 'alice@example.com';
The tradeoff is duplication: Cassandra makes the reads fast by storing the same user data in the shape each query needs.
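One common way to keep the duplicated tables in step is to write to both in a single logged batch, which Cassandra guarantees will eventually apply in full (though without isolation). A sketch, using the two tables above with hypothetical values:

```cql
BEGIN BATCH
  INSERT INTO users_by_id (user_id, email, full_name)
  VALUES (f47ac10b-58cc-4372-a567-0e02b2c3d479, 'alice@example.com', 'Alice Liddell');

  INSERT INTO users_by_email (email, user_id, full_name)
  VALUES ('alice@example.com', f47ac10b-58cc-4372-a567-0e02b2c3d479, 'Alice Liddell');
APPLY BATCH;
```

A logged batch ensures both writes eventually land, but readers can still observe one table updated before the other.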
What changed for Cassandra 6 upgraders
Cassandra 6 introduces two features that alter data modeling tradeoffs:
- Accord transactions — atomic multi-partition writes directly in CQL, without lightweight-transaction overhead
- Schema constraints — CHECK constraints that enforce data-integrity rules at the schema level
New to Cassandra?
Start with the Cassandra Quickstart to get a running cluster before diving into data modeling.
Learning Path
Work through these upstream reference pages in order. The sequence moves from conceptual foundations to practical schema design.
1. Understand How Cassandra Differs
If you are coming from a relational or document database background, start here:
- RDBMS Compared to Cassandra — maps relational concepts to their Cassandra equivalents
- Introduction to Data Modeling — overview of the Cassandra modeling approach
- Conceptual Data Modeling — entities, relationships, and the starting point for schema design
2. Design Your Schema
- Logical Data Modeling — translate conceptual models into Cassandra table structures
- Query-Driven Design — define your queries first, then derive your tables
- Physical Data Modeling — partition sizing, clustering, and storage layout
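To make the physical-modeling step concrete, here is a hypothetical sensor-readings table (table and column names are illustrative, not from the reference pages) that encodes one query — "latest readings for a sensor on a given day" — into its primary key. The compound partition key bounds partition size by day, and the clustering column orders rows for the read:

```cql
CREATE TABLE readings_by_sensor_day (
    sensor_id    uuid,
    reading_date date,
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id, reading_date), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- The query the table was designed for: a single-partition read,
-- already sorted newest-first by the clustering order.
SELECT reading_time, value
FROM readings_by_sensor_day
WHERE sensor_id = f47ac10b-58cc-4372-a567-0e02b2c3d479
  AND reading_date = '2025-01-15'
LIMIT 10;
```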
3. Refine and Validate
- Refining Your Model — iterating on a model to handle edge cases
- Schema Design — naming conventions, data type choices, and schema governance
4. Tools
- Data Modeling Tools — software tools for visualizing and validating Cassandra schemas
Cassandra 6 Modeling Patterns
Fewer Tables with Accord Transactions
Before Cassandra 6, maintaining consistency across multiple tables required one of:
- Duplicating data across multiple denormalized tables and accepting eventual consistency
- Using lightweight transactions (LWT via Paxos) with significant performance overhead
- Implementing retry and reconciliation logic in the application
With Accord transactions in Cassandra 6, you can perform atomic multi-partition writes directly in CQL.
Tables must be created with transactional_mode = 'full' to participate in Accord transactions.
UPDATE accounts SET balance = balance - 100
WHERE account_id = 'A1'
IF balance >= 100;
UPDATE accounts SET balance = balance + 100
WHERE account_id = 'A2'
IF EXISTS;
With lightweight transactions, these are two separate conditional writes. If the second fails, the application must compensate manually.
BEGIN TRANSACTION
LET src = (SELECT balance FROM accounts WHERE account_id = 'A1');
LET dst = (SELECT balance FROM accounts WHERE account_id = 'A2');
IF src.balance >= 100 THEN
UPDATE accounts SET balance -= 100 WHERE account_id = 'A1';
UPDATE accounts SET balance += 100 WHERE account_id = 'A2';
END IF
COMMIT TRANSACTION;
Both updates are atomic. If the condition fails, neither write is applied.
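The transfer example assumes an accounts table that has opted in to Accord. A minimal sketch of such a schema, using the transactional_mode table option shown later in this guide (column types are illustrative):

```cql
-- Opting the table in to Accord makes it eligible for
-- multi-partition BEGIN TRANSACTION blocks.
CREATE TABLE accounts (
    account_id text PRIMARY KEY,
    balance    decimal
) WITH transactional_mode = 'full';
```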
For full transaction syntax and limitations, see BEGIN TRANSACTION Reference.
Schema Constraints for Data Integrity
Before Cassandra 6, column values were unconstrained — only the application tier could reject invalid data. This meant that writes bypassing the application (bulk loaders, direct CQL clients, emergency patches) could produce rows that violated business rules.
In Cassandra 6, CHECK constraints enforce rules at the schema level:
CREATE TABLE products (
product_id uuid PRIMARY KEY,
name text NOT NULL CHECK (LENGTH(name) > 0),
price decimal CHECK (price > 0),
category text CHECK (category IN ('electronics','clothing','food')),
sku text CHECK (REGEXP('^[A-Z]{3}-[0-9]{4}$'))
) WITH transactional_mode = 'full';
Supported constraint types: NOT NULL, numeric comparison operators, IN lists, LENGTH(), REGEXP(), JSON(), and pluggable SPI validators.
See Constraints for the full syntax and pluggable provider interface.
SAI for Query Flexibility Without Denormalization
Storage-Attached Indexing (SAI) reduces the need for denormalized query tables by providing performant secondary indexes on non-primary-key columns. SAI also powers Cassandra's vector search capability, making it the foundation for AI-driven applications.
For query flexibility with SAI, see the SAI Concepts page in the CQL section. For building AI applications with vector embeddings, see Vector Search.
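As a sketch of the pattern, an SAI index on the users_by_id table from earlier could replace the users_by_email lookup table entirely (the index name here is illustrative):

```cql
-- An SAI index on a non-primary-key column.
CREATE INDEX users_email_sai ON users_by_id (email) USING 'sai';

-- The email lookup is now served by the index,
-- with no second denormalized table to maintain.
SELECT user_id, email, full_name
FROM users_by_id
WHERE email = 'alice@example.com';
```

The tradeoff to weigh is that an index read can touch multiple replicas, whereas a dedicated query table is always a direct partition lookup; see the SAI Concepts page for the performance characteristics.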