Application Patterns

Common patterns for building applications with Apache Cassandra.

Data Modeling Patterns

Query-Driven Table Design

Cassandra tables are designed around your query patterns, not entity relationships. Each query typically maps to one table, with the partition key chosen to group related data for efficient retrieval.

For example, a user profile service usually needs one table for lookups by user_id and another for lookups by email. That duplication is deliberate: Cassandra trades normalized joins for direct reads on the shape each query needs.
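As a rough sketch of the user profile example, the two lookup tables might be declared like this. The column names (`email`, `full_name`) and the `create_lookup_tables` helper are illustrative assumptions, and `session` is assumed to be an already-connected driver session.

```python
# Hypothetical schema: one table per query, each keyed by the lookup column.
USERS_BY_ID = """
CREATE TABLE IF NOT EXISTS users_by_id (
    user_id uuid PRIMARY KEY,
    email text,
    full_name text
)
"""

USERS_BY_EMAIL = """
CREATE TABLE IF NOT EXISTS users_by_email (
    email text PRIMARY KEY,
    user_id uuid,
    full_name text
)
"""


def create_lookup_tables(session):
    """Create both lookup tables; `session` only needs an execute() method."""
    for ddl in (USERS_BY_ID, USERS_BY_EMAIL):
        session.execute(ddl)
```

Both tables hold the same columns; only the partition key differs, so each lookup is a single-partition read.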

See Data Modeling for comprehensive guidance.

Time-Series Data

For time-series workloads (logs, metrics, events):

Use a compound partition key with a time bucket (e.g., (sensor_id, date))
Use clustering columns for time ordering within each partition
Set TTL on inserts for automatic data expiration
Use Time Window Compaction (TWCS) for efficient cleanup of expired data
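The four points above can be combined into one table definition and insert. This is a minimal sketch: the `sensor_readings` table, its columns, the one-day bucket, and the seven-day TTL are all assumptions chosen for illustration, and the `%s` placeholders follow the Python driver's convention for unprepared statements.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table: one partition per sensor per UTC day, newest rows first,
# compacted in one-day windows so expired data can be dropped efficiently.
CREATE_READINGS = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    sensor_id text,
    day date,
    ts timestamp,
    value double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
  }
"""

INSERT_READING = """
INSERT INTO sensor_readings (sensor_id, day, ts, value)
VALUES (%s, %s, %s, %s)
USING TTL %s
"""

# Assumed retention period: seven days, expressed in seconds for USING TTL.
DEFAULT_TTL_SECONDS = int(timedelta(days=7).total_seconds())


def day_bucket(ts: datetime):
    """Return the UTC calendar day used as the time bucket.

    `ts` is assumed to be timezone-aware.
    """
    return ts.astimezone(timezone.utc).date()


def reading_params(sensor_id, ts, value, ttl_seconds=DEFAULT_TTL_SECONDS):
    """Build the positional parameters for INSERT_READING."""
    return (sensor_id, day_bucket(ts), ts, value, ttl_seconds)
```

A caller would then run something like `session.execute(INSERT_READING, reading_params("s-1", now, 21.5))`. Matching the bucket size to the compaction window is a design choice made here for simplicity, not a requirement.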

Denormalization

Cassandra does not support joins, so data models are denormalized. If the same data is needed by multiple queries, create multiple tables with different partition keys. With Cassandra 6’s Accord transactions, you can update multiple denormalized tables atomically.

Example: keep two user lookup tables in sync

from cassandra.cluster import Cluster

# Connect to a local node and use the "app" keyspace.
session = Cluster(["127.0.0.1"]).connect("app")

def upsert_user(user_id, email, full_name):
    # Unprepared statements in the Python driver take %s placeholders;
    # "?" bind markers are only valid in prepared statements.
    session.execute("""
        BEGIN TRANSACTION
          UPDATE users_by_id
          SET email = %s, full_name = %s
          WHERE user_id = %s;

          UPDATE users_by_email
          SET user_id = %s, full_name = %s
          WHERE email = %s;
        COMMIT TRANSACTION
    """, [email, full_name, user_id, user_id, full_name, email])

The application writes both tables in one transaction, so a caller can read by user_id or by email without seeing a half-updated state. Before Cassandra 6, you would need application-side compensation or a different consistency strategy.

Consistency Patterns

Consistency Level Selection

Choose consistency levels based on your application’s requirements:

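LOCAL_ONE / ONE: lowest latency; reads may return stale data (eventual consistency)
LOCAL_QUORUM / QUORUM: strong consistency when both reads and writes use a quorum; a sensible default for most applications
ALL: highest consistency, lowest availability (a single unavailable replica fails the request)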

See DML for details on consistency level behavior.
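The trade-off between these levels comes down to replica arithmetic: a read is guaranteed to overlap the latest acknowledged write when the replicas written plus the replicas read exceed the replication factor. The sketch below illustrates that rule for a single datacenter; the function and table names are hypothetical, and the `LOCAL_` variants apply the same arithmetic within one datacenter.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


# Replicas that must acknowledge a request at each level (single datacenter).
REPLICAS_REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def reads_see_latest_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """Return True when the read and write replica sets must overlap (R + W > RF)."""
    w = REPLICAS_REQUIRED[write_cl](replication_factor)
    r = REPLICAS_REQUIRED[read_cl](replication_factor)
    return w + r > replication_factor
```

With a replication factor of 3, QUORUM writes and QUORUM reads touch 2 + 2 replicas and therefore overlap, while ONE and ONE touch 1 + 1 and may not.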

Lightweight Transactions (Pre-Cassandra 6)

For conditional updates (compare-and-set):

INSERT INTO users (user_id, email) VALUES (?, ?)
  IF NOT EXISTS;

UPDATE users SET email = ? WHERE user_id = ?
  IF email = 'old@example.com';
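A conditional statement reports whether its condition held, so the application has to check the outcome instead of assuming the write happened. A minimal sketch, assuming the DataStax Python driver (whose result set exposes `was_applied`) and a hypothetical `claim_user_id` helper:

```python
def claim_user_id(session, user_id, email):
    """Insert the row only if user_id is unused.

    Returns True when this call created the row, False when a row with the
    same user_id already existed. `session` is assumed to be a connected
    Python driver session.
    """
    result = session.execute(
        "INSERT INTO users (user_id, email) VALUES (%s, %s) IF NOT EXISTS",
        (user_id, email),
    )
    # was_applied reflects the [applied] column returned by the conditional write.
    return result.was_applied
```

When the call returns False, the caller can reject the request or retry with a different identifier.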

Accord Transactions (Cassandra 6)

For multi-partition ACID transactions:

BEGIN TRANSACTION
  UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
  UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';
COMMIT TRANSACTION;

See BEGIN TRANSACTION Reference for complete syntax.

Vector Search Patterns

Cassandra supports vector embeddings for AI/ML applications, with vector search available since Cassandra 5.0.

Retrieval-Augmented Generation (RAG)

Store document chunks with their vector embeddings and retrieve relevant context for LLM prompts:

CREATE TABLE documents (
  doc_id uuid,
  chunk_id int,
  content text,
  embedding vector<float, 1536>,
  PRIMARY KEY (doc_id, chunk_id)
);

CREATE CUSTOM INDEX ON documents(embedding) USING 'StorageAttachedIndex';

Semantic Search

Find items by meaning rather than exact keyword match using ANN (approximate nearest neighbor) queries.
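An ANN query orders rows by closeness to a query vector and returns the top matches. The sketch below runs one against the `documents` table defined above. The `nearest_chunks` helper is hypothetical; it assumes a connected Python driver session with the default named-tuple rows and a query embedding that has already been computed with the same model and dimension as the stored vectors.

```python
# ANN queries use ORDER BY ... ANN OF and must include a LIMIT.
ANN_QUERY = """
SELECT doc_id, chunk_id, content
FROM documents
ORDER BY embedding ANN OF %s
LIMIT %s
"""


def nearest_chunks(session, query_embedding, k=5):
    """Return the text of the k chunks closest to query_embedding."""
    rows = session.execute(ANN_QUERY, (list(query_embedding), k))
    return [row.content for row in rows]
```

In a RAG flow, the returned chunks would be concatenated into the prompt context; in semantic search, they are the results themselves.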

Personalized Recommendations

Store user preference vectors and find similar items or users through vector similarity.
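To make "vector similarity" concrete, the following sketch ranks candidates by cosine similarity to a user's preference vector. It is a brute-force, in-memory illustration of the metric only, not a description of how Cassandra's ANN index works internally; the function names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(target, candidates, k=3):
    """Return the names of the k candidates most similar to target.

    `candidates` maps a name to its vector.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine_similarity(target, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

For example, `most_similar(user_vector, item_vectors, k=10)` would give the ten items whose vectors point most nearly in the same direction as the user's preferences. At scale, an ANN query against an indexed vector column replaces this exhaustive scan.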

See Vector Search for detailed examples of all three patterns.

Third-Party Integrations

Cassandra Lucene Index (Retired)

The Cassandra Lucene Index is retired as of Cassandra 5.0. Use SAI for secondary indexing and full-text search capabilities.