Vector Search with Cassandra 6
Status: Preview | Unofficial | Experimental | For review only
Cassandra supports native vector search through the VECTOR data type and Storage-Attached Indexing (SAI).
You store vector embeddings as regular CQL columns, create an SAI index on those columns, and query using ORDER BY … ANN OF to retrieve approximate nearest neighbors ranked by similarity.
Vector search integrates naturally with the rest of Cassandra’s query model — you can combine ANN similarity ranking with standard CQL scalar filters in a single query.
Note: Mark vector search as experimental in developer guidance. Treat ANN search, recall tuning, and production behavior as subject to change while the feature matures. Validate quality, latency, and operational behavior in your own workload before relying on it for production-critical paths.
The Technology Stack
VECTOR Data Type
The VECTOR data type stores a fixed-length array of float values representing an embedding produced by a machine learning model.
CREATE TABLE documents (
    doc_id uuid PRIMARY KEY,
    title text,
    body text,
    embedding vector<float, 1536>  -- dimension must match your embedding model
);
The dimension (1536 in this example) must match the output dimension of your embedding model and cannot be changed after the table is created.
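For illustration, a toy three-dimension table shows how the dimension is enforced on write (the table name and values are illustrative; real embeddings are much larger):
CREATE TABLE demo_vectors (
    id int PRIMARY KEY,
    v vector<float, 3>
);

INSERT INTO demo_vectors (id, v) VALUES (1, [0.1, 0.2, 0.3]);  -- accepted: 3 floats
INSERT INTO demo_vectors (id, v) VALUES (2, [0.1, 0.2]);       -- rejected: wrong dimension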
See Data Types for the full VECTOR type specification.
| Example model | Typical dimension | When to use it |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Good default for text search and RAG when you want a widely used hosted embedding model. |
| Cohere embed-english-v3.0 | 1024 | Good fit when you want a smaller hosted vector with strong retrieval quality. |
| sentence-transformers all-MiniLM-L6-v2 | 384 | Good fit for local or on-premises deployments where smaller vectors matter. |

Note: Pick the embedding model before you create the table. If the model outputs 1536 floats, the schema must use vector<float, 1536>.
SAI as the Index Engine
SAI (Storage-Attached Indexing) powers vector search in Cassandra.
An SAI index on a VECTOR column enables approximate nearest neighbor (ANN) queries using a similarity function.
CREATE INDEX ON documents(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
Supported similarity functions: cosine (default for most text embeddings), dot_product (for normalized vectors), euclidean (for geometric distances).
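If you pick dot_product, make sure vectors are unit-length before writing them; for unit vectors, dot product ranks results the same way cosine does. A minimal sketch in plain Python (the helper name is illustrative):
import math

def normalize(vec):
    # Scale to unit length so dot_product ranks results the same way cosine would
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

print(normalize([3.0, 4.0]))  # [0.6, 0.8] -- unit length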
SAI is the same indexing system used for scalar column indexes — it is not a vector-specific add-on. Understanding how SAI works will help you tune and troubleshoot both scalar and vector indexes:
- SAI Concepts — architecture, how SAI indexes memtables and SSTables, and the filtering engine model
- SAI FAQ — common questions on performance, compatibility, and use cases
- SAI Read/Write Paths — how reads and writes interact with SAI indexes
ANN Queries
An ANN query returns the N most similar rows to a query vector, ranked by the index similarity function:
SELECT doc_id, title, similarity_cosine(embedding, ?) AS score
FROM documents
ORDER BY embedding ANN OF ?
LIMIT 10;
Note that ANN queries use ORDER BY … ANN OF rather than a WHERE clause.
The LIMIT clause is required.
Combining Vector Search with Scalar Filters
You can combine ANN ranking with standard CQL predicates in a single query. SAI handles both the scalar filtering and the ANN ranking:
SELECT doc_id, title
FROM documents
WHERE category = 'technical'
AND published_year >= 2023
ORDER BY embedding ANN OF ?
LIMIT 10;
For the scalar columns (category, published_year) to filter efficiently, they should have their own SAI indexes.
Without indexes, Cassandra will scan all rows that pass the partition filter before applying the ANN ranking.
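For example, assuming the documents table has been extended with these two columns, the supporting indexes use the same SAI syntax as the vector index:
CREATE INDEX ON documents(category) USING 'sai';
CREATE INDEX ON documents(published_year) USING 'sai';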
End-to-End Embedding Flow
Hardcoded vectors such as [0.1, 0.2, 0.3] are useful only for explaining the query shape.
In a real application, the vector comes from an embedding model; Cassandra then stores and queries the array the model returns.
from openai import OpenAI
from cassandra.cluster import Cluster
import uuid

client = OpenAI()
session = Cluster(["127.0.0.1"]).connect("app")

# Embed the document text; the output dimension must match the schema (1536 here)
doc_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Cassandra stores vector search data next to operational data.",
).data[0].embedding

# Store the raw text alongside its embedding
insert_stmt = session.prepare(
    "INSERT INTO documents (doc_id, title, body, embedding) VALUES (?, ?, ?, ?)"
)
session.execute(
    insert_stmt,
    [uuid.uuid4(), "Vector search", "Example document", doc_embedding],
)

# Embed the user's question with the same model used at ingestion
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I use Cassandra for semantic search?",
).data[0].embedding

# Retrieve the five most similar documents via ANN
ann_stmt = session.prepare(
    "SELECT doc_id, title "
    "FROM documents "
    "ORDER BY embedding ANN OF ? "
    "LIMIT 5"
)
for row in session.execute(ann_stmt, [query_embedding]):
    print(row.doc_id, row.title)
Use the same embedding model for ingestion and retrieval. If you change models later, create a new vector column or a new table and re-embed the stored data.
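One possible migration path (the new column name and the 1024 dimension are illustrative):
-- Add a second vector column sized for the new model
ALTER TABLE documents ADD embedding_v2 vector<float, 1024>;

CREATE INDEX ON documents(embedding_v2) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};

-- Re-embed existing rows with the new model into embedding_v2,
-- then switch application queries to the new column.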
Upstream Vector Search Documentation
The following pages in the upstream Cassandra documentation cover the core vector search capabilities:
- Overview — what vector search is and when to use it
- Concepts — embeddings, similarity functions, ANN algorithm fundamentals
- Data Modeling for Vector Search — schema design patterns for vector workloads
- Quickstarts — step-by-step setup with a working dataset
- Working with Vector Search — loading data, running queries, and managing indexes
Application Patterns
Retrieval-Augmented Generation (RAG)
RAG is the most common use case for vector search in Cassandra. The pattern:
- Split source documents into chunks of meaningful size (paragraphs, sections)
- Embed each chunk using a model (OpenAI, Cohere, local model via Ollama, etc.)
- Store the chunk text and its embedding vector in Cassandra
- At query time, embed the user's question and retrieve the top-K most similar chunks via ANN
- Pass the retrieved chunks as context to a language model to generate a grounded answer (a sketch of this glue code follows the schema and query below)
CREATE TABLE rag_chunks (
    source_id text,
    chunk_id uuid,
    chunk_text text,
    embedding vector<float, 1536>,
    PRIMARY KEY (source_id, chunk_id)
);
CREATE INDEX ON rag_chunks(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
SELECT chunk_text
FROM rag_chunks
ORDER BY embedding ANN OF ? -- ? = embedded user question
LIMIT 5;
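A minimal retrieval-and-generation sketch tying the steps together, reusing the OpenAI client and Cassandra session from the earlier example (the chat model name and prompt format are illustrative):
question = "How does Cassandra store vectors?"

# Embed the user's question with the same model used for the chunks
q_vec = client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Retrieve the top-K most similar chunks via ANN
chunk_stmt = session.prepare(
    "SELECT chunk_text FROM rag_chunks ORDER BY embedding ANN OF ? LIMIT 5"
)
chunks = [row.chunk_text for row in session.execute(chunk_stmt, [q_vec])]

# Pass the retrieved chunks as grounding context to a language model
prompt = (
    "Answer using only this context:\n\n"
    + "\n---\n".join(chunks)
    + "\n\nQuestion: " + question
)
answer = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
print(answer)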
Semantic Search
Semantic search matches results by meaning rather than keyword overlap. The user’s query is embedded at runtime, and the ANN query returns items whose embeddings are geometrically closest to the query embedding.
This enables: "show me products similar to this description," "find articles related to this topic," "return reviews with a similar sentiment."
CREATE TABLE products (
    product_id uuid PRIMARY KEY,
    name text,
    description text,
    category text,
    embedding vector<float, 768>
);
CREATE INDEX ON products(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
CREATE INDEX ON products(category) USING 'sai';
SELECT product_id, name
FROM products
WHERE category = ?
ORDER BY embedding ANN OF ?
LIMIT 20;
Personalized Recommendations
User behavior can be encoded as embeddings (through matrix factorization, collaborative filtering, or user interaction sequence models). ANN queries then surface items whose embeddings are close to a user’s current embedding — effectively personalized recommendations without explicit rules.
CREATE TABLE items (
    item_id uuid PRIMARY KEY,
    title text,
    item_type text,
    embedding vector<float, 256>
);
CREATE INDEX ON items(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'dot_product'};
CREATE INDEX ON items(item_type) USING 'sai';
SELECT item_id, title
FROM items
WHERE item_type = ?
ORDER BY embedding ANN OF ? -- ? = current user embedding
LIMIT 10;
Related Topics
- Data Types — VECTOR type specification
- SAI Concepts — index architecture
- SAI Collections — indexing list, set, and map columns alongside VECTOR
- SAI Monitoring — virtual tables for index health and performance
- Data Modeling — query-driven schema design principles