Vector Search with Cassandra 6
Status: Preview | Unofficial | Experimental | For review only
Cassandra supports native vector search through the VECTOR data type and Storage-Attached Indexing (SAI).
You store vector embeddings as regular CQL columns, create an SAI index on those columns, and query using ORDER BY … ANN OF to retrieve approximate nearest neighbors ranked by similarity.
Vector search integrates naturally with the rest of Cassandra’s query model — you can combine ANN similarity ranking with standard CQL scalar filters in a single query.
Note: Mark vector search as experimental in developer guidance. Treat ANN search, recall tuning, and production behavior as subject to change while the feature matures. Validate quality, latency, and operational behavior in your own workload before relying on it for production-critical paths.
The Technology Stack
VECTOR Data Type
The VECTOR data type stores a fixed-length array of float values representing an embedding produced by a machine learning model.
CREATE TABLE documents (
    doc_id uuid PRIMARY KEY,
    title text,
    body text,
    embedding vector<float, 1536>  -- dimension must match your embedding model
);
The dimension (1536 in this example) must match the output dimension of your embedding model and cannot be changed after the table is created.
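For illustration, a toy three-dimension table shows how the dimension is enforced on write (the table name and values are illustrative; real embeddings are much larger):
CREATE TABLE demo_vectors (
    id int PRIMARY KEY,
    v vector<float, 3>
);

INSERT INTO demo_vectors (id, v) VALUES (1, [0.1, 0.2, 0.3]);  -- accepted: 3 floats
INSERT INTO demo_vectors (id, v) VALUES (2, [0.1, 0.2]);       -- rejected: wrong dimension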
See Data Types for the full VECTOR type specification.
| Example model | Typical dimension | When to use it |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Good default for text search and RAG when you want a widely used hosted embedding model. |
| Cohere embed-english-v3.0 | 1024 | Good fit when you want a smaller hosted vector with strong retrieval quality. |
| sentence-transformers all-MiniLM-L6-v2 | 384 | Good fit for local or on-premises deployments where smaller vectors matter. |

Note: Pick the embedding model before you create the table. If the model outputs 1536 floats, the schema must use vector<float, 1536>.
SAI as the Index Engine
SAI (Storage-Attached Indexing) powers vector search in Cassandra.
An SAI index on a VECTOR column enables approximate nearest neighbor (ANN) queries using a similarity function.
CREATE INDEX ON documents(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
Supported similarity functions: cosine (default for most text embeddings), dot_product (for normalized vectors), euclidean (for geometric distances).
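If you pick dot_product, make sure vectors are unit-length before writing them; for unit vectors, dot product ranks results the same way cosine does. A minimal sketch in plain Python (the helper name is illustrative):
import math

def normalize(vec):
    # Scale to unit length so dot_product ranks results the same way cosine would
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

print(normalize([3.0, 4.0]))  # [0.6, 0.8] -- unit length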
SAI is the same indexing system used for scalar column indexes — it is not a vector-specific add-on. Understanding how SAI works will help you tune and troubleshoot both scalar and vector indexes:
- SAI Concepts — architecture, how SAI indexes memtables and SSTables, and the filtering engine model
- SAI FAQ — common questions on performance, compatibility, and use cases
- SAI Read/Write Paths — how reads and writes interact with SAI indexes
ANN Queries
An ANN query returns the N most similar rows to a query vector, ranked by the index similarity function:
SELECT doc_id, title, similarity_cosine(embedding, ?) AS score
FROM documents
ORDER BY embedding ANN OF ?
LIMIT 10;
Note that ANN queries use ORDER BY … ANN OF rather than a WHERE clause.
The LIMIT clause is required.
Combining Vector Search with Scalar Filters
You can combine ANN ranking with standard CQL predicates in a single query. SAI handles both the scalar filtering and the ANN ranking:
SELECT doc_id, title
FROM documents
WHERE category = 'technical'
AND published_year >= 2023
ORDER BY embedding ANN OF ?
LIMIT 10;
For the scalar columns (category, published_year) to filter efficiently, they should have their own SAI indexes.
Without indexes, Cassandra will scan all rows that pass the partition filter before applying the ANN ranking.
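For example, assuming the documents table has been extended with these two columns, the supporting indexes use the same SAI syntax as the vector index:
CREATE INDEX ON documents(category) USING 'sai';
CREATE INDEX ON documents(published_year) USING 'sai';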
End-to-End Embedding Flow
Hardcoded vectors such as [0.1, 0.2, 0.3] are useful only for explaining the query shape.
In a real application, the vector comes from an embedding model; Cassandra then stores and queries the array the model returns.
from openai import OpenAI
from cassandra.cluster import Cluster
import uuid

client = OpenAI()
session = Cluster(["127.0.0.1"]).connect("app")

# Embed the document text; the output dimension must match the schema (1536 here)
doc_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Cassandra stores vector search data next to operational data.",
).data[0].embedding

# Store the raw text alongside its embedding
insert_stmt = session.prepare(
    "INSERT INTO documents (doc_id, title, body, embedding) VALUES (?, ?, ?, ?)"
)
session.execute(
    insert_stmt,
    [uuid.uuid4(), "Vector search", "Example document", doc_embedding],
)

# Embed the user's question with the same model used at ingestion
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I use Cassandra for semantic search?",
).data[0].embedding

# Retrieve the five most similar documents via ANN
ann_stmt = session.prepare(
    "SELECT doc_id, title "
    "FROM documents "
    "ORDER BY embedding ANN OF ? "
    "LIMIT 5"
)
for row in session.execute(ann_stmt, [query_embedding]):
    print(row.doc_id, row.title)
Use the same embedding model for ingestion and retrieval. If you change models later, create a new vector column or a new table and re-embed the stored data.
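One possible migration path (the new column name and the 1024 dimension are illustrative):
-- Add a second vector column sized for the new model
ALTER TABLE documents ADD embedding_v2 vector<float, 1024>;

CREATE INDEX ON documents(embedding_v2) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};

-- Re-embed existing rows with the new model into embedding_v2,
-- then switch application queries to the new column.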
Upstream Vector Search Documentation
The following pages in the upstream Cassandra documentation cover the core vector search capabilities:
- Overview — what vector search is and when to use it
- Concepts — embeddings, similarity functions, ANN algorithm fundamentals
- Data Modeling for Vector Search — schema design patterns for vector workloads
- Quickstarts — step-by-step setup with a working dataset
- Working with Vector Search — loading data, running queries, and managing indexes
Application Patterns
Retrieval-Augmented Generation (RAG)
RAG is the most common use case for vector search in Cassandra. The pattern:
- Split source documents into chunks of meaningful size (paragraphs, sections)
- Embed each chunk using a model (OpenAI, Cohere, local model via Ollama, etc.)
- Store the chunk text and its embedding vector in Cassandra
- At query time, embed the user's question and retrieve the top-K most similar chunks via ANN
- Pass the retrieved chunks as context to a language model to generate a grounded answer (a sketch of this glue code follows the schema and query below)
CREATE TABLE rag_chunks (
    source_id text,
    chunk_id uuid,
    chunk_text text,
    embedding vector<float, 1536>,
    PRIMARY KEY (source_id, chunk_id)
);
CREATE INDEX ON rag_chunks(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
SELECT chunk_text
FROM rag_chunks
ORDER BY embedding ANN OF ? -- ? = embedded user question
LIMIT 5;
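A minimal retrieval-and-generation sketch tying the steps together, reusing the OpenAI client and Cassandra session from the earlier example (the chat model name and prompt format are illustrative):
question = "How does Cassandra store vectors?"

# Embed the user's question with the same model used for the chunks
q_vec = client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Retrieve the top-K most similar chunks via ANN
chunk_stmt = session.prepare(
    "SELECT chunk_text FROM rag_chunks ORDER BY embedding ANN OF ? LIMIT 5"
)
chunks = [row.chunk_text for row in session.execute(chunk_stmt, [q_vec])]

# Pass the retrieved chunks as grounding context to a language model
prompt = (
    "Answer using only this context:\n\n"
    + "\n---\n".join(chunks)
    + "\n\nQuestion: " + question
)
answer = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
print(answer)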
Semantic Search
Semantic search matches results by meaning rather than keyword overlap. The user’s query is embedded at runtime, and the ANN query returns items whose embeddings are geometrically closest to the query embedding.
This enables: "show me products similar to this description," "find articles related to this topic," "return reviews with a similar sentiment."
CREATE TABLE products (
    product_id uuid PRIMARY KEY,
    name text,
    description text,
    category text,
    embedding vector<float, 768>
);
CREATE INDEX ON products(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'cosine'};
CREATE INDEX ON products(category) USING 'sai';
SELECT product_id, name
FROM products
WHERE category = ?
ORDER BY embedding ANN OF ?
LIMIT 20;
Personalized Recommendations
User behavior can be encoded as embeddings (through matrix factorization, collaborative filtering, or user interaction sequence models). ANN queries then surface items whose embeddings are close to a user’s current embedding — effectively personalized recommendations without explicit rules.
CREATE TABLE items (
    item_id uuid PRIMARY KEY,
    title text,
    item_type text,
    embedding vector<float, 256>
);
CREATE INDEX ON items(embedding) USING 'sai'
WITH OPTIONS = {'similarity_function': 'dot_product'};
CREATE INDEX ON items(item_type) USING 'sai';
SELECT item_id, title
FROM items
WHERE item_type = ?
ORDER BY embedding ANN OF ? -- ? = current user embedding
LIMIT 10;
Related Topics
- Data Types — VECTOR type specification
- SAI Concepts — index architecture
- SAI Collections — indexing list, set, and map columns alongside VECTOR
- SAI Monitoring — virtual tables for index health and performance
- Data Modeling — query-driven schema design principles