SAI indexing with collections
SAI supports collections of type map, list, and set.
Collections allow you to group and store data together in a column.
In a relational database, a grouping such as a user’s multiple email addresses is achieved via many-to-one joined relationship between (for example) a user table and an email table.
Apache Cassandra avoids joins between two tables by storing the user’s email addresses in a collection column in the user table.
Each collection specifies the data type of the data held.
A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns.
|
In CQL queries of database tables with SAI indexes, the
SAI also supports element-level indexes ( |
Frozen collection indexing
SAI supports element-level indexing on frozen collections (added in Cassandra 6.0, CASSANDRA-18492).
Before Cassandra 6.0, frozen collections could only be indexed with the FULL target, which enables exact equality matching on the entire serialized collection value.
SAI now also accepts VALUES, KEYS, and ENTRIES targets on frozen collections, enabling element-level queries with CONTAINS, CONTAINS KEY, and map entry equality (column[key] = value).
|
This relaxation is SAI-specific.
Non-SAI (legacy 2i) indexes still require |
How SAI indexes frozen collection elements
When an SAI index targets VALUES, KEYS, or ENTRIES on a frozen collection, SAI decomposes the serialized collection into individual terms at write time (both the memtable path and the SSTable path).
Each element is indexed as a separate term.
This is transparent to the query path but has write-amplification implications for large frozen collections: a frozen set with 100 elements produces 100 index terms per write instead of one.
The sai_frozen_term_size guardrail applies to each individually extracted term, not to the full serialized collection.
Supported index targets for frozen collections
| Collection type | Supported index targets | Enabled queries |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
Create indexes on frozen collections
The following examples show CREATE INDEX syntax for frozen collection columns.
All examples use USING 'sai'.
-- Frozen set: VALUES index enables CONTAINS queries
CREATE INDEX ON ks.tbl (VALUES(frozen_tags)) USING 'sai';
-- Frozen list: VALUES index enables CONTAINS queries
CREATE INDEX ON ks.tbl (VALUES(frozen_events)) USING 'sai';
-- Frozen map: KEYS index enables CONTAINS KEY queries
CREATE INDEX ON ks.tbl (KEYS(frozen_metadata)) USING 'sai';
-- Frozen map: VALUES index enables CONTAINS queries on map values
CREATE INDEX ON ks.tbl (VALUES(frozen_metadata)) USING 'sai';
-- Frozen map: ENTRIES index enables map entry equality queries
CREATE INDEX ON ks.tbl (ENTRIES(frozen_metadata)) USING 'sai';
-- FULL indexing continues to work for exact equality on the entire frozen collection
CREATE INDEX ON ks.tbl (FULL(frozen_tags)) USING 'sai';
Query frozen collections with element-level indexes
With element-level SAI indexes defined, you can query frozen collection columns using the same CONTAINS, CONTAINS KEY, and map entry equality syntax used for non-frozen collections.
-- Query a frozen set or list (requires VALUES index)
SELECT * FROM ks.tbl WHERE frozen_tags CONTAINS 'cassandra';
SELECT * FROM ks.tbl WHERE frozen_events CONTAINS 'summit';
-- Query a frozen map by key (requires KEYS index)
SELECT * FROM ks.tbl WHERE frozen_metadata CONTAINS KEY 'env';
-- Query a frozen map by value (requires VALUES index)
SELECT * FROM ks.tbl WHERE frozen_metadata CONTAINS 'production';
-- Query a frozen map by entry (requires ENTRIES index)
SELECT * FROM ks.tbl WHERE frozen_metadata['env'] = 'production';
Using the set type
This example uses the following table and index:
CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
id UUID PRIMARY KEY,
lastname text,
teams set<text>
);
CREATE INDEX teams_idx
ON cycling.cyclist_career_teams (teams)
USING 'sai';
An index is created on the set column teams in the cyclist_career_teams table.
Query with CONTAINS from the set teams column:
- CQL
-
SELECT * FROM cycling.cyclist_career_teams WHERE teams CONTAINS 'Rabobank-Liv Giant'; - Result
-
id | lastname | teams --------------------------------------+----------+------------------------------------------------------------------------------------------------------ 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}
Frozen set indexing
Starting in Cassandra 6.0 (CASSANDRA-18492), SAI supports VALUES indexing on frozen sets.
This enables CONTAINS queries on individual set elements without requiring an exact match on the entire serialized value.
FULL indexing continues to work for exact equality on the complete frozen set.
-- Table with a frozen set column
CREATE TABLE ks.cyclist_career_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams frozen<set<text>>
);
-- Create a VALUES index on the frozen set column (Cassandra 6.0+, SAI only)
CREATE INDEX ON ks.cyclist_career_teams (VALUES(teams)) USING 'sai';
-- Query: find cyclists who were on a specific team
SELECT * FROM ks.cyclist_career_teams WHERE teams CONTAINS 'Team Sky';
|
|
Using the list type
This example uses the following table and index:
CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
year int,
month int,
events list<text>,
PRIMARY KEY (year, month)
);
CREATE INDEX events_idx
ON cycling.upcoming_calendar (events)
USING 'sai';
An index is created on the list column events in the upcoming_calendar table.
Query with CONTAINS from the list events column:
- CQL
-
SELECT * FROM cycling.upcoming_calendar WHERE events CONTAINS 'Criterium du Dauphine'; - Result
-
year | month | events ------+-------+----------------------------------------------- 2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']
A slightly more complex query selects rows that either contain a particular event or have a particular month date:
- CQL
-
SELECT * FROM cycling.upcoming_calendar WHERE events CONTAINS 'Criterium du Dauphine' OR month = 7; - Result
-
year | month | events ------+-------+----------------------------------------------- 2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse'] 2015 | 7 | ['Tour de France']
Frozen list indexing
Starting in Cassandra 6.0 (CASSANDRA-18492), SAI supports VALUES indexing on frozen lists.
This enables CONTAINS queries on individual list elements without requiring an exact match on the entire serialized value.
FULL indexing continues to work for exact equality on the complete frozen list.
-- Table with a frozen list column
CREATE TABLE ks.upcoming_calendar (
id uuid PRIMARY KEY,
month_year text,
events frozen<list<text>>
);
-- Create a VALUES index on the frozen list column (Cassandra 6.0+, SAI only)
CREATE INDEX ON ks.upcoming_calendar (VALUES(events)) USING 'sai';
-- Query: find calendar entries containing a specific event
SELECT * FROM ks.upcoming_calendar WHERE events CONTAINS 'Grand Tour';
|
|
Using the map type
This example uses the following table and indexes:
CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams map<int, text>
);
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );
Indexes created on the map column teams in the cyclist_career_teams table target the keys, values, and full entries of the column data.
Query with KEYS from the map teams column:
- CQL
-
SELECT * FROM cyclist_teams WHERE teams CONTAINS KEY 2014; - Result
-
id | firstname | lastname | teams --------------------------------------+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'} 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2014: 'Rabobank-Liv Woman Cycling Team', 2015: 'Rabobank-Liv Woman Cycling Team'}
Query a value from the map teams column, noting that only the keyword CONTAINS is included:
- CQL
-
SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Team Garmin - Cervelo'; - Result
-
id | firstname | lastname | teams --------------------------------------+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
Query entries from the map teams column, noting the difference in the WHERE clause:
- CQL
-
SELECT * FROM cyclist_teams WHERE teams[2014] = 'Boels:Dolmans Cycling Team' AND teams[2015] = 'Boels:Dolmans Cycling Team'; - Result
-
id | firstname | lastname | teams --------------------------------------+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
This example looks for a row where two entries are present in the map teams column.
Frozen map indexing
Starting in Cassandra 6.0 (CASSANDRA-18492), SAI supports KEYS, VALUES, and ENTRIES indexing on frozen maps.
This enables element-level queries on frozen map columns using the same query syntax available for non-frozen maps.
FULL indexing continues to work for exact equality on the complete frozen map.
-- Table with a frozen map column
CREATE TABLE ks.cyclist_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams frozen<map<int, text>>
);
-- Create element-level indexes on the frozen map column (Cassandra 6.0+, SAI only)
CREATE INDEX ON ks.cyclist_teams (KEYS(teams)) USING 'sai';
CREATE INDEX ON ks.cyclist_teams (VALUES(teams)) USING 'sai';
CREATE INDEX ON ks.cyclist_teams (ENTRIES(teams)) USING 'sai';
-- Query by key: find cyclists associated with a specific year
SELECT * FROM ks.cyclist_teams WHERE teams CONTAINS KEY 2014;
-- Query by value: find cyclists who rode for a specific team
SELECT * FROM ks.cyclist_teams WHERE teams CONTAINS 'Team Garmin';
-- Query by entry: find cyclists who rode for a specific team in a specific year
SELECT * FROM ks.cyclist_teams WHERE teams[2014] = 'Team Garmin';
|
Element-level indexing ( |
|
|
For more information, see: