Export to Parquet

Task: Write query results to a Parquet file for downstream analytics.

cqlite \
--schema test-data/schemas/basic-types.cql \
--data-dir test-data/datasets/sstables \
--query "SELECT id, name, age FROM test_basic.simple_table LIMIT 3" \
--out parquet \
--output /tmp/simple_table.parquet \
--overwrite

Exit code: 0 on success. File is created at the path given by --output.

Expected: simple_table.parquet created; file size varies by row count (around 1.3 KB for 3 rows).
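When the CLI is driven from a script, the exit code is the simplest signal to branch on. The following is a minimal Python sketch, not part of CQLite: `build_export_command` and `run_export` are hypothetical helper names, and the sketch assumes the `cqlite` binary is on `PATH`. It uses only the flags from the command above and the exit codes documented on this page (0 on success, 6 when the output file already exists and `--overwrite` is omitted).

```python
import subprocess
from typing import List


def build_export_command(schema: str, data_dir: str, query: str,
                         output: str, overwrite: bool = False) -> List[str]:
    """Assemble the argv list for a Parquet export (hypothetical helper)."""
    cmd = [
        "cqlite",
        "--schema", schema,
        "--data-dir", data_dir,
        "--query", query,
        "--out", "parquet",
        "--output", output,
    ]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_export(cmd: List[str]) -> None:
    """Run the export and translate documented exit codes into exceptions."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 6:
        # Documented on this page: exit code 6 means the output file exists.
        raise FileExistsError("output file exists; pass overwrite=True to replace it")
    if result.returncode != 0:
        raise RuntimeError(
            f"cqlite exited with code {result.returncode}: {result.stderr.strip()}"
        )
```

Passing the arguments as a list avoids shell quoting problems with the CQL query string. A caller would pass the result of `build_export_command(...)` to `run_export` and catch `FileExistsError` to decide whether to retry with `overwrite=True`.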

| Flag | Purpose |
| --- | --- |
| --out parquet | Select Parquet output format |
| --output <path> | Destination file path (required for Parquet) |
| --overwrite | Replace existing file; omit to get exit code 6 on collision |

cqlite \
--schema test-data/schemas/basic-types.cql \
--data-dir test-data/datasets/sstables \
--query "SELECT * FROM test_basic.simple_table" \
--out parquet \
--output /tmp/all_rows.parquet \
--overwrite

cqlite \
--schema test-data/schemas/time-series.cql \
--data-dir test-data/datasets/sstables \
--query "SELECT sensor_id, timestamp, temperature, humidity FROM test_timeseries.sensor_data LIMIT 1000" \
--out parquet \
--output /tmp/sensor_data.parquet \
--overwrite

When the query runs against a schema (the normal case), columns export with high-fidelity Arrow/Parquet types, recursively for nested types (epic #673):

| CQL type | Arrow/Parquet type |
| --- | --- |
| text, varchar, ascii | Utf8 (STRING) |
| int, smallint, tinyint | Int32 / Int16 / Int8 |
| bigint, counter | Int64 |
| float / double | Float32 / Float64 |
| boolean | Boolean |
| uuid, timeuuid | FixedSizeBinary(16) with the arrow.uuid extension annotation |
| timestamp | Timestamp(Millisecond, UTC) |
| date | Date32 |
| time | Time64(Nanosecond) |
| decimal | Decimal128(38, 9) (checked rescale; overflow is an error) |
| varint | Decimal128(38, 0) |
| duration | Utf8 (CQL text form; Parquet cannot encode Interval(MonthDayNano)) |
| inet | Utf8 (canonical text) |
| blob | Binary (BYTE_ARRAY) |
| list<T>, set<T> | List<T> with typed elements |
| map<K,V> | Map<K, V> with typed keys and values |
| tuple<...> | Struct with positional field_N children |
| UDT | Struct with the UDT’s field names |
| frozen<T> | Same as T (transparent) |

See Output Formats for the full type map and precision notes.
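To make the "checked rescale" note on `decimal` concrete, here is a small standard-library sketch of what fitting a value into Decimal128(38, 9) involves: the value is expressed as an integer count of 10^-9 units, and that integer must have at most 38 digits. This illustrates the arithmetic only and is not CQLite's implementation; `rescale_checked` is a hypothetical name, and rejecting values with more than 9 fractional digits is an assumption of the sketch (this page does not say how such values are handled).

```python
from decimal import Decimal

MAX_PRECISION = 38  # Decimal128 holds at most 38 significant decimal digits


def rescale_checked(value: Decimal, scale: int = 9) -> int:
    """Return the unscaled integer for `value` at `scale`, or raise if it does not fit."""
    if not value.is_finite():
        raise ValueError("NaN and infinity cannot be stored as Decimal128")
    sign, digits, exponent = value.as_tuple()
    coefficient = int("".join(map(str, digits)))
    shift = exponent + scale  # powers of ten needed to reach the target scale
    if shift >= 0:
        unscaled = coefficient * 10 ** shift
    else:
        unscaled, remainder = divmod(coefficient, 10 ** (-shift))
        if remainder:
            # Assumption of this sketch: losing fractional digits is an error.
            raise ValueError(f"value has more than {scale} fractional digits")
    if unscaled >= 10 ** MAX_PRECISION:
        raise OverflowError(f"value needs more than {MAX_PRECISION} digits at scale {scale}")
    return -unscaled if sign else unscaled
```

With scale 9, at most 29 digits remain for the integer part, so `Decimal("1e28")` fits while `Decimal("1e29")` raises `OverflowError`. The same check with `scale=0` corresponds to the `varint` mapping, where any value of 39 or more digits does not fit.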

The Python bindings expose the same writer directly (epic #682), so no subprocess is needed:

import cqlite

with cqlite.open('test-data/datasets/sstables',
                 schema='test-data/schemas/basic-types.cql') as db:
    rows = db.export_parquet(
        'SELECT * FROM test_basic.simple_table',
        '/tmp/simple_table.parquet',
        row_group_size=10000,   # rows per Parquet row group
        compression='snappy',   # 'snappy' (default), 'zstd', or 'none'
    )
    print(f'Exported {rows} rows')

The query streams, so large tables export within bounded memory, and the GIL is released for the duration of the export.
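Because each export streams, several tables can be exported one after another from the same handle. The sketch below shows one way to do that; `export_tables` is a hypothetical helper, not part of the bindings. It assumes that `export_parquet` returns the number of rows written (as the example above suggests) and that it accepts the `compression` keyword shown there. Whether the Python call replaces an existing file is not stated here, so the sketch does not rely on either behavior.

```python
from pathlib import Path
from typing import Dict


def export_tables(db, queries: Dict[str, str], out_dir: str,
                  compression: str = "zstd") -> Dict[str, int]:
    """Export each named query to <out_dir>/<name>.parquet and return row counts."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = {}
    for name, query in queries.items():
        path = target / f"{name}.parquet"
        # Assumption: export_parquet returns the number of rows written.
        counts[name] = db.export_parquet(query, str(path), compression=compression)
    return counts
```

With a handle from `cqlite.open(...)`, calling `export_tables(db, {'simple_table': 'SELECT * FROM test_basic.simple_table'}, '/tmp/exports')` would write `/tmp/exports/simple_table.parquet`. Taking `db` as a parameter keeps the helper easy to test with a stand-in object.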

const { Database } = require('@cqlite/node');

// CommonJS modules do not support top-level await, so run inside an async function.
async function main() {
  const db = await Database.open('test-data/datasets/sstables', {
    schema: 'test-data/schemas/basic-types.cql',
  });
  const rows = await db.exportParquet(
    'SELECT * FROM test_basic.simple_table',
    '/tmp/simple_table.parquet',
    { rowGroupSize: 10000, compression: 'snappy' }
  );
  console.log(`Exported ${rows} rows`);
  await db.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

The export runs as an async task off the JavaScript main thread.

The writer lives in cqlite-core behind the off-by-default parquet cargo feature:

# Cargo.toml
[dependencies]
cqlite-core = { version = "*", features = ["parquet"] }

use cqlite_core::export::parquet::{ParquetExportOptions, StreamingParquetWriter};

let mut iter = db.execute_streaming(query, Default::default()).await?;
let file = std::fs::File::create("/tmp/out.parquet")?;
let mut writer = StreamingParquetWriter::new(file, &iter.metadata, &ParquetExportOptions::default())?;
while let Some(row) = iter.next_async().await {
    writer.write_chunk(&[row?])?;
}
writer.finalize()?;

CQLite produces Parquet files only; committing them to Iceberg/Delta table formats is an external committer’s job.

import pyarrow.parquet as pq
table = pq.read_table('/tmp/simple_table.parquet')
print(table.to_pandas())
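Where pyarrow is not installed, a cheap sanity check is still possible with the standard library. The Parquet format places the 4-byte magic `PAR1` at both the start and the end of an unencrypted file, so a truncated or non-Parquet file can be detected without parsing it. The helper below is a sketch under that assumption; `looks_like_parquet` is a hypothetical name, and the check does not validate the footer or the data pages.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes."""
    # Leading magic (4) + footer length (4) + trailing magic (4) is the bare minimum.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A call such as `looks_like_parquet('/tmp/simple_table.parquet')` fits naturally after an export step in a pipeline, before handing the file to a downstream reader.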
| Symptom | Error | Fix |
| --- | --- | --- |
| --output not provided with --out parquet | Error: --output is required for Parquet format | Add --output /path/to/file.parquet |
| File exists | exit code 6 | Add --overwrite |
| No rows matched | Empty Parquet file (0 row groups) | Check WHERE clause and schema |