No-Heuristics Mandate
Issue #28 codified this as a hard rule after early prototype code guessed column types from value patterns. Guessing produced silent wrong answers on valid Cassandra data.
The rule
Section titled “The rule”Use authoritative metadata only. Never infer types or formats from value content.
- When a schema is present, use it for all type decoding. Schema-aware decoding is not optional.
- When no schema is present, use the serialization metadata embedded in
Statistics.db(encoding stats, column definitions). Do not fall back to type guessing. - Legacy heuristics (blob fallback, type inference from byte patterns) exist only behind the
experimentalfeature flag and are disabled by default.
What this means in practice
Section titled “What this means in practice”Allowed:
// Decode using schema type — correctlet value = schema.column_type("name")?.decode(bytes)?;
// Decode using Statistics.db metadata — correctlet value = stats.column_type("name")?.decode(bytes)?;Not allowed:
// Guess type from value — forbidden without feature flagif bytes.len() == 16 { return Ok(CqlValue::Uuid(...)) }if bytes.starts_with(b"\x00") { return Ok(CqlValue::Int(...)) }Behind the experimental flag only:
#[cfg(feature = "experimental")]fn decode_with_fallback(bytes: &[u8]) -> CqlValue { // Heuristic fallback — acceptable only here}Rationale
Section titled “Rationale”Cassandra’s binary format is untagged. A 4-byte sequence could be an int, a float,
the first four bytes of a text, or part of a blob. Without schema metadata, the
correct interpretation is unknowable. Guessing will produce plausible-but-wrong output
on real production data.
The one test that exercises legacy blob fallback
(test_legacy_format_allows_blob_fallback_with_feature) is explicitly excluded from
the default core-tests gate run. It requires --features experimental and is
intentionally not in CI defaults.
Code quality requirements
Section titled “Code quality requirements”These requirements enforce the mandate at the language level:
- No
unwrap()orexpect()in library code — errors must propagate via? - Use
thiserrorfor error types — structured errors, not string messages RUSTFLAGS="-D warnings"must pass — zero clippy warnings- Memory target:
<128MBfor large files — no unbounded buffering
Schema-aware decoding flow
Section titled “Schema-aware decoding flow”Statistics.db → EncodingStats → column type metadata ↓Schema file (if present) → overrides / supplements metadata ↓Decoder uses column type to interpret raw bytes ↓Typed CqlValue — never raw bytes exposed to callersIf Statistics.db metadata is missing or corrupt for a column, the correct behaviour is to return an error, not to guess. Callers can decide how to handle missing columns; the library must not invent values.
Feature flags and the mandate
Section titled “Feature flags and the mandate”| Feature | Heuristics allowed? |
|---|---|
all-compression (default) | No |
state_machine (default) | No |
cli-helpers | No |
metrics | No |
experimental | Yes — explicitly opt-in |
See Key source paths for where type decoding lives in the codebase. The full feature-flags table and feature-gated test notes are in Gate Contract.