A Personal Journey into Database Internals
I should start with a confession: I am not a math expert, nor am I a core PostgreSQL developer. I am a curious engineer who loves to experiment. Spiral is the result of letting my creative side take the wheel—a research project born from a simple desire to understand how databases work at the bit-level and to see if I could implement some of my own “crazy” ideas.
Everything you see here is a prototype. It is an exploration of what becomes possible when we use Rust to extend the heart of the database. I’m sharing this as a research outcome, hoping to find other curious minds who might want to contribute or challenge these concepts.
Try It in Five Minutes
Spiral targets PostgreSQL 16–18. Install via Homebrew on macOS:
brew install postgresql@18
brew install --build-from-source ./Formula/spiral.rb
Or build from source with cargo-pgrx:
cargo install cargo-pgrx
cargo pgrx init
cargo pgrx run pg18 # drops into a psql session with spiral loaded
Add to postgresql.conf so the planner hook and background worker start at boot:
shared_preload_libraries = 'spiral'
Then try the short walkthrough:
cargo pgrx run pg18 < examples/short_walkthrough.sql
Or paste this into any psql session:
CREATE EXTENSION spiral;
SET spiral.kickoff_date = '2026-01-01';
CREATE TABLE ticks (
t timestamptz NOT NULL,
symbol_id int NOT NULL,
price double precision, -- Spiral: ohlcv
vol int -- Spiral: sum
) WITH (spiral.frames = '1m,1h', spiral.tenant = 'symbol_id');
-- Insert some data — background worker auto-refreshes rollups
INSERT INTO ticks SELECT
now() - (random() * interval '2 hours'),
(random() * 10)::int,
100 + random() * 50,
(random() * 1000)::int
FROM generate_series(1, 100000);
-- Manually refresh (or just wait for the bgworker)
SELECT spiral_refresh('ticks');
-- Query the 1-minute rollup
SELECT t, symbol_id, price_ohlcv_h, price_ohlcv_l, vol
FROM ticks_1m ORDER BY t DESC LIMIT 10;
The source is at github.com/jonatas/spiral.
The “Spiral” Concept: A Metaphor for Data Flow
The central idea behind this project is to stop thinking of data as a flat, linear history. As datasets grow to billions of rows, the traditional model faces what I call Data Gravity—where the weight of the data makes every operation exponentially harder.
In my experiments, I started imagining storage as a Spiral.
In this model, fresh data arrives at high velocity in the center. As it “cools” and ages, it migrates to outer orbits where it’s aggregated into stable, dense structures.
Multi-Tenancy: The Lane System
In a multi-tenant environment, the “Data Gravity” problem is multiplied. If we store everyone’s data in the same linear file, noisy neighbors can drown out everyone else.
Spiral solves this by assigning each tenant their own Lane within the spiral. Imagine a cosmic highway where each tenant has their own orbital path. This preserves locality not just in time, but in tenant identity. You can explore this concept in depth at the Spiral Interactive Lab.
The Geometry of Packing: Massively Multi-Tenant Scale
To understand how Spiral achieves constant-time seeks across billions of rows, we have to look at the Packing Geometry. By using an Archimedean spiral, we can map a 1D sequence (time) into a 2D plane where distance from the center represents “age” and the rotation phase represents “tenant identity.”
In this advanced simulator, you can push the system to 100 concurrent tenants. Watch how the Temporal Resolution (Step) and Packing Density configs mirror the internal bundle_size and frame_seconds parameters.
The Engineering Problem: Entropy and Page Overhead
PostgreSQL organizes data into fixed-size 8KB Pages. For analytical queries on massive time-series data, this creates a significant “IO Tax.”
(8192 − 24 − 24) / 8 = 1018 f64 values"] --> OP OP["SpiralPageOpaque · 24 bytes
window_start_t · window_end_t
tenant_scale · magic 0x50495241"] end subgraph "Standard Heap 8KB Page" PH["PageHeaderData · 24 bytes"] --> LP LP["Line Pointers array"] --> T1 LP --> T2 T1["Tuple 1 (header + columns)"] --> FS T2["Tuple 2"] --> FS FS["Free Space (variable)"] end
The B-Tree Trap in Multidimensional Queries
B-Trees are the workhorse of Postgres, but they are fundamentally one-dimensional. When you create a composite index on (tenant_id, time), you are prioritizing one dimension over the other.
- Query A (
WHERE tenant_id = 1 AND t > ...): Fast. - Query B (
WHERE t > ...across multiple tenants): Slow.
How Spiral Differs from Existing Solutions
Several mature tools address time-series at scale. Spiral explores a different set of trade-offs.
| TimescaleDB | pg_partman | Spiral | |
|---|---|---|---|
| Storage | Hypertable chunks | Native partitions | Custom TAM + standard heap |
| Rollups | Continuous aggregates (explicit) | None | Auto-derived from magic comments |
| Query rewrite | Manual view queries | None | Transparent planner hook |
| Dirty-aware slicing | No | No | Yes — clean → rollup, dirty → raw |
| Mergeable stats | Limited | None | Welford moments + sketch/tdigest |
| Multi-tenant isolation | Schema-per-tenant or manual | Manual | Built-in spiral.tenant lane |
| Background maintenance | Yes (jobs) | Yes (cron) | Yes (autonomous bgworker, 1s tick) |
| Extension type | C + custom storage | Pure SQL | Rust (pgrx) + TAM + planner hook |
The key architectural difference: you never change your query. TimescaleDB continuous aggregates are separate views that you must remember to query. Spiral intercepts the original query and routes it transparently — the application code never changes when a new rollup tier is added or a bucket becomes dirty.
Geometry-Aware Indexing with Z-Order
B-Trees excel at single-dimension queries, but fall apart when you filter on tenant_id and t together. Z-ordering maps both dimensions into one number so that rows close in both dimensions stay close in storage — turning expensive multi-column scans into tight range scans.
Try it: sort by Color, Size, or Z-Order, then run a query and watch how many pages load:
That is exactly what spiral_zorder does in SQL. It encodes both dimensions into a single number so that values close in both tenant and time land close together on the number line — enabling a single BETWEEN lo AND hi index scan to retrieve a 2D rectangle without touching unrelated rows.
-- Create a z-order index on the rollup view
CREATE INDEX ON sensor_data_1m USING btree (
spiral_zorder(spiral(t), ARRAY['sensor_id'])
);
-- Spiral injects this automatically for tenant-scoped queries:
-- WHERE spiral_zorder(spiral(t), ARRAY['sensor_id']) BETWEEN <lo> AND <hi>
spiral_zorder uses FNV-1a hashing — stable and deterministic across restarts, so index values written today still match values computed at query time tomorrow.
Interactive Proof of Locality
Compare how Standard Row-Major and Z-Order Morton handle memory access. Select a range and watch the Page Loads metric.
Prototyping a Custom Table Access Method (TAM)
PostgreSQL 12 decoupled the internal storage from the executor through the Table Access Method (TAM) API. This is where the research got practical. Using Rust and pgrx, I implemented a prototype handler that bypasses the standard Heap.
// src/tam.rs — TAM registration
pub unsafe fn spiral_tam_handler(_fcinfo: pg_sys::FunctionCallInfo) -> pgrx::datum::Internal {
let routine = PgMemoryContexts::TopMemoryContext.palloc_struct::<pg_sys::TableAmRoutine>();
(*routine).type_ = pg_sys::NodeTag::T_TableAmRoutine;
// Scan path — fully implemented
(*routine).scan_begin = Some(spiral_scan_begin);
(*routine).scan_getnextslot = Some(spiral_scan_getnextslot);
(*routine).scan_end = Some(spiral_scan_end);
// Insert path — registered but body is an empty stub; TAM insert is not yet
// implemented. Live writes reach Spiral via standard heap + track_changes_stmt trigger.
(*routine).tuple_insert = Some(spiral_slot_insert);
pgrx::datum::Internal::from(Some(pg_sys::Datum::from(routine as usize)))
}
Binary Packing & Direct Seeks: The Real Implementation
Spiral uses PostgreSQL’s own buffer manager — not flat files. The innovation is computing an exact (block_number, byte_offset) for any (t, tenant_id) pair in O(1), bypassing B-Tree traversal entirely.
Two core #[repr(C)] structures from src/storage.rs:
#[repr(C)]
struct CompressedBlock {
first_val: f64, // 8 bytes — anchor point
data: [u8; 120], // 60 × 2-byte XOR deltas (Gorilla encoding)
} // 128 bytes total → stores 64 time-series points
#[repr(C)]
pub struct SpiralPageOpaque {
pub window_start_t: i64, // time window covered by this page
pub window_end_t: i64,
pub tenant_scale: i32, // number of tenant lanes
pub magic: u32, // 0x50495241 = 'SPRA'
}
The address formula (mode 1 — direct f64, single value column):
Where t_rel = spiral(t) - kickoff_epoch normalizes any timestamptz to a dense integer. The offset then maps to a page via:
// src/storage.rs — exact implementation
const DATA_PER_PAGE: usize = (8192 - 24 - 24) / 8; // = 1018 f64 values per page
pub fn logical_to_physical_offset(logical_offset: i64) -> (u32, u32) {
let index = logical_offset / 8;
let blkno = (index / DATA_PER_PAGE as i64) as u32;
let offset_in_page = (24 + (index % DATA_PER_PAGE as i64) * 8) as u32;
(blkno, offset_in_page)
}
tenant_scale comes from a one-character cardinality hint: 'd'→10, 'h'→100, 'k'→1,000, 'M'→1,000,000, 'B'→1,000,000,000, 'T'→1,000,000,000,000.
Three Storage Modes: Matching Structure to Workload
Spiral ships three packing functions, each trading precision for density. The page map below shows exactly how each fills an 8KB page — and what the savings look like compared to a standard PostgreSQL heap table.
How to read the page map: Each small square = one 8-byte slot. A full page has 1024 slots arranged in a 32×32 grid.
| Cell color | Meaning |
|---|---|
| Gray (top-left 3 cells) | 24-byte PageHeaderData — PostgreSQL page header |
| Dark blue (bottom-right 3 cells) | 24-byte SpiralPageOpaque — Spiral metadata (time window, tenant scale, magic) |
| Rainbow / colored | Data slot occupied by a tenant. Each hue = one tenant lane. With many tenants the hues cycle, producing the rainbow gradient |
| Near-black | Empty or wasted slot (visible in Compact mode where every other slot is overhead) |
The three pages shown side-by-side let you see how the tenant pattern repeats as data fills successive 8KB blocks.
Automating the Pipeline: Hierarchical Rollups
To make the system usable, I wanted to automate the creation of hierarchies. In Spiral, a single table definition with magic comments can generate an entire analytical pipeline:
CREATE TABLE asset_ticks (
t timestamptz NOT NULL,
price double precision, -- Spiral: ohlc, stats, sketch
vol int -- Spiral: sum
);
Manual Control: Custom Hierarchies
While “Magic Comments” are great, power users can manually define rollups using standard Postgres MATERIALIZED VIEW syntax with Spiral options:
CREATE MATERIALIZED VIEW asset_ohlcv_1m
WITH (
spiral.frames = '5m,1h,1d',
spiral.tenant = 'symbol_id'
) AS SELECT ... FROM ticks GROUP BY 1, 2;
Calendar-Aligned Rollup Tiers
Beyond fixed-second intervals, Spiral supports calendar-aligned tiers that snap to month, quarter, and year boundaries — essential for financial and business reporting where “monthly” means January 1st, not “30 days ago.”
CREATE TABLE asset_ticks (
t timestamptz NOT NULL,
symbol_id int NOT NULL,
price double precision, -- Spiral: ohlcv
vol bigint -- Spiral: sum
) WITH (
spiral.frames = '1h,1d,1mon,1qtr,1year',
spiral.tenant = 'symbol_id'
);
Valid calendar suffixes:
| Suffix | Meaning | Snaps to |
|---|---|---|
1mon / 1month |
Calendar month | 1st of each month |
1qtr / 1quarter |
Calendar quarter | Jan 1, Apr 1, Jul 1, Oct 1 |
1year / 1Y |
Calendar year | Jan 1 |
Under the hood, calendar tiers use date_trunc('month', t) (or quarter/year) in the rollup GROUP BY, so they respect timezone-aware truncation. Fixed-second tiers (like 1h, 1d) use epoch arithmetic and are always UTC-aligned.
The Anatomy of a Rollup (Aggregation Inheritance)
How does Spiral know how to roll up your data? It uses deterministic rules for column inheritance:
| Column Suffix | Child Aggregate | Reason |
|---|---|---|
_h / _max |
max() |
Peak is peak. |
_l / _min |
min() |
Low is low. |
_sum / _count |
sum() |
Accumulate totals. |
_sketch |
merge() |
Mathematically unified sketches. |
any timestamptz offset col |
range_max_end() |
Keeps the running max of a time range endpoint — used for open/close windows. |
Incremental Maintenance: Tracking Changes via Changelog
Standard materialized views require a full rebuild. Spiral explores Incremental View Maintenance (IVM) using a transactional changelog. Every update flags a specific time bucket as “dirty.”
A background worker written in Rust monitors this log and performs surgical updates—healing the “orbits” of the spiral without rebuilding the world.
// In src/worker.rs - The Healing Loop
pub fn perform_healing() -> Result<(), pgrx::spi::Error> {
let dirty_buckets = Spi::connect(|client| {
client.select(
"SELECT start_t, end_t FROM spiral.changelog WHERE processed = false",
None, None
)?.map(|row| (row[1].value::<i64>(), row[2].value::<i64>())).collect::<Vec<_>>()
});
for (start, end) in dirty_buckets {
// Surgical rollup for this specific bucket
rollup_bucket(start, end)?;
Spi::run_with_args(
"UPDATE spiral.changelog SET processed = true WHERE start_t = $1 AND end_t = $2",
Some(vec![
(PgBuiltInOids::INT8OID.oid(), start.into_datum()),
(PgBuiltInOids::INT8OID.oid(), end.into_datum())
])
)?;
}
Ok(())
}
The Adaptive Query Slicer
One of the most complex parts of this research was exploring the PostgreSQL Planner Hook. The idea is to query the raw table and have the system automatically “slice” the query between different storage tiers based on data freshness and availability.
Select a time range on the timeline below, showing the last 7 days from past to present. Watch how Spiral would theoretically “slice” your query across storage tiers: Daily, Hourly, and Minutely rollups, with an automatic fallback to Raw Data for segments marked as “dirty” in the changelog. You can also simulate real-time traffic and see the background worker ‘healing’ the orbits.
Live Demo: The Modern Spiral Setup
Let’s see this in action using PostgreSQL’s native WITH syntax.
DROP EXTENSION IF EXISTS spiral CASCADE;
CREATE EXTENSION spiral;
-- The WITH clause tells Spiral to track this table,
-- set up specific rollups, and isolate data by 'sensor_id'.
CREATE TABLE sensor_data (
t timestamptz NOT NULL,
sensor_id int NOT NULL,
temperature double precision, -- Spiral: ohlcv
humidity double precision, -- Spiral: sum
power_usage double precision -- Spiral: stats
) WITH (
spiral.frames = '1m,1h',
spiral.tenant = 'sensor_id'
);
Behind the Scenes: Auto-created Views
Spiral immediately registers the table and creates the hierarchical views.
SELECT view_name, parent_view, frame_seconds, scope_columns
FROM spiral.metadata ORDER BY frame_seconds;
view_name | parent_view | frame_seconds | scope_columns
----------------+----------------+---------------+---------------
sensor_data | BASE | 0 | {sensor_id}
sensor_data_1m | sensor_data | 60 | {sensor_id}
sensor_data_1h | sensor_data_1m | 3600 | {sensor_id}
(3 rows)
Notice it registered the base table and used the magic comments to build the 1m and 1h schema dynamically!
What You Get: SQL From Day One
Once the table is created and the first spiral_refresh runs, you get three things without writing any extra code:
1. Pre-aggregated views at every declared tier:
-- Raw sub-minute ticks
SELECT * FROM sensor_data WHERE sensor_id = 1 AND t >= now() - interval '5m';
-- 1-minute rollup — columns auto-derived from magic comments
SELECT t, sensor_id, temperature_ohlcv_o, temperature_ohlcv_h,
temperature_ohlcv_l, temperature_ohlcv_c, humidity, power_usage_stats
FROM sensor_data_1m
WHERE sensor_id = 1 AND t >= now() - interval '1h';
-- 1-hour rollup for dashboards
SELECT t, sensor_id, temperature_ohlcv_h AS max_temp, humidity
FROM sensor_data_1h
WHERE t >= now() - interval '7d';
2. Transparent query routing — queries against sensor_data automatically rewrite to the right rollup tier.
3. Dirty-aware fallback — during a backfill or late-arriving data window, Spiral serves rollup for clean buckets and raw data for dirty buckets. Zero stale reads, zero full-table scans.
-- This is all you need to keep rollups fresh:
SELECT spiral_refresh('sensor_data');
-- Or scope to one tenant after a targeted backfill:
SELECT spiral_refresh('sensor_data', 'sensor_id = 3');
Data Ingestion spanning Timeframes
Let’s insert some raw data.
INSERT INTO sensor_data (t, sensor_id, temperature, humidity, power_usage) VALUES
('2026-05-03 20:15:00'::timestamptz, 1, 22.5, 45.0, 100.5),
('2026-05-03 20:15:00'::timestamptz, 1, 22.7, 45.2, 101.0),
('2026-05-03 20:15:00'::timestamptz, 2, 19.5, 50.0, 80.0),
('2026-05-03 20:16:00'::timestamptz, 1, 23.0, 44.0, 105.0),
('2026-05-03 20:16:00'::timestamptz, 2, 19.8, 51.0, 82.0),
('2026-05-03 21:15:00'::timestamptz, 1, 25.0, 40.0, 110.0);
Incremental Cascading Refresh
Refreshing the 1-minute rollup incrementally processes the new raw data.
SELECT spiral_refresh('sensor_data');
The column names are dynamically derived from the magic comments:
SELECT t, sensor_id,
temperature_ohlcv_o, temperature_ohlcv_h,
temperature_ohlcv_l, temperature_ohlcv_c,
humidity, power_usage_stats
FROM sensor_data_1m ORDER BY t, sensor_id;
t | sensor_id | temperature_ohlcv_o | temperature_ohlcv_h | temperature_ohlcv_l | temperature_ohlcv_c | humidity | power_usage_stats
------------------------+-----------+---------------------+---------------------+---------------------+---------------------+----------+--------------------------------------------------------------
2026-05-03 20:15:00+00 | 1 | 22.5 | 22.7 | 22.5 | 22.7 | 90.2 | {"n": 2.0, "m1": 100.75, "m2": 0.125, "m3": 0.0, ...}
2026-05-03 20:15:00+00 | 2 | 19.5 | 19.5 | 19.5 | 19.5 | 50 | {"n": 1.0, "m1": 80.0, "m2": 0.0, ...}
2026-05-03 20:16:00+00 | 1 | 23 | 23 | 23 | 23 | 44 | {"n": 1.0, "m1": 105.0, "m2": 0.0, ...}
2026-05-03 20:16:00+00 | 2 | 19.8 | 19.8 | 19.8 | 19.8 | 51 | {"n": 1.0, "m1": 82.0, "m2": 0.0, ...}
2026-05-03 21:15:00+00 | 1 | 25 | 25 | 25 | 25 | 40 | {"n": 1.0, "m1": 110.0, "m2": 0.0, ...}
(5 rows)
humidity is summed per minute bucket. power_usage_stats stores the Welford moments as JSONB — mean (m1), variance accumulator (m2), etc. — ready to merge up to the hourly tier without raw data.
Refresh cascades to the 1-hour rollup automatically because Spiral knows the hierarchy!
Partial Refresh by Tenant Scope
In high-cardinality deployments you often need to refresh only one tenant’s data — e.g., after a backfill for sensor_id = 3 without disturbing others. spiral_refresh accepts an optional WHERE-style scope predicate:
-- Refresh only sensor 3's dirty buckets
SELECT spiral_refresh('sensor_data', 'sensor_id = 3');
Spiral intersects that predicate with the changelog, processes only the matching dirty buckets, and leaves every other tenant’s rollup untouched. The changelog entries for other tenants survive the partial refresh and are picked up on their own schedule.
Transparent Query Acceleration
Spiral intercepts queries to the base table and rewrites them to use the pre-aggregated rollups seamlessly — no query changes needed.
-- You write this:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;
-- Spiral rewrites it to something like:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature_ohlcv_h)
FROM sensor_data_1h
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;
The rewrite selects the coarsest rollup tier whose granularity fits the query’s date_trunc(...). A 4-hour window that matches the 1h tier reads ~4 rows instead of potentially thousands of raw ticks — without any application-level change.
Multi-dimension GROUP BY acceleration. When the query groups by both a tenant column and date_trunc(...), the planner handles both dimensions together — matching the best rollup tier for the granularity, scoped to that tenant.
Z-order index push-down. For tenant-scoped rollup sub-queries, Spiral auto-injects a BETWEEN predicate on the z-order index:
-- Spiral rewrites this internally:
WHERE spiral_zorder(spiral(t), ARRAY['sensor_id'])
BETWEEN <lo> AND <hi>
Z-order is monotone in t for a fixed tenant hash, so the BETWEEN exactly covers that tenant’s time range without scanning other tenants’ rows. This turns full-rollup scans into tight index range scans.
Multi-Table Acceleration: Join Constraint Propagation
The planner hook doesn’t stop at single-table rewrites. When Spiral detects an equijoin on t between two tracked tables, it propagates the time constraint from the constrained side to the unconstrained side, accelerating both simultaneously.
-- Two independent time-series, joined by time
SELECT s.t, s.sensor_id, s.temperature_ohlcv_h,
a.asset_id, a.price_ohlcv_h
FROM sensor_data s
JOIN asset_ticks a ON s.t = a.t
WHERE s.t >= '2026-05-03 19:00:00'::timestamptz
AND s.t < '2026-05-03 23:00:00'::timestamptz;
Without Spiral: asset_ticks has no time predicate — full scan.
With Spiral: the planner walks the JoinTree, detects s.t = a.t, propagates WHERE t >= ... AND t < ... to asset_ticks, then rewrites both sides to their respective rollup tiers:
-- Effective plan:
SELECT s.t, s.sensor_id, s.temperature_ohlcv_h,
a.asset_id, a.price_ohlcv_h
FROM sensor_data_1h s -- ← accelerated
JOIN asset_ticks_1h a ON s.t = a.t -- ← also accelerated via propagation
WHERE s.t >= '2026-05-03 19:00:00'::timestamptz
AND s.t < '2026-05-03 23:00:00'::timestamptz;
Both tables independently benefit from dirty-aware slicing. A dirty bucket in sensor_data does not force a raw scan of asset_ticks — each side is sliced independently against its own changelog.
Handling Late-Arriving Data
Let’s insert an hour of randomized, late data for all tenants.
INSERT INTO sensor_data (t, sensor_id, temperature, humidity, power_usage)
SELECT
'2026-05-03 22:00:00'::timestamptz + (random() * 60 || ' minutes')::interval,
id,
20 + random() * 10,
40 + random() * 20,
90 + random() * 30
FROM generate_series(1, 2) AS id, generate_series(1, 60);
Spiral tracks exactly which time buckets and tenants are “dirty”.
SELECT base_view, t_start, t_end, scope_values
FROM spiral.changelog ORDER BY t_start;
base_view | t_start | t_end | scope_values
-------------+------------+------------+------------------
sensor_data | 1777856400 | 1777856460 | {"sensor_id": 1}
sensor_data | 1777856520 | 1777856580 | {"sensor_id": 2}
sensor_data | 1777856760 | 1777856820 | {"sensor_id": 1}
...
(76 rows)
Each row is one 60-second bucket per tenant. Only the buckets that actually received new data are marked dirty — the rest stay clean and will continue serving from the rollup.
Smart Query Slicing in Action
Spiral’s planner intercepts the query, consults the changelog, and rewrites to a UNION ALL that:
- routes clean hour-aligned buckets to
sensor_data_1h(pre-aggregated, fast) - routes dirty sub-minute windows back to
sensor_data(raw, accurate)
-- You write:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;
-- Spiral actually runs (conceptual rewrite):
SELECT date_trunc('hour', t) AS hour, sensor_id,
max(temperature_ohlcv_h) AS max_temperature
FROM sensor_data_1h -- ← pre-aggregated tier
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 22:00:00'::timestamptz -- 3 clean hours
AND spiral_zorder(spiral(t), ARRAY['sensor_id'])
BETWEEN 1777843200000000 AND 1777857600999999
GROUP BY 1, 2
UNION ALL
SELECT date_trunc('hour', t) AS hour, sensor_id,
max(temperature) AS max_temperature
FROM sensor_data -- ← raw tier (dirty window)
WHERE t >= '2026-05-03 22:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz -- 1 dirty hour
GROUP BY 1, 2;
The key insight: Spiral never re-aggregates clean data. The 3 clean hours each contribute exactly 1 row from sensor_data_1h. Only the dirty hour scans raw rows. As dirty buckets are healed, they graduate back to the rollup tier automatically.
Tier selection rules:
- A bucket is eligible for a rollup tier if it is aligned to that tier’s frame and clean in the changelog.
- The coarsest aligned tier wins (1h beats 1m, 1m beats raw).
- Adjacent segments from the same source are merged into one range scan to minimize plan nodes.
You can force EXPLAIN to show the rewrite:
EXPLAIN (VERBOSE)
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;
The plan will show a HashAggregate over an Append (the UNION ALL) with separate SeqScan or IndexScan nodes — one per segment. After spiral_refresh, all four hours are clean and the plan collapses to a single scan of sensor_data_1h.
Healing the Orbits & Handling Deletions
We heal the dirty buckets. It only processes the changes!
SELECT spiral_refresh('sensor_data');
The entire time range is now clean. The same query now generates a single-tier plan:
-- After refresh, Spiral rewrites to a single rollup scan:
SELECT date_trunc('hour', t) AS hour, sensor_id,
max(temperature_ohlcv_h) AS max_temperature
FROM sensor_data_1h
WHERE t >= '2026-05-03 19:00:00'::timestamptz
AND t < '2026-05-03 23:00:00'::timestamptz
AND spiral_zorder(spiral(t), ARRAY['sensor_id'])
BETWEEN 1777843200000000 AND 1777857600999999
GROUP BY 1, 2;
-- 4 rows read instead of thousands of raw ticks.
If we delete data, it isolates the dirty fallback to ONLY the affected tenant (sensor_id = 1) for that specific time bucket — other tenants’ rollups stay pristine.
DELETE FROM sensor_data
WHERE sensor_id = 1
AND t >= '2026-05-03 20:15:00'::timestamptz
AND t < '2026-05-03 20:25:00'::timestamptz;
SELECT base_view, t_start, t_end, scope_values
FROM spiral.changelog ORDER BY t_start;
base_view | t_start | t_end | scope_values
-------------+------------+------------+------------------
sensor_data | 1777850100 | 1777850160 | {"sensor_id": 1}
sensor_data | 1777850160 | 1777850220 | {"sensor_id": 1}
(2 rows)
Only sensor_id = 1 is dirty. sensor_id = 2’s rollups are completely untouched — a targeted dirty mark, not a full-table invalidation.
You can inspect the lag at any time:
SELECT * FROM spiral.status;
base_view | tier_count | dirty_entries | dirty_scopes | dirty_seconds | oldest_dirty_ts | newest_dirty_ts | lag
-------------+------------+---------------+--------------+---------------+------------------------+------------------------+-------------------------
sensor_data | 2 | 2 | 1 | 120 | 2026-05-03 20:15:00+00 | 2026-05-03 20:17:00+00 | 21 days 01:23:37
(1 row)
lag tells you how stale your oldest dirty bucket is.
In a multi-tenant deployment, spiral.status aggregates across all tenants. For per-tenant visibility, spiral.scope_status breaks it down:
SELECT * FROM spiral.scope_status ORDER BY lag DESC;
base_view | scope_values | dirty_entries | dirty_seconds | oldest_dirty_ts | lag
-------------+--------------------+---------------+---------------+------------------------+--------------------
sensor_data | {"sensor_id": "1"} | 12 | 720 | 2026-05-03 20:15:00+00 | 21 days 01:23:37
sensor_data | {"sensor_id": "2"} | 3 | 180 | 2026-05-03 22:10:00+00 | 20 days 23:05:12
(2 rows)
Sensor 1 has more dirty buckets than sensor 2 — a targeted spiral_refresh(‘sensor_data’, ‘sensor_id = 1’) heals only that tenant’s backlog without touching sensor 2.
For a single-number health check, spiral_lag() returns the lag as an interval:
SELECT spiral_lag(‘sensor_data’);
-- → 21 days 01:23:37 (NULL when fully current)
Pipe this into any monitoring system. NULL means clean; a growing interval means the worker is falling behind.
Autonomous Background Worker
You don’t need to call spiral_refresh manually. When spiral.frames is set, Spiral registers a background worker that wakes every second, claims dirty (base_view, scope) pairs from the changelog, and heals them — automatically.
-- postgresql.conf (or SET for session)
spiral.worker_enabled = on -- default: on
spiral.max_workers = 2 -- parallel workers per DB
spiral.worker_batch_size = 10 -- scopes refreshed per 1s tick
Workers use advisory locks to divide scopes without conflicts: worker A refreshes sensor_id = 1, worker B refreshes sensor_id = 2 — no coordination overhead, no duplicate work.
The planner hook is similarly tunable:
spiral.enable_planner_hook = on -- disable to compare plans
spiral.planner_max_segments = 50 -- fall back to RAW if UNION ALL would exceed N parts
When planner_max_segments is exceeded (highly fragmented dirty ranges), Spiral falls back to a full raw scan and logs a notice — trading query rewrite overhead for a simpler plan.
Mathematical Engine: Welford’s Algorithm
The Problem with Standard stddev()
PostgreSQL’s built-in stddev() cannot be rolled up. Given two pre-aggregated hourly stddev values you cannot compute the combined stddev without the original rows. This breaks the rollup chain.
Spiral needs a mergeable statistical state — one where rollup(rollup(raw → 1m) → 1h) gives the same answer as rollup(raw → 1h).
The Solution: Store Moments, Not Final Values
Welford’s online algorithm accumulates four central moments (n, m1, m2, m3, m4) rather than a computed stddev. From those moments you can derive mean, variance, stddev, skewness, and kurtosis at any time. Crucially, two StatsState structs can be merged exactly using Chan et al.’s parallel algorithm — no raw data needed.
The Rust Implementation (src/stats.rs)
// src/stats.rs
#[derive(Serialize, Deserialize, Default, Clone, Copy)]
pub struct StatsState {
pub n: f64, // count
pub m1: f64, // mean
pub m2: f64, // sum of squared deviations (→ variance)
pub m3: f64, // sum of cubed deviations (→ skewness)
pub m4: f64, // sum of quartic deviations (→ kurtosis)
}
impl StatsState {
pub fn add(&mut self, x: f64) {
let n1 = self.n;
self.n += 1.0;
let delta = x - self.m1;
let delta_n = delta / self.n;
let term1 = delta * delta_n * n1;
self.m1 += delta_n;
self.m2 += term1;
// m3 and m4 updated similarly — O(1) per value, numerically stable
}
// Chan’s parallel algorithm: merge two independent StatsState structs
// Used when rolling up 1m → 1h → 1d without re-scanning raw rows.
pub fn merge(&mut self, other: &Self) {
let combined_n = self.n + other.n;
let delta = other.m1 - self.m1;
let delta2 = delta * delta;
let m1 = (self.n * self.m1 + other.n * other.m1) / combined_n;
let m2 = self.m2 + other.m2 + delta2 * self.n * other.n / combined_n;
// m3, m4 follow the same pattern with higher-order delta terms
self.n = combined_n; self.m1 = m1; self.m2 = m2; /* ... */
}
pub fn mean(&self) -> f64 { self.m1 }
pub fn variance(&self) -> f64 { self.m2 / (self.n - 1.0) }
pub fn stddev(&self) -> f64 { self.variance().sqrt() }
pub fn skewness(&self) -> f64 { self.n.sqrt() * self.m3 / self.m2.powf(1.5) }
pub fn kurtosis(&self) -> f64 { self.n * self.m4 / (self.m2 * self.m2) - 3.0 }
}
How pgrx Hooks Connect Rust to PostgreSQL Aggregates
pgrx lets Rust functions register directly as PostgreSQL functions via the #[pg_extern] attribute. Spiral uses two functions to form a full PostgreSQL aggregate:
// Accumulation step — called once per raw row during spiral_refresh
#[pg_extern(immutable, parallel_safe)]
pub fn spiral_stats_accum(state: Option<pgrx::JsonB>, val: f64) -> pgrx::JsonB {
let mut s = state
.map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
.unwrap_or_default();
s.add(val);
pgrx::JsonB(serde_json::to_value(s).unwrap())
}
// Combine step — called when merging two rollup tiers (1m → 1h)
// `parallel_safe` tells PostgreSQL this can run across parallel workers
#[pg_extern(immutable, parallel_safe)]
pub fn spiral_stats_combine(
state1: Option<pgrx::JsonB>,
state2: Option<pgrx::JsonB>,
) -> pgrx::JsonB {
let mut s1 = state1.map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
.unwrap_or_default();
let s2 = state2.map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
.unwrap_or_default();
s1.merge(&s2);
pgrx::JsonB(serde_json::to_value(s1).unwrap())
}
Why this fits the rollup architecture perfectly:
| Property | Why it matters |
|---|---|
| Online — processes one value at a time | Matches IVM: only new changelog rows need processing, not full re-scan |
| Mergeable — two states combine exactly | Enables 1m → 1h → 1d rollup chains without raw data |
| Numerically stable — Welford avoids catastrophic cancellation | Large sensor datasets with tight value ranges stay accurate |
parallel_safe — PostgreSQL can split across workers |
Rollup refreshes parallelise automatically at no cost |
| JSONB state — opaque to PostgreSQL | Schema-free; adding m5 or percentiles needs no DDL change |
The -- Spiral: stats magic comment wires this up: when building a rollup view Spiral emits spiral_stats_accum(col) for the 1m layer and spiral_stats_combine(col_stats) for every higher tier.
Approximate Quantiles: Sketch and T-Digest
Welford gives exact moments (mean, stddev, skewness). But percentile_cont(0.95) cannot be rolled up — there is no exact mergeable form. Spiral provides two mergeable approximate quantile sketches instead.
| Magic comment | Aggregate | Merge fn | Accuracy |
|---|---|---|---|
-- Spiral: sketch |
spiral_sketch(col) |
spiral_sketch_merge(col_sketch) |
DDSketch, relative error ≤ 1% |
-- Spiral: tdigest |
spiral_tdigest(col) |
spiral_tdigest_merge(col_tdigest) |
t-digest, better tail accuracy |
Like stats, both store a JSONB blob at the 1m tier and merge it exactly as it rolls up to 1h, 1d — no raw data needed.
CREATE TABLE api_requests (
t timestamptz NOT NULL,
service_id int NOT NULL,
latency_ms double precision -- Spiral: tdigest
) WITH (
spiral.frames = '1m,1h,1d',
spiral.tenant = 'service_id'
);
After refresh, the rollup stores latency_ms_tdigest at every tier. Query approximate percentiles at any granularity:
-- p50, p95, p99 from the hourly rollup — no raw rows needed
SELECT t, service_id,
spiral_tdigest_quantile(latency_ms_tdigest, 0.50) AS p50,
spiral_tdigest_quantile(latency_ms_tdigest, 0.95) AS p95,
spiral_tdigest_quantile(latency_ms_tdigest, 0.99) AS p99
FROM api_requests_1h
WHERE t >= now() - interval '24h'
ORDER BY t, service_id;
This is the key advantage over standard PostgreSQL: a 1h bucket carries a compact sketch (~1 KB) that can answer any quantile query, and two hourly sketches merge into a daily sketch in microseconds. No raw rows ever need to be re-read.
Z-Order Indexing for Existing Tables
Spiral’s WITH (spiral.frames=...) wires everything automatically for new tables. For an existing table without Spiral TAM, cluster_table() creates the z-order index in one call:
-- Add z-order index to any table:
SELECT cluster_table('sensor_data', 't', ARRAY['sensor_id']);
-- Creates: idx_z_sensor_data ON sensor_data
-- USING btree (spiral_zorder(spiral(t), ARRAY['sensor_id']))
The planner hook then uses this index for BETWEEN pushdown on any query that filters by both t and sensor_id. No data migration, no DDL changes to existing schemas.
The spiral(t) function used inside the index is an identity for bigint and extracts the microsecond epoch for timestamptz. It exists so the index expression is stable regardless of which type t carries:
SELECT spiral('2026-05-03 20:15:00'::timestamptz);
-- → 1746303300000000 (microseconds since Unix epoch)
SELECT to_timestamptz(1746303300000000);
-- → 2026-05-03 20:15:00+00
Round-tripping through spiral() / to_timestamptz() lets you embed raw epoch arithmetic in SQL when needed — for example, to manually verify a z-order range or debug a planner rewrite.
Experimental Results: Prototype Benchmarks
In my local environment, with a 10 million row dataset, I saw results that justified continuing this research.
| Phase | Ingest Rate | Duration |
|---|---|---|
| Bulk Load | 1,940,482 rows/s | 0.51s (1M sample) |
| Backfill | 302,413 rows/s | 3.30s (1M sample) |
Building PostgreSQL Extensions in Rust: What I Learned
Spiral is built with pgrx — a framework that lets you write PostgreSQL extensions entirely in Rust. If you have ever wanted to extend PostgreSQL but found C intimidating, pgrx is worth serious attention.
The API Surface Spiral Uses
Every major PostgreSQL extension mechanism is available through pgrx:
| Mechanism | What it does | How Spiral uses it |
|---|---|---|
#[pg_extern] |
Export a Rust fn as a SQL function | spiral_refresh, spiral_zorder, spiral_lag |
extension_sql! |
Embed raw SQL in the extension | spiral.changelog, spiral.status, operators |
#[pg_aggregate] / CREATE AGGREGATE |
Custom aggregates | spiral_stats_accum/combine, sketch/tdigest |
TableAmRoutine (TAM) |
Custom storage engine | Binary-packed page layout, O(1) seeks |
planner_hook |
Intercept & rewrite query plans | UNION ALL rewrites, join propagation, z-order pushdown |
BackgroundWorker |
Autonomous server processes | 1-second changelog poller + scope-affinity refresh |
GucRegistry |
Custom postgresql.conf settings |
spiral.worker_enabled, spiral.max_workers, etc. |
Spi |
Execute SQL from Rust | Changelog queries, metadata lookups, rollup SQL emission |
What pgrx Makes Easy
Zero-cost type mapping. Rust f64 maps to PostgreSQL float8, i64 to bigint, Vec<Option<String>> to text[]. No manual Datum casting needed.
Memory safety in unsafe territory. PostgreSQL’s C API is deeply unsafe — raw pointers to pg_sys::Query, pg_sys::PlannedStmt, page buffers. pgrx wraps enough to make the common paths safe, while letting you drop to unsafe where needed (TAM scan callbacks, page manipulation).
Aggregate functions with combine support. The parallel_safe attribute on #[pg_extern] tells PostgreSQL the function can run across parallel workers — enabling automatic parallelism for rollup refreshes at no extra code cost.
What is Still Hard
The planner hook is raw C. planner_hook_type takes an unsafe extern "C-unwind" fn. Traversing pg_sys::Query, pg_sys::RangeTblEntry, and the JoinTree requires pointer arithmetic and intimate knowledge of PostgreSQL internals. pgrx does not abstract this — you read the PostgreSQL source to understand the node shapes.
TAM callbacks must not panic. Any Rust panic inside a TAM callback (scan_getnextslot, tuple_insert) crashes the backend. You must catch all errors before they unwind past the C ABI boundary.
Testing requires a running PostgreSQL. #[pg_test] spins up a real Postgres instance per test run. The setup is automatic with pgrx but CI times are longer than unit tests — integration is the only meaningful test boundary.
Starting Point for Your Own Extension
If you want to build a PostgreSQL extension in Rust:
cargo install cargo-pgrx
cargo pgrx new my_extension
cargo pgrx run pg17 # drops into psql with your extension loaded
The pgrx examples cover custom types, aggregates, operators, and background workers. Spiral’s source is also heavily commented — the TAM implementation in src/tam.rs and the planner hook in src/hooks.rs are good starting points for those two harder topics.
The PostgreSQL extension ecosystem needs more Rust. The barrier is lower than it looks.
Open for Collaboration
Building Spiral has been a rewarding learning experience. This is a work in progress. If you are a PostgreSQL internals expert, a Rustacean, or someone who loves high-performance storage, I invite you to contribute.
Spiral is open source. Check it out on GitHub and let’s explore the future of storage together.