A Personal Journey into Database Internals

I should start with a confession: I am not a math expert, nor am I a core PostgreSQL developer. I am a curious engineer who loves to experiment. Spiral is the result of letting my creative side take the wheel—a research project born from a simple desire to understand how databases work at the bit-level and to see if I could implement some of my own “crazy” ideas.

Everything you see here is a prototype. It is an exploration of what becomes possible when we use Rust to extend the heart of the database. I’m sharing this as a research outcome, hoping to find other curious minds who might want to contribute or challenge these concepts.

Try It in Five Minutes

Spiral targets PostgreSQL 16–18. Install via Homebrew on macOS:

brew install postgresql@18
brew install --build-from-source ./Formula/spiral.rb

Or build from source with cargo-pgrx:

cargo install cargo-pgrx
cargo pgrx init
cargo pgrx run pg18   # drops into a psql session with spiral loaded

Add to postgresql.conf so the planner hook and background worker start at boot:

shared_preload_libraries = 'spiral'

Then try the short walkthrough:

cargo pgrx run pg18 < examples/short_walkthrough.sql

Or paste this into any psql session:

CREATE EXTENSION spiral;
SET spiral.kickoff_date = '2026-01-01';

CREATE TABLE ticks (
    t         timestamptz NOT NULL,
    symbol_id int         NOT NULL,
    price     double precision, -- Spiral: ohlcv
    vol       int               -- Spiral: sum
) WITH (spiral.frames = '1m,1h', spiral.tenant = 'symbol_id');

-- Insert some data — background worker auto-refreshes rollups
INSERT INTO ticks SELECT
    now() - (random() * interval '2 hours'),
    (random() * 10)::int,
    100 + random() * 50,
    (random() * 1000)::int
FROM generate_series(1, 100000);

-- Manually refresh (or just wait for the bgworker)
SELECT spiral_refresh('ticks');

-- Query the 1-minute rollup
SELECT t, symbol_id, price_ohlcv_h, price_ohlcv_l, vol
FROM ticks_1m ORDER BY t DESC LIMIT 10;

The source is at github.com/jonatas/spiral.

The “Spiral” Concept: A Metaphor for Data Flow

The central idea behind this project is to stop thinking of data as a flat, linear history. As datasets grow to billions of rows, the traditional model faces what I call Data Gravity—where the weight of the data makes every operation exponentially harder.

In my experiments, I started imagining storage as a Spiral.

In this model, fresh data arrives at high velocity in the center. As it “cools” and ages, it migrates to outer orbits where it’s aggregated into stable, dense structures.

Multi-Tenancy: The Lane System

In a multi-tenant environment, the “Data Gravity” problem is multiplied. If we store everyone’s data in the same linear file, noisy neighbors can drown out everyone else.

Spiral solves this by assigning each tenant their own Lane within the spiral. Imagine a cosmic highway where each tenant has their own orbital path. This preserves locality not just in time, but in tenant identity. You can explore this concept in depth at the Spiral Interactive Lab.

      [01] THE SINGLE TENANT FLOW

      Data originates at the high-velocity center (Hot Storage). As it ages, the algorithm pushes it outward, creating a natural temporal gradient.
    
      [02] OPENING THE LANES

      The space is mathematically partitioned into lanes. Each lane represents a different aggregation level (Raw -> 1m -> 1h -> 1d).
    
      [03] SCALABLE MULTI-TENANCY

      By shifting the orbital phase, we stack multiple tenants into the same table. They share the same physical "lane" boundaries but never collide in bit-space.

The Geometry of Packing: Massively Multi-Tenant Scale

To understand how Spiral achieves constant-time seeks across billions of rows, we have to look at the Packing Geometry. By using an Archimedean spiral, we can map a 1D sequence (time) into a 2D plane where distance from the center represents “age” and the rotation phase represents “tenant identity.”

In this advanced simulator, you can push the system to 100 concurrent tenants. Watch how the Temporal Resolution (Step) and Packing Density configs mirror the internal bundle_size and frame_seconds parameters.

POSTGRESQL_DDL.SQL
ALTER TABLE sensor_data SET (
spiral.tenant = 'sensor_id',-- Scale: 50 tenants

        -- bundle_size: 100 pts/block (internal — not a DDL option)
        
spiral.frames = '1m,1h,1d'-- visual lane width: 20 (simulation)
);
QUERY_ENGINE

        WHERE t >= 20%
        
        AND t < 50%

[RUNNING] Calculating offsets...
ARCHIMEDEAN_V2.0_MORTON_OPT

The Engineering Problem: Entropy and Page Overhead

PostgreSQL organizes data into fixed-size 8KB Pages. For analytical queries on massive time-series data, this creates a significant “IO Tax.”

graph TD subgraph "Spiral 8KB Page — src/storage.rs" H["PageHeaderData · 24 bytes"] --> DA DA["Data Area · 1018 × 8-byte slots
(8192 − 24 − 24) / 8 = 1018 f64 values"] --> OP OP["SpiralPageOpaque · 24 bytes
window_start_t · window_end_t
tenant_scale · magic 0x50495241"] end subgraph "Standard Heap 8KB Page" PH["PageHeaderData · 24 bytes"] --> LP LP["Line Pointers array"] --> T1 LP --> T2 T1["Tuple 1 (header + columns)"] --> FS T2["Tuple 2"] --> FS FS["Free Space (variable)"] end

The B-Tree Trap in Multidimensional Queries

B-Trees are the workhorse of Postgres, but they are fundamentally one-dimensional. When you create a composite index on (tenant_id, time), you are prioritizing one dimension over the other.

Query A (WHERE tenant_id = 1 AND t > ...): Fast.
Query B (WHERE t > ... across multiple tenants): Slow.

How Spiral Differs from Existing Solutions

Several mature tools address time-series at scale. Spiral explores a different set of trade-offs.

	TimescaleDB	pg_partman	Spiral
Storage	Hypertable chunks	Native partitions	Custom TAM + standard heap
Rollups	Continuous aggregates (explicit)	None	Auto-derived from magic comments
Query rewrite	Manual view queries	None	Transparent planner hook
Dirty-aware slicing	No	No	Yes — clean → rollup, dirty → raw
Mergeable stats	Limited	None	Welford moments + sketch/tdigest
Multi-tenant isolation	Schema-per-tenant or manual	Manual	Built-in `spiral.tenant` lane
Background maintenance	Yes (jobs)	Yes (cron)	Yes (autonomous bgworker, 1s tick)
Extension type	C + custom storage	Pure SQL	Rust (pgrx) + TAM + planner hook

The key architectural difference: you never change your query. TimescaleDB continuous aggregates are separate views that you must remember to query. Spiral intercepts the original query and routes it transparently — the application code never changes when a new rollup tier is added or a bucket becomes dirty.

Geometry-Aware Indexing with Z-Order

B-Trees excel at single-dimension queries, but fall apart when you filter on tenant_id and t together. Z-ordering maps both dimensions into one number so that rows close in both dimensions stay close in storage — turning expensive multi-column scans into tight range scans.

Try it: sort by Color, Size, or Z-Order, then run a query and watch how many pages load:

That is exactly what spiral_zorder does in SQL. It encodes both dimensions into a single number so that values close in both tenant and time land close together on the number line — enabling a single BETWEEN lo AND hi index scan to retrieve a 2D rectangle without touching unrelated rows.

-- Create a z-order index on the rollup view
CREATE INDEX ON sensor_data_1m USING btree (
    spiral_zorder(spiral(t), ARRAY['sensor_id'])
);

-- Spiral injects this automatically for tenant-scoped queries:
-- WHERE spiral_zorder(spiral(t), ARRAY['sensor_id']) BETWEEN <lo> AND <hi>

spiral_zorder uses FNV-1a hashing — stable and deterministic across restarts, so index values written today still match values computed at query time tomorrow.

Interactive Proof of Locality

Compare how Standard Row-Major and Z-Order Morton handle memory access. Select a range and watch the Page Loads metric.

Prototyping a Custom Table Access Method (TAM)

PostgreSQL 12 decoupled the internal storage from the executor through the Table Access Method (TAM) API. This is where the research got practical. Using Rust and pgrx, I implemented a prototype handler that bypasses the standard Heap.

// src/tam.rs — TAM registration
pub unsafe fn spiral_tam_handler(_fcinfo: pg_sys::FunctionCallInfo) -> pgrx::datum::Internal {
    let routine = PgMemoryContexts::TopMemoryContext.palloc_struct::<pg_sys::TableAmRoutine>();
    (*routine).type_ = pg_sys::NodeTag::T_TableAmRoutine;

    // Scan path — fully implemented
    (*routine).scan_begin       = Some(spiral_scan_begin);
    (*routine).scan_getnextslot = Some(spiral_scan_getnextslot);
    (*routine).scan_end         = Some(spiral_scan_end);

    // Insert path — registered but body is an empty stub; TAM insert is not yet
    // implemented. Live writes reach Spiral via standard heap + track_changes_stmt trigger.
    (*routine).tuple_insert = Some(spiral_slot_insert);

    pgrx::datum::Internal::from(Some(pg_sys::Datum::from(routine as usize)))
}

Binary Packing & Direct Seeks: The Real Implementation

Spiral uses PostgreSQL’s own buffer manager — not flat files. The innovation is computing an exact (block_number, byte_offset) for any (t, tenant_id) pair in O(1), bypassing B-Tree traversal entirely.

Two core #[repr(C)] structures from src/storage.rs:

#[repr(C)]
struct CompressedBlock {
    first_val: f64,       // 8 bytes — anchor point
    data: [u8; 120],      // 60 × 2-byte XOR deltas (Gorilla encoding)
}                         // 128 bytes total → stores 64 time-series points

#[repr(C)]
pub struct SpiralPageOpaque {
    pub window_start_t: i64,  // time window covered by this page
    pub window_end_t:   i64,
    pub tenant_scale:   i32,  // number of tenant lanes
    pub magic:          u32,  // 0x50495241 = 'SPRA'
}

The address formula (mode 1 — direct f64, single value column):

\[\text{logical\_offset} = (t_{\text{rel}} \times \text{tenant\_scale} \times 8) + (\text{tenant\_id} \times 8)\]

Where t_rel = spiral(t) - kickoff_epoch normalizes any timestamptz to a dense integer. The offset then maps to a page via:

// src/storage.rs — exact implementation
const DATA_PER_PAGE: usize = (8192 - 24 - 24) / 8; // = 1018 f64 values per page

pub fn logical_to_physical_offset(logical_offset: i64) -> (u32, u32) {
    let index          = logical_offset / 8;
    let blkno          = (index / DATA_PER_PAGE as i64) as u32;
    let offset_in_page = (24 + (index % DATA_PER_PAGE as i64) * 8) as u32;
    (blkno, offset_in_page)
}

tenant_scale comes from a one-character cardinality hint: 'd'→10, 'h'→100, 'k'→1,000, 'M'→1,000,000, 'B'→1,000,000,000, 'T'→1,000,000,000,000.

Three Storage Modes: Matching Structure to Workload

Spiral ships three packing functions, each trading precision for density. The page map below shows exactly how each fills an 8KB page — and what the savings look like compared to a standard PostgreSQL heap table.

How to read the page map: Each small square = one 8-byte slot. A full page has 1024 slots arranged in a 32×32 grid.

Cell color	Meaning
Gray (top-left 3 cells)	24-byte `PageHeaderData` — PostgreSQL page header
Dark blue (bottom-right 3 cells)	24-byte `SpiralPageOpaque` — Spiral metadata (time window, tenant scale, magic)
Rainbow / colored	Data slot occupied by a tenant. Each hue = one tenant lane. With many tenants the hues cycle, producing the rainbow gradient
Near-black	Empty or wasted slot (visible in Compact mode where every other slot is overhead)

The three pages shown side-by-side let you see how the tenant pattern repeats as data fills successive 8KB blocks.

Automating the Pipeline: Hierarchical Rollups

To make the system usable, I wanted to automate the creation of hierarchies. In Spiral, a single table definition with magic comments can generate an entire analytical pipeline:

CREATE TABLE asset_ticks (
    t timestamptz NOT NULL,
    price double precision, -- Spiral: ohlc, stats, sketch
    vol int                 -- Spiral: sum
);

Manual Control: Custom Hierarchies

While “Magic Comments” are great, power users can manually define rollups using standard Postgres MATERIALIZED VIEW syntax with Spiral options:

CREATE MATERIALIZED VIEW asset_ohlcv_1m 
WITH (
    spiral.frames = '5m,1h,1d',
    spiral.tenant = 'symbol_id'
) AS SELECT ... FROM ticks GROUP BY 1, 2;

Calendar-Aligned Rollup Tiers

Beyond fixed-second intervals, Spiral supports calendar-aligned tiers that snap to month, quarter, and year boundaries — essential for financial and business reporting where “monthly” means January 1st, not “30 days ago.”

CREATE TABLE asset_ticks (
    t         timestamptz NOT NULL,
    symbol_id int         NOT NULL,
    price     double precision, -- Spiral: ohlcv
    vol       bigint            -- Spiral: sum
) WITH (
    spiral.frames = '1h,1d,1mon,1qtr,1year',
    spiral.tenant = 'symbol_id'
);

Valid calendar suffixes:

Suffix	Meaning	Snaps to
`1mon` / `1month`	Calendar month	1st of each month
`1qtr` / `1quarter`	Calendar quarter	Jan 1, Apr 1, Jul 1, Oct 1
`1year` / `1Y`	Calendar year	Jan 1

Under the hood, calendar tiers use date_trunc('month', t) (or quarter/year) in the rollup GROUP BY, so they respect timezone-aware truncation. Fixed-second tiers (like 1h, 1d) use epoch arithmetic and are always UTC-aligned.

The Anatomy of a Rollup (Aggregation Inheritance)

How does Spiral know how to roll up your data? It uses deterministic rules for column inheritance:

Column Suffix	Child Aggregate	Reason
`_h` / `_max`	`max()`	Peak is peak.
`_l` / `_min`	`min()`	Low is low.
`_sum` / `_count`	`sum()`	Accumulate totals.
`_sketch`	`merge()`	Mathematically unified sketches.
any `timestamptz` offset col	`range_max_end()`	Keeps the running max of a time range endpoint — used for open/close windows.

Incremental Maintenance: Tracking Changes via Changelog

Standard materialized views require a full rebuild. Spiral explores Incremental View Maintenance (IVM) using a transactional changelog. Every update flags a specific time bucket as “dirty.”

graph LR A[INSERT/UPDATE] --> B[Trigger] B --> C[(spiral.changelog)] C --> D[Background Worker] D --> E[Surgical Patch]

A background worker written in Rust monitors this log and performs surgical updates—healing the “orbits” of the spiral without rebuilding the world.

// In src/worker.rs - The Healing Loop
pub fn perform_healing() -> Result<(), pgrx::spi::Error> {
    let dirty_buckets = Spi::connect(|client| {
        client.select(
            "SELECT start_t, end_t FROM spiral.changelog WHERE processed = false",
            None, None
        )?.map(|row| (row[1].value::<i64>(), row[2].value::<i64>())).collect::<Vec<_>>()
    });

    for (start, end) in dirty_buckets {
        // Surgical rollup for this specific bucket
        rollup_bucket(start, end)?;
        
        Spi::run_with_args(
            "UPDATE spiral.changelog SET processed = true WHERE start_t = $1 AND end_t = $2",
            Some(vec![
                (PgBuiltInOids::INT8OID.oid(), start.into_datum()),
                (PgBuiltInOids::INT8OID.oid(), end.into_datum())
            ])
        )?;
    }
    Ok(())
}

The Adaptive Query Slicer

One of the most complex parts of this research was exploring the PostgreSQL Planner Hook. The idea is to query the raw table and have the system automatically “slice” the query between different storage tiers based on data freshness and availability.

Select a time range on the timeline below, showing the last 7 days from past to present. Watch how Spiral would theoretically “slice” your query across storage tiers: Daily, Hourly, and Minutely rollups, with an automatic fallback to Raw Data for segments marked as “dirty” in the changelog. You can also simulate real-time traffic and see the background worker ‘healing’ the orbits.

Live Demo: The Modern Spiral Setup

Let’s see this in action using PostgreSQL’s native WITH syntax.

DROP EXTENSION IF EXISTS spiral CASCADE;
CREATE EXTENSION spiral;

-- The WITH clause tells Spiral to track this table,
-- set up specific rollups, and isolate data by 'sensor_id'.
CREATE TABLE sensor_data (
    t timestamptz NOT NULL,
    sensor_id int NOT NULL,
    temperature double precision, -- Spiral: ohlcv
    humidity double precision,    -- Spiral: sum
    power_usage double precision  -- Spiral: stats
) WITH (
    spiral.frames = '1m,1h',
    spiral.tenant = 'sensor_id'
);

Behind the Scenes: Auto-created Views

Spiral immediately registers the table and creates the hierarchical views.

SELECT view_name, parent_view, frame_seconds, scope_columns 
FROM spiral.metadata ORDER BY frame_seconds;

   view_name    |  parent_view   | frame_seconds | scope_columns 
----------------+----------------+---------------+---------------
 sensor_data    | BASE           |             0 | {sensor_id}
 sensor_data_1m | sensor_data    |            60 | {sensor_id}
 sensor_data_1h | sensor_data_1m |          3600 | {sensor_id}
(3 rows)

Notice it registered the base table and used the magic comments to build the 1m and 1h schema dynamically!

What You Get: SQL From Day One

Once the table is created and the first spiral_refresh runs, you get three things without writing any extra code:

1. Pre-aggregated views at every declared tier:

-- Raw sub-minute ticks
SELECT * FROM sensor_data WHERE sensor_id = 1 AND t >= now() - interval '5m';

-- 1-minute rollup — columns auto-derived from magic comments
SELECT t, sensor_id, temperature_ohlcv_o, temperature_ohlcv_h,
       temperature_ohlcv_l, temperature_ohlcv_c, humidity, power_usage_stats
FROM sensor_data_1m
WHERE sensor_id = 1 AND t >= now() - interval '1h';

-- 1-hour rollup for dashboards
SELECT t, sensor_id, temperature_ohlcv_h AS max_temp, humidity
FROM sensor_data_1h
WHERE t >= now() - interval '7d';

2. Transparent query routing — queries against sensor_data automatically rewrite to the right rollup tier.

3. Dirty-aware fallback — during a backfill or late-arriving data window, Spiral serves rollup for clean buckets and raw data for dirty buckets. Zero stale reads, zero full-table scans.

-- This is all you need to keep rollups fresh:
SELECT spiral_refresh('sensor_data');

-- Or scope to one tenant after a targeted backfill:
SELECT spiral_refresh('sensor_data', 'sensor_id = 3');

Data Ingestion spanning Timeframes

Let’s insert some raw data.

INSERT INTO sensor_data (t, sensor_id, temperature, humidity, power_usage) VALUES
('2026-05-03 20:15:00'::timestamptz, 1, 22.5, 45.0, 100.5),
('2026-05-03 20:15:00'::timestamptz, 1, 22.7, 45.2, 101.0),
('2026-05-03 20:15:00'::timestamptz, 2, 19.5, 50.0, 80.0),
('2026-05-03 20:16:00'::timestamptz, 1, 23.0, 44.0, 105.0),
('2026-05-03 20:16:00'::timestamptz, 2, 19.8, 51.0, 82.0),
('2026-05-03 21:15:00'::timestamptz, 1, 25.0, 40.0, 110.0);

Incremental Cascading Refresh

Refreshing the 1-minute rollup incrementally processes the new raw data.

SELECT spiral_refresh('sensor_data');

The column names are dynamically derived from the magic comments:

SELECT t, sensor_id, 
       temperature_ohlcv_o, temperature_ohlcv_h, 
       temperature_ohlcv_l, temperature_ohlcv_c,
       humidity, power_usage_stats
FROM sensor_data_1m ORDER BY t, sensor_id;

           t            | sensor_id | temperature_ohlcv_o | temperature_ohlcv_h | temperature_ohlcv_l | temperature_ohlcv_c | humidity |                              power_usage_stats
------------------------+-----------+---------------------+---------------------+---------------------+---------------------+----------+--------------------------------------------------------------
 2026-05-03 20:15:00+00 |         1 |                22.5 |                22.7 |                22.5 |                22.7 |     90.2 | {"n": 2.0, "m1": 100.75, "m2": 0.125, "m3": 0.0, ...}
 2026-05-03 20:15:00+00 |         2 |                19.5 |                19.5 |                19.5 |                19.5 |       50 | {"n": 1.0, "m1": 80.0, "m2": 0.0, ...}
 2026-05-03 20:16:00+00 |         1 |                  23 |                  23 |                  23 |                  23 |       44 | {"n": 1.0, "m1": 105.0, "m2": 0.0, ...}
 2026-05-03 20:16:00+00 |         2 |                19.8 |                19.8 |                19.8 |                19.8 |       51 | {"n": 1.0, "m1": 82.0, "m2": 0.0, ...}
 2026-05-03 21:15:00+00 |         1 |                  25 |                  25 |                  25 |                  25 |       40 | {"n": 1.0, "m1": 110.0, "m2": 0.0, ...}
(5 rows)

humidity is summed per minute bucket. power_usage_stats stores the Welford moments as JSONB — mean (m1), variance accumulator (m2), etc. — ready to merge up to the hourly tier without raw data.

Refresh cascades to the 1-hour rollup automatically because Spiral knows the hierarchy!

Partial Refresh by Tenant Scope

In high-cardinality deployments you often need to refresh only one tenant’s data — e.g., after a backfill for sensor_id = 3 without disturbing others. spiral_refresh accepts an optional WHERE-style scope predicate:

-- Refresh only sensor 3's dirty buckets
SELECT spiral_refresh('sensor_data', 'sensor_id = 3');

Spiral intersects that predicate with the changelog, processes only the matching dirty buckets, and leaves every other tenant’s rollup untouched. The changelog entries for other tenants survive the partial refresh and are picked up on their own schedule.

Transparent Query Acceleration

Spiral intercepts queries to the base table and rewrites them to use the pre-aggregated rollups seamlessly — no query changes needed.

-- You write this:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz 
  AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;

-- Spiral rewrites it to something like:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature_ohlcv_h)
FROM sensor_data_1h
WHERE t >= '2026-05-03 19:00:00'::timestamptz
  AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;

The rewrite selects the coarsest rollup tier whose granularity fits the query’s date_trunc(...). A 4-hour window that matches the 1h tier reads ~4 rows instead of potentially thousands of raw ticks — without any application-level change.

Multi-dimension GROUP BY acceleration. When the query groups by both a tenant column and date_trunc(...), the planner handles both dimensions together — matching the best rollup tier for the granularity, scoped to that tenant.

Z-order index push-down. For tenant-scoped rollup sub-queries, Spiral auto-injects a BETWEEN predicate on the z-order index:

-- Spiral rewrites this internally:
WHERE spiral_zorder(spiral(t), ARRAY['sensor_id'])
      BETWEEN <lo> AND <hi>

Z-order is monotone in t for a fixed tenant hash, so the BETWEEN exactly covers that tenant’s time range without scanning other tenants’ rows. This turns full-rollup scans into tight index range scans.

Multi-Table Acceleration: Join Constraint Propagation

The planner hook doesn’t stop at single-table rewrites. When Spiral detects an equijoin on t between two tracked tables, it propagates the time constraint from the constrained side to the unconstrained side, accelerating both simultaneously.

-- Two independent time-series, joined by time
SELECT s.t, s.sensor_id, s.temperature_ohlcv_h,
       a.asset_id,       a.price_ohlcv_h
FROM sensor_data s
JOIN asset_ticks  a ON s.t = a.t
WHERE s.t >= '2026-05-03 19:00:00'::timestamptz
  AND s.t <  '2026-05-03 23:00:00'::timestamptz;

Without Spiral: asset_ticks has no time predicate — full scan.

With Spiral: the planner walks the JoinTree, detects s.t = a.t, propagates WHERE t >= ... AND t < ... to asset_ticks, then rewrites both sides to their respective rollup tiers:

-- Effective plan:
SELECT s.t, s.sensor_id, s.temperature_ohlcv_h,
       a.asset_id,       a.price_ohlcv_h
FROM   sensor_data_1h s          -- ← accelerated
JOIN   asset_ticks_1h  a ON s.t = a.t   -- ← also accelerated via propagation
WHERE  s.t >= '2026-05-03 19:00:00'::timestamptz
  AND  s.t <  '2026-05-03 23:00:00'::timestamptz;

Both tables independently benefit from dirty-aware slicing. A dirty bucket in sensor_data does not force a raw scan of asset_ticks — each side is sliced independently against its own changelog.

Handling Late-Arriving Data

Let’s insert an hour of randomized, late data for all tenants.

INSERT INTO sensor_data (t, sensor_id, temperature, humidity, power_usage)
SELECT 
    '2026-05-03 22:00:00'::timestamptz + (random() * 60 || ' minutes')::interval,
    id,
    20 + random() * 10,
    40 + random() * 20,
    90 + random() * 30
FROM generate_series(1, 2) AS id, generate_series(1, 60);

Spiral tracks exactly which time buckets and tenants are “dirty”.

SELECT base_view, t_start, t_end, scope_values 
FROM spiral.changelog ORDER BY t_start;

  base_view  |  t_start   |   t_end    |   scope_values
-------------+------------+------------+------------------
 sensor_data | 1777856400 | 1777856460 | {"sensor_id": 1}
 sensor_data | 1777856520 | 1777856580 | {"sensor_id": 2}
 sensor_data | 1777856760 | 1777856820 | {"sensor_id": 1}
 ...
(76 rows)

Each row is one 60-second bucket per tenant. Only the buckets that actually received new data are marked dirty — the rest stay clean and will continue serving from the rollup.

Smart Query Slicing in Action

Spiral’s planner intercepts the query, consults the changelog, and rewrites to a UNION ALL that:

routes clean hour-aligned buckets to sensor_data_1h (pre-aggregated, fast)
routes dirty sub-minute windows back to sensor_data (raw, accurate)

-- You write:
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz 
  AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;

-- Spiral actually runs (conceptual rewrite):
SELECT date_trunc('hour', t) AS hour, sensor_id,
       max(temperature_ohlcv_h) AS max_temperature
FROM sensor_data_1h                       -- ← pre-aggregated tier
WHERE t >= '2026-05-03 19:00:00'::timestamptz
  AND t < '2026-05-03 22:00:00'::timestamptz  -- 3 clean hours
  AND spiral_zorder(spiral(t), ARRAY['sensor_id'])
      BETWEEN 1777843200000000 AND 1777857600999999
GROUP BY 1, 2

UNION ALL

SELECT date_trunc('hour', t) AS hour, sensor_id,
       max(temperature) AS max_temperature
FROM sensor_data                          -- ← raw tier (dirty window)
WHERE t >= '2026-05-03 22:00:00'::timestamptz
  AND t < '2026-05-03 23:00:00'::timestamptz  -- 1 dirty hour
GROUP BY 1, 2;

The key insight: Spiral never re-aggregates clean data. The 3 clean hours each contribute exactly 1 row from sensor_data_1h. Only the dirty hour scans raw rows. As dirty buckets are healed, they graduate back to the rollup tier automatically.

Tier selection rules:

A bucket is eligible for a rollup tier if it is aligned to that tier’s frame and clean in the changelog.
The coarsest aligned tier wins (1h beats 1m, 1m beats raw).
Adjacent segments from the same source are merged into one range scan to minimize plan nodes.

You can force EXPLAIN to show the rewrite:

EXPLAIN (VERBOSE)
SELECT date_trunc('hour', t) AS hour, sensor_id, max(temperature)
FROM sensor_data
WHERE t >= '2026-05-03 19:00:00'::timestamptz 
  AND t < '2026-05-03 23:00:00'::timestamptz
GROUP BY 1, 2;

The plan will show a HashAggregate over an Append (the UNION ALL) with separate SeqScan or IndexScan nodes — one per segment. After spiral_refresh, all four hours are clean and the plan collapses to a single scan of sensor_data_1h.

Healing the Orbits & Handling Deletions

We heal the dirty buckets. It only processes the changes!

SELECT spiral_refresh('sensor_data');

The entire time range is now clean. The same query now generates a single-tier plan:

-- After refresh, Spiral rewrites to a single rollup scan:
SELECT date_trunc('hour', t) AS hour, sensor_id,
       max(temperature_ohlcv_h) AS max_temperature
FROM sensor_data_1h
WHERE t >= '2026-05-03 19:00:00'::timestamptz
  AND t < '2026-05-03 23:00:00'::timestamptz
  AND spiral_zorder(spiral(t), ARRAY['sensor_id'])
      BETWEEN 1777843200000000 AND 1777857600999999
GROUP BY 1, 2;
-- 4 rows read instead of thousands of raw ticks.

If we delete data, it isolates the dirty fallback to ONLY the affected tenant (sensor_id = 1) for that specific time bucket — other tenants’ rollups stay pristine.

DELETE FROM sensor_data 
WHERE sensor_id = 1 
  AND t >= '2026-05-03 20:15:00'::timestamptz 
  AND t < '2026-05-03 20:25:00'::timestamptz;

SELECT base_view, t_start, t_end, scope_values 
FROM spiral.changelog ORDER BY t_start;

  base_view  |  t_start   |   t_end    |   scope_values
-------------+------------+------------+------------------
 sensor_data | 1777850100 | 1777850160 | {"sensor_id": 1}
 sensor_data | 1777850160 | 1777850220 | {"sensor_id": 1}
(2 rows)

Only sensor_id = 1 is dirty. sensor_id = 2’s rollups are completely untouched — a targeted dirty mark, not a full-table invalidation.

You can inspect the lag at any time:

SELECT * FROM spiral.status;

  base_view  | tier_count | dirty_entries | dirty_scopes | dirty_seconds | oldest_dirty_ts        | newest_dirty_ts        | lag
-------------+------------+---------------+--------------+---------------+------------------------+------------------------+-------------------------
 sensor_data |          2 |             2 |            1 |           120 | 2026-05-03 20:15:00+00 | 2026-05-03 20:17:00+00 | 21 days 01:23:37
(1 row)

lag tells you how stale your oldest dirty bucket is.

In a multi-tenant deployment, spiral.status aggregates across all tenants. For per-tenant visibility, spiral.scope_status breaks it down:

SELECT * FROM spiral.scope_status ORDER BY lag DESC;

  base_view  |    scope_values    | dirty_entries | dirty_seconds | oldest_dirty_ts        | lag
-------------+--------------------+---------------+---------------+------------------------+--------------------
 sensor_data | {"sensor_id": "1"} |            12 |           720 | 2026-05-03 20:15:00+00 | 21 days 01:23:37
 sensor_data | {"sensor_id": "2"} |             3 |           180 | 2026-05-03 22:10:00+00 | 20 days 23:05:12
(2 rows)

Sensor 1 has more dirty buckets than sensor 2 — a targeted spiral_refresh(‘sensor_data’, ‘sensor_id = 1’) heals only that tenant’s backlog without touching sensor 2.

For a single-number health check, spiral_lag() returns the lag as an interval:

SELECT spiral_lag(‘sensor_data’);
-- → 21 days 01:23:37   (NULL when fully current)

Pipe this into any monitoring system. NULL means clean; a growing interval means the worker is falling behind.

Autonomous Background Worker

You don’t need to call spiral_refresh manually. When spiral.frames is set, Spiral registers a background worker that wakes every second, claims dirty (base_view, scope) pairs from the changelog, and heals them — automatically.

-- postgresql.conf (or SET for session)
spiral.worker_enabled   = on    -- default: on
spiral.max_workers      = 2     -- parallel workers per DB
spiral.worker_batch_size = 10   -- scopes refreshed per 1s tick

Workers use advisory locks to divide scopes without conflicts: worker A refreshes sensor_id = 1, worker B refreshes sensor_id = 2 — no coordination overhead, no duplicate work.

The planner hook is similarly tunable:

spiral.enable_planner_hook   = on    -- disable to compare plans
spiral.planner_max_segments  = 50    -- fall back to RAW if UNION ALL would exceed N parts

When planner_max_segments is exceeded (highly fragmented dirty ranges), Spiral falls back to a full raw scan and logs a notice — trading query rewrite overhead for a simpler plan.

Mathematical Engine: Welford’s Algorithm

The Problem with Standard `stddev()`

PostgreSQL’s built-in stddev() cannot be rolled up. Given two pre-aggregated hourly stddev values you cannot compute the combined stddev without the original rows. This breaks the rollup chain.

Spiral needs a mergeable statistical state — one where rollup(rollup(raw → 1m) → 1h) gives the same answer as rollup(raw → 1h).

The Solution: Store Moments, Not Final Values

Welford’s online algorithm accumulates four central moments (n, m1, m2, m3, m4) rather than a computed stddev. From those moments you can derive mean, variance, stddev, skewness, and kurtosis at any time. Crucially, two StatsState structs can be merged exactly using Chan et al.’s parallel algorithm — no raw data needed.

\[\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}\]

\[M_{2,n} = M_{2,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)\]

The Rust Implementation (`src/stats.rs`)

// src/stats.rs
#[derive(Serialize, Deserialize, Default, Clone, Copy)]
pub struct StatsState {
    pub n:  f64,   // count
    pub m1: f64,   // mean
    pub m2: f64,   // sum of squared deviations  (→ variance)
    pub m3: f64,   // sum of cubed deviations    (→ skewness)
    pub m4: f64,   // sum of quartic deviations  (→ kurtosis)
}

impl StatsState {
    pub fn add(&mut self, x: f64) {
        let n1 = self.n;
        self.n += 1.0;
        let delta   = x - self.m1;
        let delta_n = delta / self.n;
        let term1   = delta * delta_n * n1;
        self.m1 += delta_n;
        self.m2 += term1;
        // m3 and m4 updated similarly — O(1) per value, numerically stable
    }

    // Chan’s parallel algorithm: merge two independent StatsState structs
    // Used when rolling up 1m → 1h → 1d without re-scanning raw rows.
    pub fn merge(&mut self, other: &Self) {
        let combined_n = self.n + other.n;
        let delta  = other.m1 - self.m1;
        let delta2 = delta * delta;
        let m1 = (self.n * self.m1 + other.n * other.m1) / combined_n;
        let m2 = self.m2 + other.m2 + delta2 * self.n * other.n / combined_n;
        // m3, m4 follow the same pattern with higher-order delta terms
        self.n = combined_n; self.m1 = m1; self.m2 = m2; /* ... */
    }

    pub fn mean(&self)     -> f64 { self.m1 }
    pub fn variance(&self) -> f64 { self.m2 / (self.n - 1.0) }
    pub fn stddev(&self)   -> f64 { self.variance().sqrt() }
    pub fn skewness(&self) -> f64 { self.n.sqrt() * self.m3 / self.m2.powf(1.5) }
    pub fn kurtosis(&self) -> f64 { self.n * self.m4 / (self.m2 * self.m2) - 3.0 }
}

How `pgrx` Hooks Connect Rust to PostgreSQL Aggregates

pgrx lets Rust functions register directly as PostgreSQL functions via the #[pg_extern] attribute. Spiral uses two functions to form a full PostgreSQL aggregate:

// Accumulation step — called once per raw row during spiral_refresh
#[pg_extern(immutable, parallel_safe)]
pub fn spiral_stats_accum(state: Option<pgrx::JsonB>, val: f64) -> pgrx::JsonB {
    let mut s = state
        .map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
        .unwrap_or_default();
    s.add(val);
    pgrx::JsonB(serde_json::to_value(s).unwrap())
}

// Combine step — called when merging two rollup tiers (1m → 1h)
// `parallel_safe` tells PostgreSQL this can run across parallel workers
#[pg_extern(immutable, parallel_safe)]
pub fn spiral_stats_combine(
    state1: Option<pgrx::JsonB>,
    state2: Option<pgrx::JsonB>,
) -> pgrx::JsonB {
    let mut s1 = state1.map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
        .unwrap_or_default();
    let s2     = state2.map(|j| serde_json::from_value::<StatsState>(j.0).unwrap())
        .unwrap_or_default();
    s1.merge(&s2);
    pgrx::JsonB(serde_json::to_value(s1).unwrap())
}

Why this fits the rollup architecture perfectly:

Property	Why it matters
Online — processes one value at a time	Matches IVM: only new changelog rows need processing, not full re-scan
Mergeable — two states combine exactly	Enables `1m → 1h → 1d` rollup chains without raw data
Numerically stable — Welford avoids catastrophic cancellation	Large sensor datasets with tight value ranges stay accurate
`parallel_safe` — PostgreSQL can split across workers	Rollup refreshes parallelise automatically at no cost
JSONB state — opaque to PostgreSQL	Schema-free; adding m5 or percentiles needs no DDL change

The -- Spiral: stats magic comment wires this up: when building a rollup view Spiral emits spiral_stats_accum(col) for the 1m layer and spiral_stats_combine(col_stats) for every higher tier.

Approximate Quantiles: Sketch and T-Digest

Welford gives exact moments (mean, stddev, skewness). But percentile_cont(0.95) cannot be rolled up — there is no exact mergeable form. Spiral provides two mergeable approximate quantile sketches instead.

Magic comment	Aggregate	Merge fn	Accuracy
`-- Spiral: sketch`	`spiral_sketch(col)`	`spiral_sketch_merge(col_sketch)`	DDSketch, relative error ≤ 1%
`-- Spiral: tdigest`	`spiral_tdigest(col)`	`spiral_tdigest_merge(col_tdigest)`	t-digest, better tail accuracy

Like stats, both store a JSONB blob at the 1m tier and merge it exactly as it rolls up to 1h, 1d — no raw data needed.

CREATE TABLE api_requests (
    t          timestamptz NOT NULL,
    service_id int         NOT NULL,
    latency_ms double precision -- Spiral: tdigest
) WITH (
    spiral.frames = '1m,1h,1d',
    spiral.tenant = 'service_id'
);

After refresh, the rollup stores latency_ms_tdigest at every tier. Query approximate percentiles at any granularity:

-- p50, p95, p99 from the hourly rollup — no raw rows needed
SELECT t, service_id,
       spiral_tdigest_quantile(latency_ms_tdigest, 0.50) AS p50,
       spiral_tdigest_quantile(latency_ms_tdigest, 0.95) AS p95,
       spiral_tdigest_quantile(latency_ms_tdigest, 0.99) AS p99
FROM api_requests_1h
WHERE t >= now() - interval '24h'
ORDER BY t, service_id;

This is the key advantage over standard PostgreSQL: a 1h bucket carries a compact sketch (~1 KB) that can answer any quantile query, and two hourly sketches merge into a daily sketch in microseconds. No raw rows ever need to be re-read.

Z-Order Indexing for Existing Tables

Spiral’s WITH (spiral.frames=...) wires everything automatically for new tables. For an existing table without Spiral TAM, cluster_table() creates the z-order index in one call:

-- Add z-order index to any table:
SELECT cluster_table('sensor_data', 't', ARRAY['sensor_id']);
-- Creates: idx_z_sensor_data ON sensor_data
--   USING btree (spiral_zorder(spiral(t), ARRAY['sensor_id']))

The planner hook then uses this index for BETWEEN pushdown on any query that filters by both t and sensor_id. No data migration, no DDL changes to existing schemas.

The spiral(t) function used inside the index is an identity for bigint and extracts the microsecond epoch for timestamptz. It exists so the index expression is stable regardless of which type t carries:

SELECT spiral('2026-05-03 20:15:00'::timestamptz);
-- → 1746303300000000  (microseconds since Unix epoch)

SELECT to_timestamptz(1746303300000000);
-- → 2026-05-03 20:15:00+00

Round-tripping through spiral() / to_timestamptz() lets you embed raw epoch arithmetic in SQL when needed — for example, to manually verify a z-order range or debug a planner rewrite.

Experimental Results: Prototype Benchmarks

In my local environment, with a 10 million row dataset, I saw results that justified continuing this research.

Phase	Ingest Rate	Duration
Bulk Load	1,940,482 rows/s	0.51s (1M sample)
Backfill	302,413 rows/s	3.30s (1M sample)

Building PostgreSQL Extensions in Rust: What I Learned

Spiral is built with pgrx — a framework that lets you write PostgreSQL extensions entirely in Rust. If you have ever wanted to extend PostgreSQL but found C intimidating, pgrx is worth serious attention.

The API Surface Spiral Uses

Every major PostgreSQL extension mechanism is available through pgrx:

Mechanism	What it does	How Spiral uses it
`#[pg_extern]`	Export a Rust fn as a SQL function	`spiral_refresh`, `spiral_zorder`, `spiral_lag`
`extension_sql!`	Embed raw SQL in the extension	`spiral.changelog`, `spiral.status`, operators
`#[pg_aggregate]` / `CREATE AGGREGATE`	Custom aggregates	`spiral_stats_accum/combine`, sketch/tdigest
`TableAmRoutine` (TAM)	Custom storage engine	Binary-packed page layout, O(1) seeks
`planner_hook`	Intercept & rewrite query plans	UNION ALL rewrites, join propagation, z-order pushdown
`BackgroundWorker`	Autonomous server processes	1-second changelog poller + scope-affinity refresh
`GucRegistry`	Custom `postgresql.conf` settings	`spiral.worker_enabled`, `spiral.max_workers`, etc.
`Spi`	Execute SQL from Rust	Changelog queries, metadata lookups, rollup SQL emission

What pgrx Makes Easy

Zero-cost type mapping. Rust f64 maps to PostgreSQL float8, i64 to bigint, Vec<Option<String>> to text[]. No manual Datum casting needed.

Memory safety in unsafe territory. PostgreSQL’s C API is deeply unsafe — raw pointers to pg_sys::Query, pg_sys::PlannedStmt, page buffers. pgrx wraps enough to make the common paths safe, while letting you drop to unsafe where needed (TAM scan callbacks, page manipulation).

Aggregate functions with combine support. The parallel_safe attribute on #[pg_extern] tells PostgreSQL the function can run across parallel workers — enabling automatic parallelism for rollup refreshes at no extra code cost.

What is Still Hard

The planner hook is raw C. planner_hook_type takes an unsafe extern "C-unwind" fn. Traversing pg_sys::Query, pg_sys::RangeTblEntry, and the JoinTree requires pointer arithmetic and intimate knowledge of PostgreSQL internals. pgrx does not abstract this — you read the PostgreSQL source to understand the node shapes.

TAM callbacks must not panic. Any Rust panic inside a TAM callback (scan_getnextslot, tuple_insert) crashes the backend. You must catch all errors before they unwind past the C ABI boundary.

Testing requires a running PostgreSQL. #[pg_test] spins up a real Postgres instance per test run. The setup is automatic with pgrx but CI times are longer than unit tests — integration is the only meaningful test boundary.

Starting Point for Your Own Extension

If you want to build a PostgreSQL extension in Rust:

cargo install cargo-pgrx
cargo pgrx new my_extension
cargo pgrx run pg17   # drops into psql with your extension loaded

The pgrx examples cover custom types, aggregates, operators, and background workers. Spiral’s source is also heavily commented — the TAM implementation in src/tam.rs and the planner hook in src/hooks.rs are good starting points for those two harder topics.

The PostgreSQL extension ecosystem needs more Rust. The barrier is lower than it looks.

Open for Collaboration

Building Spiral has been a rewarding learning experience. This is a work in progress. If you are a PostgreSQL internals expert, a Rustacean, or someone who loves high-performance storage, I invite you to contribute.

Spiral is open source. Check it out on GitHub and let’s explore the future of storage together.

Spiral: An Experiment in Geometry-Aware Storage for PostgreSQL