A few days ago, I was testing the lttb function from the timescaledb_toolkit extension to downsample data into fewer points.

Using the weather dataset from Timescale, I got 20k points per device, but I needed more data for the same device_id to downsample. Generating rows with random() or generate_series doesn't produce realistic data for this case.
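
For contrast, the synthetic approach would look something like this (an illustrative sketch; the date range and value ranges are made up):

-- Synthetic rows are valid, but random() produces uniform noise,
-- not the realistic daily patterns a downsampling test needs.
INSERT INTO conditions
SELECT g.time,
  'weather-pro-000000' AS device_id,
  random() * 40 AS temperature,
  random() * 100 AS humidity
FROM generate_series(
  '2023-01-01'::timestamptz,
  '2023-04-01'::timestamptz,
  INTERVAL '1 minute') AS g(time);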

This blog post shows how to insert massive amounts of data by reusing the values from an existing dataset and shifting the time column to another period.

Let’s use the conditions table as an example:

playground=# \d conditions
                         Table "public.conditions"
┌─────────────┬──────────────────────────┬───────────┬──────────┬─────────┐
│   Column    │           Type           │ Collation │ Nullable │ Default │
├─────────────┼──────────────────────────┼───────────┼──────────┼─────────┤
│ time        │ timestamp with time zone │           │ not null │         │
│ device_id   │ text                     │           │          │         │
│ temperature │ numeric                  │           │          │         │
│ humidity    │ numeric                  │           │          │         │
└─────────────┴──────────────────────────┴───────────┴──────────┴─────────┘

We're going to simply duplicate the data for a target device_id. To start, you need to know the time span your data covers:

SELECT MAX(time) - MIN(time)
FROM conditions
WHERE device_id = 'weather-pro-000000';

Now, you can reuse that interval (roughly 108 days for this device) and list the columns in the right order to reinsert the data, shifting the whole set backward or forward. In this case, I'm shifting backward, prepending data: inserting more data into the past instead of the future.

INSERT INTO conditions
  SELECT time - INTERVAL '108 days' as time,
    device_id, temperature, humidity
  FROM conditions
  WHERE device_id = 'weather-pro-000000';
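
As a quick sanity check (a query I'm adding here; it wasn't part of the original run), you can confirm that the earliest timestamp moved back by the interval you used:

-- The new MIN(time) should be ~108 days earlier than before,
-- and the row count for the device should have doubled.
SELECT MIN(time), MAX(time), COUNT(*)
FROM conditions
WHERE device_id = 'weather-pro-000000';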

The trick in the INSERT is the time - INTERVAL '108 days', which shifts every timestamp back by the full span of the dataset; all the other columns are reused as-is. To remove the hardcoded 108 days from the example, we can compute the period in a materialized CTE. MATERIALIZED forces Postgres to evaluate the CTE once, up front, instead of inlining it into the outer query.

WITH previous AS MATERIALIZED (
  SELECT MAX(time) - MIN(time) AS period
  FROM conditions
  WHERE device_id = 'weather-pro-000001'
)
TABLE previous;
┌──────────────────┐
│      period      │
├──────────────────┤
│ 27 days 18:38:00 │
└──────────────────┘
(1 row)
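
By the way, TABLE previous is just shorthand for SELECT * FROM previous. And if you want the period for every device at once, you can drop the WHERE clause and group by device_id (my generalization, not in the original post):

-- One covered period per device, computed in a single pass.
SELECT device_id, MAX(time) - MIN(time) AS period
FROM conditions
GROUP BY device_id
ORDER BY device_id;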

Now, let's enhance the example to preview all the shifted data by joining back to the conditions table. Note that this version adds the period, shifting the copy forward instead of backward:

WITH previous AS MATERIALIZED (
  SELECT device_id, MAX(time) - MIN(time) AS period
  FROM conditions
  WHERE device_id = 'weather-pro-000001'
  GROUP BY 1
)
SELECT cond.time + previous.period AS time,
  previous.device_id,
  cond.temperature,
  cond.humidity
FROM previous
LEFT JOIN LATERAL
  (SELECT * FROM conditions)
AS cond ON cond.device_id = previous.device_id;
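
Since the subquery doesn't actually reference previous, the LATERAL keyword isn't strictly required here; a plain join with a LIMIT is a handy equivalent for previewing (my simplification, not from the original post):

-- Equivalent preview without LATERAL: the subquery is uncorrelated,
-- so an ordinary join produces the same rows.
WITH previous AS MATERIALIZED (
  SELECT device_id, MAX(time) - MIN(time) AS period
  FROM conditions
  WHERE device_id = 'weather-pro-000001'
  GROUP BY 1
)
SELECT cond.time + previous.period AS time,
  previous.device_id, cond.temperature, cond.humidity
FROM previous
JOIN conditions AS cond ON cond.device_id = previous.device_id
LIMIT 10;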

Now, all that's left is to prepend INSERT INTO conditions to the preview statement:

INSERT INTO conditions
WITH previous AS MATERIALIZED (
  SELECT device_id, MAX(time) - MIN(time) AS period
  FROM conditions
  WHERE device_id = 'weather-pro-000001'
  GROUP BY 1
)
SELECT cond.time + previous.period,
  previous.device_id,
  cond.temperature,
  cond.humidity
FROM previous
LEFT JOIN LATERAL
  (SELECT * FROM conditions)
AS cond ON cond.device_id = previous.device_id;
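
You can re-run that statement as many times as you need, and each run doubles the device's data (as the row counts below show). If you'd rather script the repetition, a plpgsql loop works too (a hypothetical convenience wrapper; the inner statement is the same one above):

-- Double the device's data three times (8x the original volume).
-- Each iteration re-reads all rows, including the ones just inserted.
DO $$
BEGIN
  FOR i IN 1..3 LOOP
    INSERT INTO conditions
    WITH previous AS MATERIALIZED (
      SELECT device_id, MAX(time) - MIN(time) AS period
      FROM conditions
      WHERE device_id = 'weather-pro-000001'
      GROUP BY 1
    )
    SELECT cond.time + previous.period, previous.device_id,
      cond.temperature, cond.humidity
    FROM previous
    LEFT JOIN LATERAL (SELECT * FROM conditions)
      AS cond ON cond.device_id = previous.device_id;
  END LOOP;
END $$;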

Checking the performance with \timing in psql gives you an idea of how fast this is. Notice the row counts double on each run: the SELECT re-reads every row for the device, including the copies inserted by the previous runs, and the recomputed period keeps placing the new copy right after them.

\timing
INSERT 0 40000 -- Time: 326.921 ms
INSERT 0 80000 -- Time: 582.390 ms
INSERT 0 160000 -- Time: 1086.917 ms (00:01.087)

That last run inserts 160k rows in about a second, nearly 150k rows per second!

This post was inspired by a pairing session with David K. He is an SQL expert and came up with this fascinating idea of just shifting the time and reinserting the same data.

If you like Postgres and SQL, David is also creating a very instructive video series named PostgreSQL Foundations on YouTube.



