Time Series Benchmark: TimescaleDB on a Raspberry Pi

I started supporting the tsbs project, and this weekend I decided to try running it on my Raspberry Pi 3.

Here I’ll share a bit of my saga to make it work and all the steps I followed to benchmark PostgreSQL with TimescaleDB.

Setup environment

If you want to try this, the first thing you need to do is download an Ubuntu image and load it onto your Pi. I already had an old install here, so I updated it with the following command:

sudo apt dist-upgrade

It will take a while and may run into some crazy errors. In my case, I had to force it through.

Disable GUI interface

We’re only going to access the machine via ssh, so it’s better to have as few processes running as possible. Ubuntu MATE ships with the graphical interface enabled by default, so let’s disable it:

systemctl set-default multi-user.target --force
systemctl disable lightdm.service --force
systemctl disable graphical.target --force
systemctl disable plymouth.service --force

Installing the latest Go version

tsbs is written in Go, and we’re going to build it on the Raspberry Pi itself. You can download Go with the following command:

cd /tmp
wget https://dl.google.com/go/go1.14.4.linux-armv6l.tar.gz

Now, the next step is to extract it into /usr/local:

sudo tar -C /usr/local -xzf go1.14.4.linux-armv6l.tar.gz

Edit your ~/.profile to configure PATH and GOPATH:

export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go

Now, we still need to create the GOPATH folder:

mkdir $HOME/go

and then we can source the .profile to reload the configuration:

source ~/.profile

Now, double-checking the version:

jonatas@rpi-3:~$ go version
go version go1.14.4 linux/arm

Running timescaledb via docker

Just make sure you don’t have Postgres running:

sudo service postgresql stop

Pulling timescaledb official image

Now, let’s run the image. It will automatically pull if you don’t have it locally:

sudo docker run --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:latest-pg12

The first time you run it, timescaledb-tune will break it, and you’ll need to open a shell inside the container and edit the configuration.

Start the container again and open a shell in it:

docker start timescaledb
docker exec -i -t timescaledb /bin/bash

You can learn more about how to edit the postgresql.conf inside a timescale container here.

Here are the suggestions from the tool:

shared_buffers = 223094kB
effective_cache_size = 669282kB
maintenance_work_mem = 111547kB
work_mem = 2788kB
timescaledb.max_background_workers = 8
max_worker_processes = 15
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
wal_buffers = 6692kB
min_wal_size = 512MB
default_statistics_target = 500
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_connections = 20
max_locks_per_transaction = 64
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 200
timescaledb.last_tuned = '2021-05-22T23:23:57Z'
timescaledb.last_tuned_version = '0.11.0'

While testing, you may end up running it multiple times. If you want to reuse the same name, you can remove the previous container. Here is a snippet for it:

sudo docker rm $(sudo docker ps -aq --filter name=timescaledb)

While the image is being pulled, we have time to install tsbs and compile it.

Installing tsbs

Let’s get the tsbs project:

go get github.com/timescale/tsbs

Let’s move into the folder:

cd $GOPATH/src/github.com/timescale/tsbs

Now we can run make all to compile all the tools we need to benchmark time-series databases.

make all

It will take a while: 10 to 15 minutes on this small hardware. Enjoy your favorite drink ☕️

Configuring environment variables

The next step is to get familiar with the tools and their purpose:

tsbs_load lets you benchmark loading data into a database (several databases are supported).
tsbs_run_queries_timescaledb lets you execute several types of queries that are common in IoT and DevOps ecosystems. More here.

Running tsbs

First, let’s generate a config file, so it’s easy to change the variables instead of keeping a long command line.

tsbs_load config

It will generate a default config. Here is the one I used:

data-source:
  # data source type [SIMULATOR|FILE]
  type: SIMULATOR
  # generate data on the fly
  simulator:
    # each time the simulator advances in time it skips this amount of time
    log-interval: 10s
    # maximum number of points to simulate (limit)
    max-data-points: 1000000
    # number of hosts to simulate (each host has a different tag-set/label-set)
    scale: 40
    # set seed to some number to have reproducible data be generated
    seed: 135
    # start time of simulation
    timestamp-start: "2021-01-01T10:00:00Z"
    # end time of simulation
    timestamp-end: "2021-01-04T08:00:00Z"
    # use case to simulate
    use-case: cpu-only
loader:
  db-specific:
    admin-db-name: postgres
    # set chunk time depending on server size
    chunk-time: 4h0m0s
    create-metrics-table: true
    field-index: VALUE-TIME
    field-index-count: 0
    force-text-format: false
    host: 0.0.0.0
    user: postgres
    pass: password
    port: 5432
    in-table-partition-tag: false
    log-batches: false
    partition-index: true
    partitions: 1
    postgres: sslmode=prefer
    time-index: true
    time-partition-index: false
    use-hypertable: true
    use-distributed-hypertable: false
    use-jsonb-tags: false
  runner:
    # the simulated data will be sent in batches of 'batch-size' points
    # to each worker
    batch-size: 10000
    # don't worry about this until you need to simulate data with scale > 1000
    channel-capacity: "0"
    db-name: benchmark
    do-abort-on-exist: false
    do-create-db: true
    # set this to false if you want to see the speed of data generation
    do-load: true
    # don't worry about this until you need to simulate data with scale > 1000
    flow-control: false
    hash-workers: true
    limit: 1000000
    reporting-period: 10s
    seed: 135
    workers: 2 

Running:

jonatas@rpi-3:~/go/src/github.com/timescale/tsbs$ tsbs_load load timescaledb --config config.yaml
Using config file: config.yaml
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
%!(EXTRA uint64=10000)panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x125b4]

goroutine 56 [running]:
runtime/internal/atomic.goXadd64(0x2908824, 0x186a0, 0x0, 0x2, 0x2)
	/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:103 +0x1c
github.com/timescale/tsbs/load.(*CommonBenchmarkRunner).work(0x29087e0, 0xc555a0, 0x29ac7c0, 0x2f1a640, 0x29b2928, 0x1)
	/home/jonatas/go/src/github.com/timescale/tsbs/load/loader.go:248 +0x1ec
created by github.com/timescale/tsbs/load.(*CommonBenchmarkRunner).RunBenchmark
	/home/jonatas/go/src/github.com/timescale/tsbs/load/loader.go:160 +0xd8

Oh, we got an error :sad panda:

Looks like the error is around here.

Let’s check whether it wrote anything to the database:

psql benchmark -h 0.0.0.0 -U postgres
Password for user postgres:
psql (10.16 (Ubuntu 10.16-0ubuntu0.18.04.1), server 12.6)
WARNING: psql major version 10, server major version 12.
         Some psql features might not work.
Type "help" for help.

Checking the tables with \dt:

benchmark=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | cpu  | table | postgres
 public | tags | table | postgres
(2 rows)

Great! It created the tables :)

Now, let’s see if it was able to insert something:

benchmark=# select count(1) from cpu;
 count
-------
 20000
(1 row)

benchmark=# select count(1)  from tags ;
 count
-------
    40
(1 row)

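40 tags, matching the scale set in the configuration.

And 20,000 rows looks like exactly one batch of 10,000 for each of the 2 workers, so these are exactly the limits we set in our config!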

Let’s try to understand the error by adding one infamous print line to see what we have in the context:

--- a/load/loader.go
+++ b/load/loader.go
@@ -245,6 +245,7 @@ func (l *CommonBenchmarkRunner) work(b targets.Benchmark, wg *sync.WaitGroup, c
        for batch := range c.toWorker {
                startedWorkAt := time.Now()
                metricCnt, rowCnt := proc.ProcessBatch(batch, l.DoLoad)
+               printFn("loaded metricCnt %d with rowCnt %d\n",metricCnt, rowCnt)

Now, let’s compile tsbs loaders and rerun it. To compile only the loaders you can use a make task:

make loaders

But this task compiles all the loaders, and on a Raspberry Pi resources are limited. Let’s cherry-pick only the builds we’re using:

make tsbs_load tsbs_load_timescaledb

You should see several lines in the output like these:

GO111MODULE=on go get ./cmd/tsbs_load
GO111MODULE=on go build -o bin/tsbs_load ./cmd/tsbs_load
GO111MODULE=on go install ./cmd/tsbs_load
GO111MODULE=on go get ./cmd/tsbs_load_timescaledb
GO111MODULE=on go build -o bin/tsbs_load_timescaledb ./cmd/tsbs_load_timescaledb
GO111MODULE=on go install ./cmd/tsbs_load_timescaledb

Running again:

bin/tsbs_load load timescaledb --config config.yaml
Using config file: config.yaml
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
loaded metricCnt 100000 with rowCnt 10000
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x125b4]

Well, looks like the rows are being inserted, but some other pointer reference is nil.

Wow! After inspecting it further, it turns out atomic.AddUint64 is responsible for this bug, and only on ARM. Funny, no? Let’s go deeper and see how it works, because today I get to learn more about Go.

Here is the gift.

// BUG(rsc): On 386, the 64-bit functions use instructions unavailable before the Pentium MMX.
//
// On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core.
//
// On ARM, 386, and 32-bit MIPS, it is the caller's responsibility
// to arrange for 64-bit alignment of 64-bit words accessed atomically.
// The first word in a variable or in an allocated struct, array, or slice can
// be relied upon to be 64-bit aligned.
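To make that note concrete, here is a minimal sketch that prints where 64-bit fields land inside a struct. The `runner` type and the `isAligned8` helper are hypothetical names invented for this illustration, not tsbs code. On a 64-bit machine the compiler pads after the `uint32` and both counters sit on 8-byte boundaries. On 32-bit ARM, Go only aligns `uint64` fields to 4 bytes, so the same counters can end up at addresses that the 64-bit atomic functions reject.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned8 reports whether addr is a multiple of 8, which is what the
// 64-bit atomic functions need on 32-bit platforms.
func isAligned8(addr uintptr) bool {
	return addr%8 == 0
}

// runner is a hypothetical stand-in for a struct that mixes a small field
// with 64-bit counters. It is not the real tsbs type.
type runner struct {
	flag      uint32
	metricCnt uint64
	rowCnt    uint64
}

func main() {
	var r runner
	// The offsets depend on the target: 64-bit targets pad after flag,
	// 32-bit targets may place metricCnt right after it.
	fmt.Println("offset of metricCnt:", unsafe.Offsetof(r.metricCnt))
	fmt.Println("offset of rowCnt:   ", unsafe.Offsetof(r.rowCnt))
	fmt.Println("metricCnt 8-byte aligned:",
		isAligned8(uintptr(unsafe.Pointer(&r.metricCnt))))
}
```

Building it with GOARCH=arm and running it on the Pi should show the difference from a 64-bit build.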

Now I see that I need to find another way to implement this small counter. Funny, since I just want to add two integers 🙀

Then, navigating through the issue details, I discovered this article about type alignment guarantees and memory layout, which taught me many low-level things.

Wow! But I won’t get very far unless I can change this counter to make it run.

Let’s go! If you read the article above, you’ll see this Counter type. It reserves 15 bytes so that an 8-byte-aligned address always exists inside it, which makes it possible to use AddUint64 without failures:

Here are the most important changes to make tsbs run in a 32-bit ARM environment:

// Some hacks to run on ARM 32 bits platforms
// see: https://go101.org/article/memory-layout.html
type Counter struct {
       x [15]byte // instead of "x uint64"
}

func (c *Counter) xAddr() *uint64 {
       // The return must be 8-byte aligned.
       return (*uint64)(unsafe.Pointer(
               (uintptr(unsafe.Pointer(&c.x)) + 7) / 8 * 8))
}

func (c *Counter) Add(delta uint64) {
       p := c.xAddr()
       atomic.AddUint64(p, delta)
}

func (c *Counter) Value() uint64 {
       return atomic.LoadUint64(c.xAddr())
}

Now we need to change CommonBenchmarkRunner to use our Counter instead of uint64 for the metricCnt and rowCnt.

 // CommonBenchmarkRunner is responsible for initializing and storing common
 // flags across all database systems and ultimately running a supplied Benchmark
 type CommonBenchmarkRunner struct {
        BenchmarkRunnerConfig
-       metricCnt      uint64
-       rowCnt         uint64
+       metricCnt      Counter
+       rowCnt         Counter
        initialRand    *rand.Rand
        sleepRegulator insertstrategy.SleepRegulator
 }

Later, in the call, we use the .Add method, which takes care of the 8-byte alignment that 32-bit platforms don’t guarantee:

 @@ -245,8 +267,8 @@ func (l *CommonBenchmarkRunner) work(b targets.Benchmark, wg *sync.WaitGroup, c
        for batch := range c.toWorker {
                startedWorkAt := time.Now()
                metricCnt, rowCnt := proc.ProcessBatch(batch, l.DoLoad)
-               atomic.AddUint64(&l.metricCnt, metricCnt)
-               atomic.AddUint64(&l.rowCnt, rowCnt)
+               l.metricCnt.Add(metricCnt)
+               l.rowCnt.Add(rowCnt)
                c.sendToScanner()
                l.timeToSleep(workerNum, startedWorkAt)

And then we just need to use .Value() to read the uint64 value back.

@@ -268,12 +290,12 @@ func (l *CommonBenchmarkRunner) timeToSleep(workerNum uint, startedWorkAt time.T

 // summary prints the summary of statistics from loading
 func (l *CommonBenchmarkRunner) summary(took time.Duration) {
-       metricRate := float64(l.metricCnt) / took.Seconds()
+       metricRate := float64(l.metricCnt.Value()) / took.Seconds()
        printFn("\nSummary:\n")
-       printFn("loaded %d metrics in %0.3fsec with %d workers (mean rate %0.2f metrics/sec)\n", l.metricCnt, took.Seconds(), l.Workers, metricRate)
-       if l.rowCnt > 0 {
-               rowRate := float64(l.rowCnt) / float64(took.Seconds())
-               printFn("loaded %d rows in %0.3fsec with %d workers (mean rate %0.2f rows/sec)\n", l.rowCnt, took.Seconds(), l.Workers, rowRate)
+       printFn("loaded %d metrics in %0.3fsec with %d workers (mean rate %0.2f metrics/sec)\n", l.metricCnt.Value(), took.Seconds(), l.Workers, metricRate)
+       if l.rowCnt.Value() > 0 {
+               rowRate := float64(l.rowCnt.Value()) / float64(took.Seconds())
+               printFn("loaded %d rows in %0.3fsec with %d workers (mean rate %0.2f rows/sec)\n", l.rowCnt.Value(), took.Seconds(), l.Workers, rowRate)
        }
 }
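For comparison, the sync/atomic note quoted earlier hints at another option: rely on the guarantee that the first word of an allocated struct is 64-bit aligned, and put the counters first. The sketch below is only an illustration of that idea, not what the patch above does. `alignedRunner`, `runnerConfig`, and `load` are hypothetical names, and the guarantee is assumed to hold only while the struct is allocated on its own, not embedded in another struct or stored in an array.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runnerConfig is a hypothetical stand-in for an embedded config struct.
type runnerConfig struct {
	Workers int
}

// alignedRunner is a hypothetical runner, not the real tsbs type. The 64-bit
// counters come first, so the first one is 64-bit aligned when the struct is
// allocated on its own, and the second one follows exactly 8 bytes later.
type alignedRunner struct {
	metricCnt uint64
	rowCnt    uint64
	runnerConfig
}

// load simulates workers that each report a fixed number of batches and
// returns the final metric and row totals.
func load(workers, batches int, metricsPerBatch, rowsPerBatch uint64) (uint64, uint64) {
	r := &alignedRunner{runnerConfig: runnerConfig{Workers: workers}}
	var wg sync.WaitGroup
	for w := 0; w < r.Workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := 0; b < batches; b++ {
				atomic.AddUint64(&r.metricCnt, metricsPerBatch)
				atomic.AddUint64(&r.rowCnt, rowsPerBatch)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&r.metricCnt), atomic.LoadUint64(&r.rowCnt)
}

func main() {
	metrics, rows := load(2, 50, 100000, 10000)
	fmt.Println("metrics:", metrics, "rows:", rows)
}
```

The Counter wrapper is more robust because it works wherever the field ends up. Newer Go releases (1.19 and later) also ship an atomic.Uint64 type that is always 64-bit aligned, but that was not available in the Go 1.14 used here.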

Now, after compiling it again, it’s time to see it in action.

tsbs_load results

The first thing to note is that I’m going to run a couple of test scenarios:

Run tsbs and PostgreSQL on the same Raspberry Pi
Run tsbs from my MacBook and PostgreSQL on the Raspberry Pi, over my local network

Running tsbs directly on the Raspberry Pi

In the Raspberry Pi console, go to the tsbs folder and run:

./bin/tsbs_load load timescaledb --config config.yaml

Using config file: config.yaml
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1621732292,119996.91,1.200000E+06,119996.91,11999.69,1.200000E+05,11999.69
1621732302,120001.86,2.400000E+06,119999.39,12000.19,2.400000E+05,11999.94
1621732312,139997.19,3.800000E+06,126665.39,13999.72,3.800000E+05,12666.54
1621732322,139998.06,5.200000E+06,129998.57,13999.81,5.200000E+05,12999.86
1621732332,129821.72,6.500000E+06,129963.16,12982.17,6.500000E+05,12996.32
1621732342,120168.22,7.700000E+06,128332.96,12016.82,7.700000E+05,12833.30
1621732352,119854.13,8.900000E+06,127120.44,11985.41,8.900000E+05,12712.04

Summary:
loaded 9999990 metrics in 79.997sec with 2 workers (mean rate 125004.61 metrics/sec)
loaded 999999 rows in 79.997sec with 2 workers (mean rate 12500.46 rows/sec)

Not bad! 1M rows in 80 seconds!
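If you want to post-process these progress lines, the header tells you what each column means. Here is a minimal sketch of a parser for one line. The `reportLine` type and `parseReportLine` function are hypothetical helpers written for this post, not part of tsbs, and they assume the seven-column layout shown in the header above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// reportLine mirrors the header printed by tsbs_load:
// time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
type reportLine struct {
	Unix              int64
	MetricRate        float64
	MetricTotal       float64
	OverallMetricRate float64
	RowRate           float64
	RowTotal          float64
	OverallRowRate    float64
}

// parseReportLine splits one comma-separated progress line into its fields.
func parseReportLine(line string) (reportLine, error) {
	fields := strings.Split(strings.TrimSpace(line), ",")
	if len(fields) != 7 {
		return reportLine{}, fmt.Errorf("expected 7 fields, got %d", len(fields))
	}
	ts, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return reportLine{}, err
	}
	nums := make([]float64, 6)
	for i, f := range fields[1:] {
		// ParseFloat also accepts the exponent notation used for the totals.
		nums[i], err = strconv.ParseFloat(f, 64)
		if err != nil {
			return reportLine{}, err
		}
	}
	return reportLine{
		Unix:              ts,
		MetricRate:        nums[0],
		MetricTotal:       nums[1],
		OverallMetricRate: nums[2],
		RowRate:           nums[3],
		RowTotal:          nums[4],
		OverallRowRate:    nums[5],
	}, nil
}

func main() {
	line := "1621732292,119996.91,1.200000E+06,119996.91,11999.69,1.200000E+05,11999.69"
	r, err := parseReportLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("at %d: %.0f rows so far, %.2f rows/sec overall\n",
		r.Unix, r.RowTotal, r.OverallRowRate)
}
```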

Running tsbs from an external machine

Let’s move tsbs_load off the Raspberry Pi, so the Pi doesn’t carry its overhead, and configure it to insert into the Pi over my local network.

A bit about my setup: I’m running it from my MacBook, which is connected via Wi-Fi to a modem, and the Raspberry Pi is connected to the same modem by cable.

scp -r jonatas@rpi-cable:~/go/src/github.com/timescale/tsbs/config.yaml ./

Now I’ll change the host in the config.yaml file from 0.0.0.0 to rpi-cable, a name that maps to the Pi’s actual IP via /etc/hosts:

  host: rpi-cable

Running:

./bin/tsbs_load load timescaledb --config /tmp/config.yaml
Using config file: /tmp/config.yaml
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1621733399,129933.24,1.300000E+06,129933.24,12993.32,1.300000E+05,12993.32
1621733409,230105.42,3.600000E+06,179994.99,23010.54,3.600000E+05,17999.50
1621733419,210010.39,5.700000E+06,189999.61,21001.04,5.700000E+05,18999.96
1621733429,209999.13,7.800000E+06,194999.50,20999.91,7.800000E+05,19499.95

Summary:
loaded 9999990 metrics in 48.649sec with 4 workers (mean rate 205553.49 metrics/sec)
loaded 999999 rows in 48.649sec with 4 workers (mean rate 20555.35 rows/sec)

Much better! 1M rows in 48 seconds!

When we do benchmarks at Timescale, we generally set up two machines in the cloud to guarantee that no other process interferes and only the main Postgres process is running. As I’m just having fun over the weekend, I was curious to understand the load of tsbs itself and how much it interferes on small hardware.

In this case the interference is pretty clear: the mean rate went from about 12,500 to about 20,500 rows/sec, roughly 64% more, after removing the competition between tsbs and PostgreSQL while loading the data. Keep in mind that this run also reports 4 workers instead of 2, so part of the gain may come from that.
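The gain can be computed directly from the two mean row rates reported in the summaries. A tiny sketch, where `speedupPercent` is just a helper name made up for this post:

```go
package main

import "fmt"

// speedupPercent returns how much faster `after` is than `before`, in percent.
func speedupPercent(before, after float64) float64 {
	return (after/before - 1) * 100
}

func main() {
	local := 12500.46  // rows/sec with tsbs_load running on the Pi itself
	remote := 20555.35 // rows/sec with tsbs_load running on the MacBook
	fmt.Printf("remote run is %.1f%% faster\n", speedupPercent(local, remote))
}
```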

tsbs_queries results

Running the tsbs queries from the Raspberry Pi would force me to change all the code to work with the new counter, and I don’t think that’s a good idea. So let’s run them only from the outside, which is closer to a real server scenario anyway.

I love to use the full_cycle_minitest_timescaledb.sh script, which is the shortest path to a full scenario of loading data and then running several queries against the DB.

Here is the command to run:

env HOST=rpi-cable PASSWORD=password ./full_cycle_minitest_timescaledb.sh

I hacked the first line of the bash script to use bash -x, so it prints every command it runs. So, first, it generates a lot of data:

+ mkdir -p /tmp/bulk_data
+ tsbs_generate_data --format timescaledb --use-case cpu-only --scale 4 --seed 123 --file /tmp/bulk_data/timescaledb_data
+ tsbs_generate_queries --queries=1000 --format timescaledb --use-case cpu-only --scale 4 --seed 123 --query-type lastpoint --file /tmp/bulk_data/timescaledb_query_lastpoint
TimescaleDB last row per host: 1000 points
+ tsbs_generate_queries --queries=1000 --format timescaledb --use-case cpu-only --scale 4 --seed 123 --query-type cpu-max-all-1 --file /tmp/bulk_data/timescaledb_query_cpu-max-all-1
TimescaleDB max of all CPU metrics, random    1 hosts, random 8h0m0s by 1h: 1000 points
+ tsbs_generate_queries --queries=1000 --format timescaledb --use-case cpu-only --scale 4 --seed 123 --query-type high-cpu-1 --file /tmp/bulk_data/timescaledb_query_high-cpu-1
TimescaleDB CPU over threshold, 1 host(s): 1000 points

The next step loads a small set of metrics to test inserts, as we did before with tsbs_load and the YAML config. This time it uses tsbs_load_timescaledb with command line parameters:

+ tsbs_load_timescaledb --pass=password '--postgres=sslmode=disable port=5432' --db-name=benchmark --host=rpi-cable --user=postgres --workers=1 --file=/tmp/bulk_data/timescaledb_data
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s

Summary:
loaded 345600 metrics in 5.844sec with 1 workers (mean rate 59134.26 metrics/sec)
loaded 34560 rows in 5.844sec with 1 workers (mean rate 5913.43 rows/sec)

And then it starts the query part, running the previously generated queries:

+ tsbs_run_queries_timescaledb --max-rps=0 --hdr-latencies=0rps_timescaledb_query_lastpoint.hdr --pass=password '--postgres=sslmode=disable port=5432' --db-name=benchmark --hosts=rpi-cable --user=postgres --workers=1 --max-queries=1000 --file=/tmp/bulk_data/timescaledb_query_lastpoint
After 100 queries with 1 workers:
Interval query rate: 62.68 queries/sec	Overall query rate: 62.68 queries/sec
TimescaleDB last row per host:
min:     5.10ms, med:     6.51ms, mean:    15.93ms, max:  209.67ms, stddev:    24.06ms, sum:   1.6sec, count: 100
all queries                  :
min:     5.10ms, med:     6.51ms, mean:    15.93ms, max:  209.67ms, stddev:    24.06ms, sum:   1.6sec, count: 100
...
Run complete after 1000 queries with 1 workers (Overall query rate 61.99 queries/sec):
TimescaleDB last row per host:
min:     4.42ms, med:     6.75ms, mean:    16.05ms, max:  209.67ms, stddev:    17.46ms, sum:  16.1sec, count: 1000
all queries                  :
min:     4.42ms, med:     6.75ms, mean:    16.05ms, max:  209.67ms, stddev:    17.46ms, sum:  16.1sec, count: 1000
Saving High Dynamic Range (HDR) Histogram of Response Latencies to 0rps_timescaledb_query_lastpoint.hdr
wall clock time: 16.185718sec
+ tsbs_run_queries_timescaledb --max-rps=0 --hdr-latencies=0rps_timescaledb_query_cpu-max-all-1.hdr --pass=password '--postgres=sslmode=disable port=5432' --db-name=benchmark --hosts=rpi-cable --user=postgres --workers=1 --max-queries=1000 --file=/tmp/bulk_data/timescaledb_query_cpu-max-all-1
After 100 queries with 1 workers:

Note this line in the previous output:

TimescaleDB last row per host:

So, the query is checking something like:

SELECT DISTINCT ON (hostname) * FROM cpu ORDER BY hostname, time DESC

Continuing to the following query:

TimescaleDB max of all CPU metrics, random 1 hosts, random 8h0m0s by 1h:

Interval query rate: 12.76 queries/sec	Overall query rate: 12.76 queries/sec
TimescaleDB max of all CPU metrics, random    1 hosts, random 8h0m0s by 1h:
min:    47.42ms, med:    71.53ms, mean:    78.32ms, max:  186.71ms, stddev:    27.65ms, sum:   7.8sec, count: 100
...
Run complete after 1000 queries with 1 workers (Overall query rate 12.09 queries/sec):
TimescaleDB max of all CPU metrics, random    1 hosts, random 8h0m0s by 1h:
min:    44.31ms, med:    78.26ms, mean:    82.71ms, max:  836.25ms, stddev:    38.17ms, sum:  82.7sec, count: 1000
wall clock time: 82.804588sec

TimescaleDB CPU over threshold, 1 host(s):

+ tsbs_run_queries_timescaledb --max-rps=0 --hdr-latencies=0rps_timescaledb_query_high-cpu-1.hdr --pass=password '--postgres=sslmode=disable port=5432' --db-name=benchmark --hosts=rpi-cable --user=postgres --workers=1 --max-queries=1000 --file=/tmp/bulk_data/timescaledb_query_high-cpu-1
After 100 queries with 1 workers:
Interval query rate: 11.28 queries/sec	Overall query rate: 11.28 queries/sec
TimescaleDB CPU over threshold, 1 host(s):
min:    23.80ms, med:    84.60ms, mean:    88.61ms, max:  235.81ms, stddev:    41.83ms, sum:   8.9sec, count: 100
...
After 400 queries with 1 workers:
Run complete after 1000 queries with 1 workers (Overall query rate 15.51 queries/sec):
TimescaleDB CPU over threshold, 1 host(s):
min:    21.90ms, med:    57.68ms, mean:    64.47ms, max:  942.30ms, stddev:    40.32ms, sum:  64.5sec, count: 1000
Saving High Dynamic Range (HDR) Histogram of Response Latencies to 0rps_timescaledb_query_high-cpu-1.hdr
wall clock time: 64.562662sec

As we can see, a simple Raspberry Pi can give us some good insights into how powerful TimescaleDB is, even running on small hardware.

Just kidding: these are only my initial attempts with a small data set, to see how small and accessible hardware copes with all these great technologies.

The speed seems reasonably good for 4 cores at 1.2 GHz and 1 GB of RAM.

That’s all for Saturday night hacking! Thanks for reading 🤗