It's hard to predict the transfer time; it depends on many factors, mainly disk speed, so it really varies from one machine to another.
The « transfer » part is not blocking; Gladys can be used at the same time.
However, the graphs won't necessarily be available during the transfer; I still need to decide whether to keep the duplicate read code in Gladys forever, or remove it and accept that the graphs will simply be empty during the migration, which may be an acceptable tradeoff.
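To illustrate the « keep the duplicate code » option, here is a minimal sketch (not Gladys's actual code; the names are made up) of a read path that serves DuckDB for features already migrated and falls back to SQLite for the rest, so the graphs stay available during the transfer:

```js
// Illustrative sketch only, not Gladys code: a dual read path during the migration window.
// The two data sources and the set of migrated feature ids are injected by the caller.
async function getDeviceFeatureStates(deviceFeatureId, from, to, { sqlite, duckdb, migratedIds }) {
  if (migratedIds.has(deviceFeatureId)) {
    // History already copied to DuckDB: read it from there.
    return duckdb.getStates(deviceFeatureId, from, to);
  }
  // Not migrated yet: keep serving the old SQLite data so the graphs stay available.
  return sqlite.getStates(deviceFeatureId, from, to);
}
```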
What is blocking, on the other hand, is the « VACUUM » step on the SQLite DB, which performs the final disk size reduction, and we need to decide whether to force it or ask the user to run it. It's probably smarter to ask the user, so we don't block their instance.
For the record, on my disk the VACUUM of your 13 GB DB took 30 seconds, but I'm on a MacBook Pro from the future (10 CPU cores / NVMe SSD with 4.5 GB/s throughput); on a Pi it will take quite a while, I think.
I still recommend doing it anyway — you’ll get alerts every day now that 4.44 is out.
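To make the blocking part concrete, here is a minimal sketch of what the final VACUUM could look like with the `sqlite3` Node package (illustration only, not the actual Gladys code; the database path is an assumption):

```js
const sqlite3 = require('sqlite3');
const fs = require('fs');

// Assumed path, adjust to your own setup.
const DB_PATH = '/var/lib/gladysassistant/gladys-production.db';

const db = new sqlite3.Database(DB_PATH);
const sizeBefore = fs.statSync(DB_PATH).size;

// VACUUM rewrites the whole database file to reclaim the space freed by deleted rows.
// It is a single blocking statement: nothing else can write until it finishes.
db.run('VACUUM;', (err) => {
  if (err) {
    console.error('VACUUM failed', err);
    return;
  }
  const sizeAfter = fs.statSync(DB_PATH).size;
  console.log(`VACUUM done: ${sizeBefore} bytes -> ${sizeAfter} bytes`);
  db.close();
});
```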
And I'm really not about to release the DuckDB version any time soon; I did say it would take months!
(I'm writing tests as I go, so everything that's implemented is tested.)
In development:
Gladys Plus backup and restore
The backup will not be done by simply copying the .duckdb file: DuckDB provides an API that exports compressed .parquet files (Parquet, what is it?), a columnar format well suited to this kind of time-series data. That format is nevertheless a bit heavier on disk than a DuckDB file; I'm following DuckDB's recommendations and the official export/import API, because I don't think it's wise to copy a .duckdb file live to make a backup (see the sketch just after this list).
The interface must be very clear and explain what is happening to the user.
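To give an idea of the export/import mechanism mentioned above, here is a minimal sketch with the `duckdb` Node package (paths are placeholders, and this is not the final Gladys backup code):

```js
const duckdb = require('duckdb');

// Placeholder path, not the real Gladys one.
const db = new duckdb.Database('/var/lib/gladysassistant/gladys.duckdb');

// EXPORT DATABASE writes the schema plus one compressed Parquet file per table
// into the target directory; that directory is what the backup would archive.
db.run("EXPORT DATABASE '/tmp/duckdb-backup' (FORMAT PARQUET, COMPRESSION ZSTD)", (err) => {
  if (err) throw err;
  console.log('Export finished');
  // Restoring is the mirror operation, run against a fresh database file:
  // db.run("IMPORT DATABASE '/tmp/duckdb-backup'", ...);
});
```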
To do:
Decide when to delete the data in SQLite. For now I've chosen not to delete it automatically, to avoid any data loss, and to let the user delete it afterwards via a button in the interface (not coded yet)
Real-world testing and optimization of the migration on instances with lots of data and low CPU/disk resources
Test interruption and resumption of the migration (see the sketch after this list for the offset-based approach)
Multi-day/week real-world testing of Gladys with DuckDB. Goal: ensure stability — zero crashes tolerated
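On the interruption/resumption point, here is a rough sketch of the offset-based approach that the logs further down illustrate (the helper names are hypothetical; this is a simplification of the idea, not the real implementation):

```js
const PAGE_SIZE = 40000; // matches the « offset = 40000 » steps visible in the logs below

// Hypothetical helpers: listMigratedFeatureIds, listSqliteFeatureIds,
// readSqliteStates, insertIntoDuckDb. They stand in for the real queries.
async function migrateAllFeatures() {
  // On a restart, device features already present in DuckDB are simply skipped.
  const alreadyMigrated = await listMigratedFeatureIds();
  const toMigrate = (await listSqliteFeatureIds()).filter((id) => !alreadyMigrated.includes(id));
  for (const featureId of toMigrate) {
    await migrateFeature(featureId, 0);
  }
}

async function migrateFeature(featureId, offset) {
  // Read one page of states from SQLite, insert it into DuckDB, then move to the next page,
  // so a restart only has to redo the feature that was in flight.
  const states = await readSqliteStates(featureId, offset, PAGE_SIZE);
  if (states.length === 0) {
    return; // nothing left for this feature
  }
  await insertIntoDuckDb(featureId, states);
  return migrateFeature(featureId, offset + PAGE_SIZE);
}
```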
Wow! For a project spanning several months… you can feel the motivation!
Let me correct you right away: « in the process of (…) toward DuckDB » would be more correct, and « a certain time during which your » sounds nicer. There, it's done!
As a reminder, I have an RPi2 and a zero2W (equivalent to an RPi3) available for testing. And I’m a teacher so I’m free in the summer…
I find the interface clean, nice work. I imagine the « purge SQLite states » button won't be enabled until the migration has completed…
On the other hand, while I think it's good that a manual action is required to delete the data from the SQLite database, I don't see the point, from a user's perspective, of having to perform two manual actions: purge, then clean up. Is there a reason for that?
Good point, I'll change that. It wasn't the case until now!
Purging is a background, non-blocking action. You can run the purge during the day without issues; it deletes in batches, little by little. It can take a while, but it's non-blocking.
The cleanup is a SQLite command (« VACUUM ») that is blocking — Gladys is completely blocked while it runs. On a very fast NVMe SSD on a mini PC, the downtime will be short.
On a USB SSD on a Pi, or worse on an SD card, the blocking can take several hours (this is noted on the cleanup button).
So I preferred to separate the two actions, letting the user choose when to schedule each one.
If we chained the two together and you started the purge while Gladys was running, then two hours later it would move on to the cleanup and Gladys would freeze; the user might not understand and would think it's a bug!
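To make the difference concrete, here is a minimal sketch of what a batched, non-blocking purge could look like (illustration only: the table name and database path are assumptions, and the VACUUM is deliberately left out so it stays a separate, explicit cleanup step):

```js
const sqlite3 = require('sqlite3');

const db = new sqlite3.Database('/var/lib/gladysassistant/gladys-production.db'); // assumed path
const BATCH_SIZE = 1000;
const PAUSE_MS = 100; // small pause between batches so Gladys keeps responding

function runAsync(sql) {
  return new Promise((resolve, reject) => {
    db.run(sql, function onDone(err) {
      if (err) return reject(err);
      resolve(this.changes); // number of rows deleted by this batch
    });
  });
}

async function purgeMigratedStates() {
  // Each DELETE only touches a small batch, so the database is never locked for long.
  // « t_device_feature_state » is an assumed table name.
  for (;;) {
    const deleted = await runAsync(
      `DELETE FROM t_device_feature_state
       WHERE id IN (SELECT id FROM t_device_feature_state LIMIT ${BATCH_SIZE})`
    );
    if (deleted === 0) break;
    await new Promise((resolve) => setTimeout(resolve, PAUSE_MS));
  }
  // Disk space is only reclaimed later by VACUUM, which is the blocking « cleanup » action.
}
```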
That said, the question of armv6/v7 support comes up: other things already no longer work on armv6 (notably Node 20), and we'll probably be pushed to drop that platform soon… (no more Pi Zero in particular; but then, does Gladys really aim to run on those machines, knowing that even on a Pi 3/4 it doesn't run very well ^^)
However, I had to switch the Docker image from node-alpine to node-slim, because DuckDB doesn't install well on Alpine…
For now, the Docker image size has increased quite a bit, but I think there are optimizations to be made.
In the meantime, the DuckDB migration on the Pi 3 went well:
```
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:39 (DeviceManager.migrateFromSQLiteToDuckDb) DuckDB: Migrating data from SQLite
2024-08-02T16:24:51+0200 <info> index.js:64 (Server.<anonymous>) Server listening on port 80
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:47 (DeviceManager.migrateFromSQLiteToDuckDb) DuckDB: Found 0 already migrated device features in DuckDB.
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:51 (DeviceManager.migrateFromSQLiteToDuckDb) DuckDB: Migrating 3 device features
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:8 (migrateStateRecursive) DuckDB : Migrating device feature = 91f39b3d-d747-4fd0-9880-8bc1e8f9067e, offset = 0
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:11 (migrateStateRecursive) DuckDB : Device feature = 91f39b3d-d747-4fd0-9880-8bc1e8f9067e has 0 states to migrate.
2024-08-02T16:24:51+0200 <info> device.migrateFromSQLiteToDuckDb.js:8 (migrateStateRecursive) DuckDB : Migrating device feature = 813c1830-b494-4f01-b339-1f287f621548, offset = 0
2024-08-02T16:24:54+0200 <info> device.migrateFromSQLiteToDuckDb.js:11 (migrateStateRecursive) DuckDB : Device feature = 813c1830-b494-4f01-b339-1f287f621548 has 34723 states to migrate.
2024-08-02T16:24:55+0200 <info> index.js:130 () DuckDB : Inserting chunk 0 for deviceFeature = 813c1830-b494-4f01-b339-1f287f621548.
2024-08-02T16:25:00+0200 <info> scene.checkCalendarTriggers.js:25 (SceneManager.checkCalendarTriggers) Checking calendar triggers at Fri, 02 Aug 2024 14:25:00 GMT
2024-08-02T16:25:02+0200 <info> index.js:130 () DuckDB : Inserting chunk 1 for deviceFeature = 813c1830-b494-4f01-b339-1f287f621548.
2024-08-02T16:25:09+0200 <info> index.js:130 () DuckDB : Inserting chunk 2 for deviceFeature = 813c1830-b494-4f01-b339-1f287f621548.
2024-08-02T16:25:16+0200 <info> index.js:130 () DuckDB : Inserting chunk 3 for deviceFeature = 813c1830-b494-4f01-b339-1f287f621548.
2024-08-02T16:25:19+0200 <info> device.migrateFromSQLiteToDuckDb.js:8 (migrateStateRecursive) DuckDB : Migrating device feature = 813c1830-b494-4f01-b339-1f287f621548, offset = 40000
2024-08-02T16:25:19+0200 <info> device.migrateFromSQLiteToDuckDb.js:11 (migrateStateRecursive) DuckDB : Device feature = 813c1830-b494-4f01-b339-1f287f621548 has 0 states to migrate.
2024-08-02T16:25:20+0200 <info> device.migrateFromSQLiteToDuckDb.js:8 (migrateStateRecursive) DuckDB : Migrating device feature = 24e039f2-d497-468a-87ee-517a1e67923d, offset = 0
2024-08-02T16:25:20+0200 <info> device.migrateFromSQLiteToDuckDb.js:11 (migrateStateRecursive) DuckDB : Device feature = 24e039f2-d497-468a-87ee-517a1e67923d has 10272 states to migrate.
2024-08-02T16:25:21+0200 <info> index.js:130 () DuckDB : Inserting chunk 0 for deviceFeature = 24e039f2-d497-468a-87ee-517a1e67923d.
2024-08-02T16:25:27+0200 <info> index.js:130 () DuckDB : Inserting chunk 1 for deviceFeature = 24e039f2-d497-468a-87ee-517a1e67923d.
2024-08-02T16:25:27+0200 <info> device.migrateFromSQLiteToDuckDb.js:8 (migrateStateRecursive) DuckDB : Migrating device feature = 24e039f2-d497-468a-87ee-517a1e67923d, offset = 40000
2024-08-02T16:25:27+0200 <info> device.migrateFromSQLiteToDuckDb.js:11 (migrateStateRecursive) DuckDB : Device feature = 24e039f2-d497-468a-87ee-517a1e67923d has 0 states to migrate.
2024-08-02T16:25:27+0200 <info> device.migrateFromSQLiteToDuckDb.js:76 (DeviceManager.migrateFromSQLiteToDuckDb) DuckDB: Finished migrating DuckDB.
```
My « Delete states in SQLite » task may be too aggressive: on a Pi 3 with an SD card it makes Gladys barely usable during the migration; in any case, that's not a recommended setup.
@GBoulvin (and others!) If you want to test, this time you can; however, it's only available for the amd64 and arm64 architectures (so no Pi Zero for now).
What I recommend if you want to test is to do it completely separately from your prod, ideally on a separate machine.
Otherwise, if you only have your Pi 4 for prod, you absolutely must run this in a different folder than your prod DB: this update is major, and rolling back to the gladys:v4 image currently in production is not possible.
I should note that, having changed the OS of the Docker image, some « native » integrations may no longer work (e.g. Bluetooth, among others); there's QA to do that I haven't done yet ^^
The image is working well for now. The database migration took about 15 minutes.
My production SQLite database is 7 GB; the DuckDB file is 27 MB.
RPI4 (Raspberry Pi 4) 8GB with SSD
Initial feedback: the values are not truncated / rounded
Thanks for the feedback @cicoub13! Did you leave it running or just test it once?
Regarding the rounding, I’m aware — I had opened a separate PR because the issue is already present in production right now, but I think I’ll merge that PR into the DuckDB PR and rebuild!