[help needed] Problems accessing Gladys

Hello,
First big problem: I have a major Gladys crash :frowning:
I don’t know if it’s related, but the consumption calculation finished about 30 minutes after it started (around 12:00), then I launched the cost calculation (about 1 hour), and after refreshing the page I could no longer access Gladys.
I restarted the Docker container: still not working.
I removed the container and the gladys image and relaunched my docker compose: still not working.
I have the logs if needed.

Second problem, which seems to come from the Gladys Plus backup at 3:00 AM.
From what I observed in Proxmox, the RAM and swap of my LXC start growing until they are saturated, and then Gladys stops responding.
A forced reboot of my LXC restores a normal state afterwards.
However, I don’t have any logs, because when it crashes I no longer have access.
And in my backups, I see that the last one is from 5 days ago and I indeed had to restart my LXC in the past few days.

Thanks in advance for your help.

Yes, I’d like the logs!

3.6 GB, if you have a huge number of states, that’s not that big, right?

Ah damn, I’d appreciate information about that.

@Terdious also noticed a memory leak somewhere, their RAM usage also increases abnormally!

I just checked my Gladys instance, and I see the issue too, but on a smaller scale.

So either there’s a bug in the energy tracking implementation that’s causing a memory leak, or DuckDB has a memory leak in the version I installed, since I updated DuckDB for energy tracking.

I’ll investigate!

Hello !!

As I told you on the call, I’m convinced that, for my part at least, I already had the issue before the energy tracking was added. I noticed it at the beginning of November, but I couldn’t say exactly since when!!

I don’t have the issue on my professional instance, which I haven’t updated for about a year and a half.

1 Like

I went ahead and opened a pull request to update DuckDB anyway:

You never know!

1 Like

For the beginning of November, I can think of a few things:

  • Either the MCP server
  • Or the Matter integration

Is there any way we could help you find it?

I’ve also had these issues for a while, but I couldn’t say when they started.
At first I thought it was the backups of my LXC in Proxmox, which also run at 3 AM, but after disabling them last night I saw that it was linked to the Gladys Plus backup at 3 AM.
And because it sometimes worked, I didn’t pay attention.

So I just checked and I started having these problems before mid-August this year.
I don’t remember exactly when I activated Gladys Plus but I think it was in July.

Aside from running tests on your side, not much more than that :slight_smile: We need to find which part of Gladys is responsible.

I’m running tests on my side!

I think there are two distinct issues! Here, we’re talking about a memory leak in Gladys, not really related to Gladys Plus, I think, because I can reproduce the leak during the day even over 30 minutes (without any backup).

We can then look at whether the RAM usage of the backups can be optimized, but I don’t think that’s the problem here :slight_smile:

1 Like

First test: MCP integration?

I paused the MCP integration and restarted Gladys.

(@bertrandda, while I’m at it: for now the MCP integration doesn’t have a stop function, so stopping the service only takes effect after a restart!)

A few minutes later, RAM usage had already doubled, so I don’t think the MCP service is the cause.

1 Like

For info, I ran quite a few tests with ChatGPT. In particular, I restarted the gladys prod container with the following option to create snapshots.
With Gladys stopped, I’m at 4.03 GB of memory in use.
Once Gladys had finished starting, I was at 5.7 GB of memory (15:43).
(screenshot)
Until 16:00 I stayed in that range, with only a few fluctuations, and you can see that memory is freed properly.
At 16:00 the consumption / cost calculation did run, with a slight memory increase that was immediately freed, which confirms it’s not that. Tasks completed.
One minute later, I went to the dashboard (I couldn’t say whether that’s the trigger) and poof, I instantly jumped to 8.82 GB: the "RES" column (resident memory) went from 1748 MB to 4995 MB. Nothing gets freed.
(screenshot)

I’ll paste raw what ChatGPT tells me, and we can see after if you want anything else (Conversation: https://chatgpt.com/share/693d82d6-93e8-800b-8686-ad0b41963f18)

Yes. Here we clearly see what’s happening, and you were right from the start: this is not a JS heap leak, it’s a native memory explosion (outside V8), triggered by a one-off operation.

I’ll give you a precise, factual readout based on the “peak” report you just provided.


:one: What the report says at the time of the peak (numeric evidence)

JS Heap (V8)

"usedMemory": 252011200        ≈ 240 Mo
"externalMemory": 25243016     ≈ 24 Mo

:right_arrow: Heap perfectly stable, almost identical to previous snapshots.
:right_arrow: No JS leak.


Total process memory

"rss": 6198140928        ≈ 5,77 Go
"maxRss": 6374838272     ≈ 5,93 Go

:right_arrow: +4 GB outside the JS heap that appeared abruptly.


Linux side (/proc)

RssAnon:        5089920 kB  ≈ 4.85 GB
RssFile:          59392 kB
Private_Dirty:  5090240 kB

:right_arrow: Private anonymous memory
:right_arrow: Not file-mapped
:right_arrow: Not freed afterwards

:backhand_index_pointing_right: Typical signature:

massive native allocations (malloc / new on the C/C++ side)
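
For anyone who wants to reproduce this kind of readout, here is a minimal sketch (assuming the container is named gladys and the Node process is PID 1 inside it, as in the report above) that puts the V8 heap and the kernel-side resident memory side by side:

# JS-side view: heap and RSS as reported by Node itself
docker exec gladys node -e 'console.log(process.memoryUsage())'

# Kernel-side view: anonymous vs file-backed resident memory of the process
docker exec gladys sh -c 'grep -E "VmRSS|RssAnon|RssFile" /proc/1/status'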


:two: What it is NOT (important)

:cross_mark: Not:

  • V8
  • Garbage Collector
  • Retained JS objects
  • Timers or blocked event loop
  • Classic Node leak

Your snapshots prove it:
the heap stays at ~260 MB while RSS explodes to ~6 GB.


:three: What it is (almost certain at this stage)

Let’s look at the loaded native bindings:

"/src/server/node_modules/duckdb/lib/binding/duckdb.node"
"/src/server/node_modules/sqlite3/lib/binding/node_sqlite3.node"
"/src/server/services/usb/.../bindings-cpp.node"
"/src/server/services/bluetooth/.../bluetooth_hci_socket.node"

Primary suspect #1: DuckDB

This is THE main suspect, for several reasons:

  1. DuckDB allocates heavily in native memory
  2. Uses:
    • vectorized execution
    • columnar buffers
    • intermediate caches
  3. Doesn’t always return memory to the OS immediately
  4. Can grow by several GB on a single query

:backhand_index_pointing_right: And especially:

The peak is instantaneous, high CPU, memory never goes back down
→ exactly the observed behavior.


:four: Why it doesn’t go back down

Very important to understand:

  • DuckDB logically frees its buffers
  • BUT:
    • malloc() keeps memory in the arena
    • the RSS does not go down
    • Node will reuse this memory later, but Linux still sees it as "taken"

:right_arrow: So:

This is not an infinite leak
It’s a stepwise irreversible increase


:five: The likely trigger in your setup

Given your Gladys + Energy Monitoring + history setup:

  • energy aggregation
  • temporal calculations
  • analytical queries
  • possibly:
    • daily consolidation
    • aggregates per phase / device
    • historical recalculation

:right_arrow: A heavy DuckDB query was launched (cron, internal job, UI action).

This matches:

  • CPU working
  • RAM rising in 1 second
  • no visible JS allocations

:six: Absolute confirmation (if you want to be 100% sure)

During the next peak, run immediately:

docker exec gladys bash -lc '
cat /proc/1/smaps | awk "
# flag the mappings that belong to duckdb.node, then print their Rss lines
/^[0-9a-f]+-[0-9a-f]+ / { f = (\$0 ~ /duckdb\.node/) }
f && /^Rss:/ { print }
"
'

If you see hundreds of MB / GB associated with DuckDB → final verdict.


:seven: What to do concretely (realistic solutions)

Option A – Limit DuckDB (recommended)

In Gladys (code):

  • limit query sizes
  • avoid global aggregates on the full history
  • chunk by period (day / week), as sketched below
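
To make the last point concrete, here is a rough sketch of the "chunk by period" idea, run against a hypothetical copy of the database; the path /tmp/gladys.duckdb and the t_device_feature_state(value, created_at) table are assumptions, not necessarily the real Gladys schema:

# One bounded aggregate per day instead of a single query over the whole
# history, so DuckDB never has to materialise the full dataset at once.
for day in 2025-12-11 2025-12-12 2025-12-13; do
  duckdb /tmp/gladys.duckdb "
    SELECT '$day' AS d, SUM(value) AS consumption
    FROM t_device_feature_state
    WHERE created_at >= DATE '$day'
      AND created_at <  DATE '$day' + INTERVAL 1 DAY;
  "
done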

Option B – Force a Node memory cap

:warning: This does not fix the root cause, but prevents host OOM:

-e NODE_OPTIONS="--max-old-space-size=2048 ..."

:right_arrow: Note: this does not limit native memory, but avoids stacking the two.
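
As a sketch of where such a cap would go (docker run form; the image tag and values are only examples, adapt them to your compose file). Since --max-old-space-size only bounds the JS heap, a container-level memory limit is what actually bounds the native side:

# Hypothetical equivalent docker run; --memory caps the whole container,
# DuckDB's native allocations included, which NODE_OPTIONS alone does not.
docker run -d --name gladys \
  -e NODE_OPTIONS="--max-old-space-size=2048" \
  --memory=4g \
  gladysassistant/gladys:v4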


Option C – Scheduled restart (pragmatic)

If the peaks are rare but destructive:

docker restart gladys
  • weekly
  • or after a heavy task

This is very common for DuckDB in production.
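
For the weekly variant, a single line in the host crontab is enough (container name and time are assumptions; pick a slot after the 3:00 AM Gladys Plus backup):

# crontab -e on the host: restart the gladys container every Sunday at 04:30
30 4 * * 0 docker restart gladys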


:eight: Clear and concise conclusion

:check_mark: Your data is excellent
:check_mark: The diagnosis is now solid

:backhand_index_pointing_right: There is no JS memory leak
:backhand_index_pointing_right: RAM is consumed by native allocations (very likely DuckDB)
:backhand_index_pointing_right: Memory doesn’t go back down because it remains in the native arena

If you want, next possible steps:

  • identify which Gladys query triggers the peak
  • or help you patch / work around on the Gladys side (SQL logging, throttle, split)

See you later!

Thanks for your investigations, I arrive at the same conclusions!!

I’ve done the same investigations on my side, and likewise it’s clearly not a JS issue, the heap size is contained. It’s native code that’s causing the problem!

DuckDB: I updated to the latest version, and I still seem to have issues.

You can test by installing gladysassistant/gladys:dev which runs DuckDB v1.4.3.

So:

  • Either it’s still an unresolved DuckDB bug to this day
  • Or it’s something else

I’m going to test, and to confirm it I went to my…

You’re right!!

Screenshot 2025-12-13 at 16.35.08

However, it’s not a bug, it’s a feature :joy:

1 Like

But clearly, 80% is too much.

We could move to a lower percentage, I think…
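
For context, the setting being discussed is DuckDB’s memory_limit, which defaults to roughly 80% of the system’s RAM. A quick way to inspect and change it, sketched with the standalone duckdb CLI on a throwaway in-memory database (not the Gladys code itself; the PR mentioned below presumably does the equivalent through the Node binding):

# Show the current value, then lower it for the session (absolute sizes like '2GB' work)
echo "SELECT current_setting('memory_limit');" | duckdb
echo "SET memory_limit = '2GB'; SELECT current_setting('memory_limit');" | duckdb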

1 Like

Hehe, that’s what I thought, then ^^

Hopefully reducing this number won’t ruin performance in your case, because if you put less in RAM, it’ll use the disk…

There may be some query optimizations to make, even though the queries are extremely simple in this case

Let me know when you have a test image and I’ll run the test to give you feedback. Hopefully the NVMe drives are fast enough.

Are we doing pagination right now?

The PR:

I’ve set it to 30% for now, which still seems quite high to me, but it’s already a big step down from 80%.

We can talk optimization in another thread :slight_smile:

2 Likes

The image is live on gladysassistant/gladys:set-duckdb-memory-limit
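
For anyone who wants to try it, a rough sketch (the compose service name is an assumption; keep a backup of your Gladys data first):

# Pull the test tag and point your compose service at it
docker pull gladysassistant/gladys:set-duckdb-memory-limit
# then change the image: line of the gladys service in docker-compose.yml and run:
docker compose up -d gladys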

I’m testing it on my machine

1 Like

I can only test it tomorrow, I lost track of time…!! Sorry

1 Like