Title: High CPU usage (~60%) with v4.70.0 – fixed by downgrading to v4.66.8


Environment:

  • Gladys v4.70.0 (auto-updated via Watchtower)

  • Docker on Linux (Ubuntu), network_mode: host

  • MQTT integration with multiple devices (Raspberry Pis)


Problem:

After Watchtower automatically updated Gladys from v4.66.8 to v4.70.0, the container started consuming ~60% CPU constantly – not just at startup, but permanently.

gladys    59.72%

The logs showed a flood of errors at startup:

NotFoundError: DeviceFeature mqtt:xxx not found

This appears to be a timing issue where MQTT retained messages arrive before Gladys has loaded its device feature cache. However, even after the startup errors stopped and the logs were clean, CPU remained at ~60%.


Solution:

Downgraded back to v4.66.8 and pinned the version to prevent Watchtower from auto-updating:

```yaml
gladys:
  image: gladysassistant/gladys:v4.66.8
  labels:
    com.centurylinklabs.watchtower.enable: "false"
```

After the downgrade, CPU dropped immediately to ~4%:

```
gladys    3.90%
```

Question:

Is this a known issue with v4.70.0? Are there plans to fix the CPU regression? I’d like to update eventually but not if it tanks performance again.

Hi @bamboleate,
Something must be wrong on your side, because I haven't seen any CPU spikes since version 4.70 (or even before):


And I have Zigbee, Node-RED, Z-Wave and scenes running on Gladys.

What’s your hardware setup to run Gladys? (CPU, RAM, HDD/SSD, etc.)

Hi @mutmut, thanks for checking!

My hardware:

  • Machine: Minisforum UM350 (AMD Ryzen 5 3550H)

  • RAM: 32 GB

  • Storage: several HDDs and SSDs connected

  • OS: Ubuntu 24.04

  • Docker, network_mode: host

The CPU spike is very reproducible on my end. Here’s the before/after:

v4.70.0:

gladys    59.72%

v4.66.8 (same machine, same config, same moment):

gladys    3.90%

I’m running the MQTT integration with 3 Raspberry Pis (relay boards + sensors). No Zigbee, no Z-Wave; I do use Node-RED.

One thing I noticed in the logs at startup with v4.70.0:

NotFoundError: DeviceFeature mqtt:xxx not found

This repeats for every MQTT device feature at every startup – looks like retained messages arrive before the device cache is ready. Could this flood of errors be causing the CPU spike? Maybe it’s specific to MQTT setups?

Your hardware is clearly good; on my side I’m on a Proxmox cluster, and Gladys runs in an LXC container (roughly equivalent to a VM, to keep it simple) with 6 GB of RAM and 2 vCPUs.

I’m not an expert for a thorough analysis, unfortunately.
However, regarding Gladys’s MQTT integration, do you use the internal integration?

Then, what does the debug page say?

If you use MQTT Explorer (for example) to connect to the MQTT broker, do any of your RPi information appear?

Another point, how did you configure your devices in the MQTT integration?


From what I see in the error message, one (or several?) device is not found/recognized; you should check with MQTT Explorer whether it is visible and whether its configuration is still correct in Gladys’s MQTT integration.
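If MQTT Explorer isn't handy, the same check can be done from the command line. This is a minimal sketch, assuming the mosquitto clients are installed and the broker runs on localhost; adjust the host and topic filter to your setup:

```shell
# List retained messages under a topic filter: prints topic, then payload.
# -W 3 makes the command exit after 3 seconds instead of blocking forever.
list_retained() {
  filter="$1"
  host="${2:-localhost}"
  mosquitto_sub -h "$host" -t "$filter" --retained-only -v -W 3
}

# Example: show every retained message Gladys would receive at startup.
# list_retained 'gladys/master/#'
```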

Hi @mutmut, thanks for the detailed questions!

Yes, I use the internal MQTT integration in Gladys.

Regarding the NotFoundError messages: I already investigated this thoroughly. The errors appear only at startup and are a timing issue – the MQTT broker delivers retained messages before Gladys has finished loading its device feature cache. After startup, the logs are completely clean and all devices work correctly. The RPis publish correctly, the topics are right, and everything is visible in MQTT Explorer.

The errors are not the cause of my problems – they’re a symptom of the startup timing. And more importantly: the CPU stays at ~60% permanently, not just during the error flood at startup.

The key point is the direct comparison:

  • v4.70.0 → 60% CPU constantly, same errors at startup

  • v4.66.8 → 4% CPU constantly, same errors at startup (timing issue exists in both versions)

So something changed between v4.66.8 and v4.70.0 that causes a CPU regression, at least on my setup with MQTT. The errors at startup are a separate (older) issue.

@bamboleate Many thanks for the report, and sorry for the inconvenience.

Did you try installing any other Gladys releases in between?

Here is the full changelog:

Would you mind upgrading version by version until you find which release introduces the issue?

That would help us pinpoint exactly which version caused the problem, which will make it much easier for us to investigate.

My first instinct says it’s from 4.70: before, I had Watchtower enabled and didn’t micromanage versions, so when I first hit the issue I was on 4.70, and I simply downgraded quite a bit to avoid fiddling and just have it working again.

BUT I will install the versions one by one to help you pinpoint the issue. Of course I’m unavailable this weekend, so it will take some time. I’ll get back here when I know more.
THANK YOU for all you do and have done!


Hi @pierre-gilles,

I did the gradual upgrade as you suggested. Here are my results:

  • v4.66.9 → CPU normal, no issues
  • v4.67.0 → CPU immediately jumps to ~57% and stays there

So the issue is introduced in v4.67.0.

The changelog for that version shows only three changes:

  • Nuki integration
  • Gladys Gateway DuckDB backup on temporary DB connection
  • Upgrade HAP dependency to latest stable version

I do not use HomeKit/Apple Home, so I cannot confirm whether the HAP update is the culprit — but it seems like the most likely candidate for a CPU loop.

Hope this helps narrow it down!

Do you see exactly which process is causing the CPU spike? Is it the Gladys Node.js process?

Nothing weird in the logs?

Hi @pierre-gilles,

The logs show something interesting. There is a flood of unhandled Promise Rejections related to MQTT DeviceFeatures not found:

NotFoundError: DeviceFeature mqtt:wozipi:07 not found
NotFoundError: DeviceFeature mqtt:wozipi:10 not found
NotFoundError: DeviceFeature mqtt:wozipi_bme680:temperature not found
[...and so on]

These errors appear every few seconds (my WoZiPi sends MQTT messages continuously). On v4.66.9 this was apparently handled silently — on v4.67.0 it seems to cause a CPU loop.

Also notable: ps aux is not available inside the container, but the errors all originate from the Node.js process (index.js), so yes — it is the Gladys Node.js process causing the spike.

I do not use HomeKit, so the HAP dependency upgrade is probably not the culprit. My guess would be a change in unhandled Promise Rejection handling between v4.66.9 and v4.67.0.

Hope this helps!

Weird because nothing changed on this part!

I’m wondering if this could come from the Nuki integration because the Nuki integration is using MQTT to listen to MQTT messages from Nuki locks.

Cc @ProtZ

Btw, we are going to release an improvement to the Nuki integration in next Gladys release to prevent the Nuki integration from starting if not configured, so this could definitely help you for this!

(not 100% sure)

Well… that goes way over my head. If I can provide anything that would help, just reach out.

I’m going to release very soon the new Gladys version with this fix, and if you could just try the release to see if it fixes your issue.

If not we’ll investigate more together :blush:


I confirm this point — I also have Nuki logs that were not present before, related to MQTT.

For now, on my side the load is still tolerated by the mini PC, but it caught my attention.


Hello,
I confirm that the extra MQTT logs come from the Nuki v1 integration; this should be fixed in v1.0.1.


@bamboleate I just released a Gladys upgrade (v4.71.0) with the fix, let me know if it fixes your issue!

Hi @Pierre-Gilles,

I upgraded from v4.66.8 to v4.71.0 and still see ~60–70% CPU usage. After some deep debugging, I found two separate issues:


Issue 1: Race condition with MQTT retained messages (startup crash loop)

On startup, Mosquitto immediately replays all retained messages on gladys/master/# before Gladys has finished loading its device cache. This causes a flood of NotFoundError: DeviceFeature mqtt:xxx not found errors — one per retained topic, per restart. The CPU spike is caused by the unhandled promise rejections overwhelming the event loop.

Workaround: delete all retained messages on gladys/master/# using mosquitto_pub -r -n. After a restart, Gladys repopulates them and works correctly — but only until the next restart.
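For reference, that cleanup can be scripted. This is a hedged sketch, assuming the mosquitto clients are installed and the broker runs on localhost; publishing a zero-length retained message (`-r -n`) is what deletes a retained topic from the broker:

```shell
# Clear every retained message under gladys/master/#.
# mosquitto_sub lists retained topics for 3 seconds (-W 3), then we
# publish an empty retained payload to each topic, which removes it.
clear_retained() {
  host="${1:-localhost}"
  mosquitto_sub -h "$host" -t 'gladys/master/#' --retained-only -v -W 3 |
  while read -r topic _payload; do
    mosquitto_pub -h "$host" -t "$topic" -r -n
  done
}

# Usage: clear_retained localhost   # then restart Gladys
```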

This seems like a bug where the MQTT service subscribes before the device manager has finished initializing.


Issue 2: LAN Manager presence scanner calling ip neigh show in a tight loop

Even with LAN Manager set to "disabled" in the UI (and LANMANAGER_PRESENCE_STATUS=disabled confirmed in the DB), Gladys continues spawning ip neigh show via execve continuously. Since ip is not installed in the Gladys Docker image (exit code 127), every call fails immediately and the loop repeats.

I confirmed this with strace -f -e execve on the Gladys PID.

Workaround: injecting a dummy ip script that exits cleanly reduces the overhead slightly, but the loop itself continues.
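For anyone wanting to reproduce the stub, here is a minimal sketch. `/usr/local/bin` is an assumption about where the container resolves `ip`; run this inside the Gladys container:

```shell
# Create a stub `ip` so the LAN Manager's `ip neigh show` calls succeed
# cheaply instead of failing with exit code 127 (command not found).
cat > /usr/local/bin/ip <<'EOF'
#!/bin/sh
# Stub for the missing iproute2 binary: succeed silently.
exit 0
EOF
chmod +x /usr/local/bin/ip
```

This only makes each failed spawn cheaper; it does not stop the loop itself.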

This appears to be a bug where the LAN Manager does not respect the disabled state at runtime.


Environment:

  • Gladys v4.71.0 (upgraded from v4.66.9)
  • Docker, network_mode: host
  • External Mosquitto broker (eclipse-mosquitto:2.0)
  • ~30 MQTT devices (across several Raspberry Pis)
  • Host: Ubuntu 24.04, 32 GB RAM

Happy to provide more logs or test patches if helpful. Thanks!


Thanks for the feedback, I’ll investigate both issues :+1:

That said, I don’t think they were introduced in Gladys v4.67.0, so they’re probably not the root cause of the high CPU usage you’re seeing.

Since you mentioned the CPU usage is constant (not just at startup), it also doesn’t look like a retained messages issue.

You also said your WoZiPi is sending MQTT messages continuously, do you have an idea of the message rate (e.g. messages per second)?

One possible explanation is that there’s no actual “bug”, but rather a throughput limitation:

With the additional code we added recently (notably around Nuki), the MQTT integration might be a bit slower to process messages. If your setup is sending messages at a high frequency, Gladys might not keep up, leading to a backlog and increased CPU usage.

If that’s the case, the solution would likely be to optimize the MQTT handling in Gladys to improve throughput.

Let me know about the message volume, that would help narrow it down!

Hi @Pierre-Gilles,

I measured the MQTT message rate using mosquitto_sub -t '#' piped through pv:

~0.8 messages/second (291 messages over ~6 minutes, all topics combined)

That seems on the low end, so I don’t think throughput is the issue. The high CPU (~60–70%) is constant and doesn’t correlate with message spikes.
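For context, here is the raw command plus the arithmetic behind that figure. The `mosquitto_sub`/`pv` line is a sketch (host and window are assumptions), and the 360-second window approximates the "~6 minutes" above:

```shell
# Count all messages over a fixed window, then derive messages/second:
#   mosquitto_sub -h localhost -t '#' -W 360 | pv -l -r > /dev/null
# The numbers from the measurement above:
msgs=291
secs=360   # ~6 minutes
awk -v m="$msgs" -v s="$secs" 'BEGIN { printf "%.2f msg/s\n", m / s }'
```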

For reference, my setup publishes relay states from three Raspberry Pis (WoZiPi and AZiPi, SchlaZiPi) plus BME680 and SHT35 sensor data — nothing particularly chatty.