MQTT integration with multiple devices (Raspberry Pis)
Problem:
After Watchtower automatically updated Gladys from v4.66.8 to v4.70.0, the container started consuming ~60% CPU constantly – not just at startup, but permanently.
```
gladys 59.72%
```
The logs showed a flood of errors at startup:
```
NotFoundError: DeviceFeature mqtt:xxx not found
```
These appear to stem from a timing issue where MQTT retained messages arrive before Gladys has loaded its device feature cache. However, even after the startup errors stopped and the logs were clean, CPU remained at ~60%.
Solution:
Downgraded back to v4.66.8 and pinned the version to prevent Watchtower from auto-updating:
```yaml
gladys:
  image: gladysassistant/gladys:v4.66.8
  labels:
    com.centurylinklabs.watchtower.enable: "false"
```
After the downgrade, CPU dropped immediately to ~4%:
```
gladys 3.90%
```
Question:
Is this a known issue with v4.70.0? Are there plans to fix the CPU regression? I’d like to update eventually but not if it tanks performance again.
The CPU spike is very reproducible on my end. Here’s the before/after:
v4.70.0:
```
gladys 59.72%
```
v4.66.8 (same machine, same config, same moment):
```
gladys 3.90%
```
I’m running the MQTT integration with 3 Raspberry Pis (relay boards + sensors). No Zigbee, no Z-Wave; I do use Node-RED.
One thing I noticed in the logs at startup with v4.70.0:
```
NotFoundError: DeviceFeature mqtt:xxx not found
```
This repeats for every MQTT device feature at every startup – looks like retained messages arrive before the device cache is ready. Could this flood of errors be causing the CPU spike? Maybe it’s specific to MQTT setups?
Your hardware is clearly good; on my side I’m on a Proxmox cluster and Gladys runs in an LXC container (roughly equivalent to a VM, to keep it simple) with 6 GB of RAM and 2 vCPUs.
I’m not expert enough for a thorough analysis, unfortunately.
However, regarding Gladys’s MQTT integration, do you use the internal integration?
From what I see in the error message, one (or several?) device is not found/recognized; you should check with MQTT Explorer whether it is visible and whether its configuration is still correct in Gladys’s MQTT integration.
Yes, I use the internal MQTT integration in Gladys.
Regarding the NotFoundError messages: I already investigated this thoroughly. The errors appear only at startup and are a timing issue – the MQTT broker delivers retained messages before Gladys has finished loading its device feature cache. After startup, the logs are completely clean and all devices work correctly. The RPis publish correctly, the topics are right, and everything is visible in MQTT Explorer.
The errors are not the cause of my problems – they’re a symptom of the startup timing. And more importantly: the CPU stays at ~60% permanently, not just during the error flood at startup.
The key point is the direct comparison:
v4.70.0 → 60% CPU constantly, same errors at startup
v4.66.8 → 4% CPU constantly, same errors at startup (timing issue exists in both versions)
So something changed between v4.66.8 and v4.70.0 that causes a CPU regression, at least on my setup with MQTT. The errors at startup are a separate (older) issue.
My first instinct says it’s from v4.70.0: I had Watchtower on before and didn’t bother micromanaging versions, so when I first ran into the issue I was on v4.70.0, and I downgraded quite a few versions in one go just to avoid fiddling and have things working again.
BUT I will install the versions in between eventually to help you pinpoint the issue. Of course I’m unavailable this weekend, so it will take some time. I’ll get back here when I know more.
THANK YOU for all you do and have done!
I did the gradual upgrade as you suggested. Here are my results:
v4.66.9 → CPU normal, no issues
v4.67.0 → CPU immediately jumps to ~57% and stays there
So the issue is introduced in v4.67.0.
The changelog for that version shows only three changes:
Nuki integration
Gladys Gateway DuckDB backup on temporary DB connection
Upgrade HAP dependency to latest stable version
I do not use HomeKit/Apple Home, so I cannot confirm whether the HAP update is the culprit — but it seems like the most likely candidate for a CPU loop.
The logs show something interesting. There is a flood of unhandled Promise Rejections related to MQTT DeviceFeatures not found:
```
NotFoundError: DeviceFeature mqtt:wozipi:07 not found
NotFoundError: DeviceFeature mqtt:wozipi:10 not found
NotFoundError: DeviceFeature mqtt:wozipi_bme680:temperature not found
[...and so on]
```
These errors appear every few seconds (my WoZiPi sends MQTT messages continuously). On v4.66.9 this was apparently handled silently — on v4.67.0 it seems to cause a CPU loop.
Also notable: ps aux is not available inside the container, but the errors all originate from the Node.js process (index.js), so yes — it is the Gladys Node.js process causing the spike.
I do not use HomeKit, so the HAP dependency upgrade is probably not the culprit. My guess would be a change in unhandled Promise Rejection handling between v4.66.9 and v4.67.0.
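To put a rough number on that error flood, here is a sketch of how it could be measured from the container logs. It assumes the container is named `gladys` (the service name from the compose snippet above); adjust to your setup.

```shell
# Count NotFoundError lines emitted in the last minute of container logs.
# A steadily growing count between restarts would support the theory that
# the errors are continuous rather than startup-only.
docker logs --since 60s gladys 2>&1 | grep -c 'NotFoundError: DeviceFeature' || true
```

Comparing this count on v4.66.9 vs v4.67.0 would show whether the flood itself changed, or only how it is handled.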
By the way, the next Gladys release will include an improvement to the Nuki integration that prevents it from starting when it isn’t configured, so this could definitely help you here!
Issue 1: MQTT retained messages replayed before the device cache is ready
On startup, Mosquitto immediately replays all retained messages on gladys/master/# before Gladys has finished loading its device cache. This causes a flood of NotFoundError: DeviceFeature mqtt:xxx not found errors — one per retained topic, per restart. The CPU spike appears to be caused by the unhandled promise rejections overwhelming the event loop.
Workaround: delete all retained messages on gladys/master/# using mosquitto_pub -r -n. After a restart, Gladys repopulates them and works correctly — but only until the next restart.
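For anyone else hitting this, a sketch of that cleanup, assuming a broker on localhost and the stock mosquitto-clients tools (the topic tree is the one from the logs above):

```shell
# 1. List only retained messages: --retained-only skips live traffic,
#    -W 2 disconnects after 2 s, -v prints "topic payload" per line.
# 2. For each topic, publish a retained null payload (-r -n), which
#    deletes the retained message on the broker.
# Assumes topics contain no spaces (awk takes the first field as the topic).
mosquitto_sub -h localhost -t 'gladys/master/#' -v --retained-only -W 2 \
  | awk '{print $1}' \
  | while read -r topic; do
      mosquitto_pub -h localhost -t "$topic" -r -n
    done
```

Add `-u`/`-P` to both commands if your broker requires authentication.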
This seems like a bug where the MQTT service subscribes before the device manager has finished initializing.
Issue 2: LAN Manager presence scanner calling ip neigh show in a tight loop
Even with LAN Manager set to "disabled" in the UI (and LANMANAGER_PRESENCE_STATUS=disabled confirmed in the DB), Gladys continues spawning ip neigh show via execve continuously. Since ip is not installed in the Gladys Docker image (exit code 127), every call fails immediately and the loop repeats.
I confirmed this with strace -f -e execve on the Gladys PID.
Workaround: injecting a dummy ip script that exits cleanly reduces the overhead slightly, but the loop itself continues.
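For reference, a minimal sketch of such a dummy `ip` script. `STUB_DIR` is a placeholder; inside the Gladys container it would be a directory that shadows the real `ip` on PATH, such as /usr/local/bin.

```shell
# Hypothetical no-op stand-in for iproute2's `ip`.
STUB_DIR="${STUB_DIR:-.}"

cat > "$STUB_DIR/ip" <<'EOF'
#!/bin/sh
# Print nothing and succeed, so each spawned `ip neigh show`
# no longer fails with exit code 127.
exit 0
EOF
chmod +x "$STUB_DIR/ip"
```

This only silences the failures; as noted, the spawn loop itself keeps running, so it is a mitigation, not a fix.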
This appears to be a bug where the LAN Manager does not respect the disabled state at runtime.
Environment:
Gladys v4.71.0 (upgraded from v4.66.9)
Docker, network_mode: host
External Mosquitto broker (eclipse-mosquitto:2.0)
~30 MQTT devices (on several pis)
Host: Ubuntu 24.04, 32 GB RAM
Happy to provide more logs or test patches if helpful. Thanks!
Thanks for the feedback, I’ll investigate both issues
That said, I don’t think they were introduced in Gladys v4.67.0, so they’re probably not the root cause of the high CPU usage you’re seeing.
Since you mentioned the CPU usage is constant (not just at startup), it also doesn’t look like a retained messages issue.
You also said your WoZiPi is sending MQTT messages continuously, do you have an idea of the message rate (e.g. messages per second)?
One possible explanation is that there’s no actual “bug”, but rather a throughput limitation:
With the additional code we added recently (notably around Nuki), the MQTT integration might be a bit slower to process messages. If your setup is sending messages at a high frequency, Gladys might not keep up, leading to a backlog and increased CPU usage.
If that’s the case, the solution would likely be to optimize the MQTT handling in Gladys to improve throughput.
Let me know about the message volume, that would help narrow it down!
I measured the MQTT message rate using mosquitto_sub -t '#' piped through pv:
~0.8 messages/second (291 messages over ~6 minutes, all topics combined)
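A sketch of that measurement, assuming a broker on localhost (adjust `-h` for a remote broker) and GNU coreutils `timeout` alongside `pv`:

```shell
# mosquitto_sub -v prints one line per message; pv -l -r shows the live
# line (= message) rate on stderr. Capped at 10 s here for illustration;
# let it run longer for a stable figure.
timeout 10 mosquitto_sub -h localhost -t '#' -v | pv -l -r > /dev/null || true

# Cross-check of the quoted figure: 291 messages over ~6 minutes
awk 'BEGIN { printf "%.2f messages/second\n", 291 / 360 }'   # ≈ 0.81
```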
That seems on the low end, so I don’t think throughput is the issue. The high CPU (~60–70%) is constant and doesn’t correlate with message spikes.
For reference, my setup publishes relay states from three Raspberry Pis (WoZiPi and AZiPi, SchlaZiPi) plus BME680 and SHT35 sensor data — nothing particularly chatty.