Gladys Plus problem

Ah yeah, that’s rough :sweat_smile: especially since you told me the data came from Node-RED, so you’re the one controlling the sending frequency?

You really need to reduce it — you’re needlessly hammering Gladys!
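
If the values pass through a function node on the Node-RED side, a minimal sketch of a rate limiter could look like the one below. The `RateLimiter` class and the 60-second interval are hypothetical, chosen only for illustration; Node-RED's built-in delay node in rate-limit mode may do the same job without any code.

```typescript
// Hypothetical helper: lets at most one reading through per interval.
class RateLimiter {
  private readonly minIntervalMs: number;
  private lastSentAt: number | null = null;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true when the reading should be forwarded to Gladys,
  // false when it arrives too soon after the previous forwarded one.
  shouldForward(nowMs: number): boolean {
    if (this.lastSentAt !== null && nowMs - this.lastSentAt < this.minIntervalMs) {
      return false;
    }
    this.lastSentAt = nowMs;
    return true;
  }
}
```

With `new RateLimiter(60_000)`, calling `shouldForward(Date.now())` on each incoming reading would forward at most one value per minute and drop the rest.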

So… after taking a closer look, I realized I had misread the consumption measurement: it actually comes from a device that’s supposed to measure the consumption of my water heater.
So in the end it has nothing to do with the SolarEdge API (I had checked: I poll it every minute, so nothing earth-shattering).

And for quite some time I didn’t understand why it had stopped working… In fact, that device is exactly where the spam is coming from!

So I’m simply going to remove it and see what happens without it.

We’ll see later about reintegrating it into the network :slight_smile:

Thanks a lot for your help, I’ll wait to see how it goes before marking the thread as resolved!! :wink:

Good thing we found it! :blush:

Also check the other sensors to be sure it’s really the only one causing a problem — from the values you posted, it did look like it was the only one.

Yes, all the more so because what tipped me off is that I usually disable the history of the signal strength and it’s spamming me with that too!

PS: It’s been 10 minutes since I started the device deletion… and I’m only at 30% of the states deleted ^^

It’s purging, it’s purging :laughing:

All right, now I’ll monitor and retest my DB if I have the slightest doubt :wink:

False alarm… Gladys Plus was disconnected again this morning. I will only be able to investigate tonight.

However, I don’t actually know what to look for.

Extremely radical proposal: manually stop the ZigBee2MQTT container for 24 hours…

I can consider it even though it’s quite inconvenient.
The problem: it sometimes happens that everything works for

Oh that’s a shame, can you

Since I’m on vacation I’ll wait until I’m back to dive back into it.
In the meantime, luckily I have a VPN so I can restart my container from here! :sweat_smile:

Here is a long log file; I left it intact on purpose in case it can help with debugging:

https://pastey.nasdoury.ovh/view/9zbRDyE/raw

I see several disconnection issues, including one that isn’t followed by a reconnection.
However, I have absolutely no idea where this could be coming from.

Also, is it normal that right now, my Gladys Plus having been disconnected for about 24 hours, the RAM usage on my NUC is over 1 GB?
When I restart Gladys it drops back to around 250 MB.

I’m asking just in case!

Hi @guim31, thanks for the additional info :slight_smile:

First, I notice that all the disconnections are due to internet outages; we even see the ping to healthcheck fail:

2024-07-30T11:05:05.614060982Z 2024-07-30T13:05:05+0200 <warn> index.js:914 (Socket.<anonymous>) Socket disconnected client side. Trying to reconnect...
2024-07-30T11:05:22.432354725Z 2024-07-30T13:05:22+0200 <warn> scene.executeActions.js:37 (executeAction) AxiosError: getaddrinfo EAI_AGAIN hc-ping.com
2024-07-30T11:05:22.432399450Z     at Function.AxiosError.from (/src/server/node_modules/axios/lib/core/AxiosError.js:89:14)
2024-07-30T11:05:22.432406693Z     at RedirectableRequest.handleRequestError (/src/server/node_modules/axios/lib/adapters/http.js:518:25)
2024-07-30T11:05:22.432411380Z     at RedirectableRequest.emit (node:events:529:35)
2024-07-30T11:05:22.432415768Z     at ClientRequest.eventHandlers.<computed> (/src/server/node_modules/follow-redirects/index.js:14:24)
2024-07-30T11:05:22.432420731Z     at ClientRequest.emit (node:events:517:28)
2024-07-30T11:05:22.432457075Z     at TLSSocket.socketErrorListener (node:_http_client:501:9)
2024-07-30T11:05:22.432463594Z     at TLSSocket.emit (node:events:517:28)
2024-07-30T11:05:22.432468263Z     at emitErrorNT (node:internal/streams/destroy:151:8)
2024-07-30T11:05:22.432472552Z     at emitErrorCloseNT (node:internal/streams/destroy:116:3)
2024-07-30T11:05:22.432476961Z     at processTicksAndRejections (node:internal/process/task_queues:82:21) {
2024-07-30T11:05:22.432481377Z   hostname: 'hc-ping.com',
2024-07-30T11:05:22.432485603Z   syscall: 'getaddrinfo',
2024-07-30T11:05:22.432489776Z   code: 'EAI_AGAIN',
2024-07-30T11:05:22.432493912Z   errno: -3001,

Telegram errors everywhere:

2024-07-30T11:07:03.225446331Z 2024-07-30T13:07:03+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = EFATAL, message = EFATAL: Error: connect ETIMEDOUT 212.27.38.252:443

I see the errors last about 1 minute generally, and theoretically the instance reconnects afterwards:

2024-07-30T11:06:04.150166443Z 2024-07-30T13:06:04+0200 <info> index.js:884 (Socket.<anonymous>) Gladys Gateway: connected in websockets

Except that in this case, what’s strange is that Telegram still hasn’t regained the connection after Gladys Plus reconnected :thinking:

What we need to understand is what’s wrong with your network. Why these outages? And does it really come back?
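
To check whether the connection really comes back each time, one option is to pair every disconnection line with the next reconnection line in the log. The sketch below assumes the two message strings seen in the excerpts above and the leading Docker UTC timestamp; the `findOutages` function and the `Outage` type are hypothetical names, not part of Gladys.

```typescript
interface Outage {
  start: string;          // UTC timestamp of the disconnection
  end: string | null;     // UTC timestamp of the reconnection, null if none was logged
  seconds: number | null; // outage duration, null if no reconnection was logged
}

// Message fragments taken from the log excerpts in this thread.
const DISCONNECTED = "Socket disconnected client side";
const RECONNECTED = "Gladys Gateway: connected in websockets";
// Leading Docker timestamp, truncated to whole seconds.
const TIMESTAMP = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})/;

function findOutages(lines: string[]): Outage[] {
  const outages: Outage[] = [];
  let open: Outage | null = null;
  for (const line of lines) {
    const match = TIMESTAMP.exec(line);
    if (!match) continue; // line without a leading timestamp
    const stamp = `${match[1]}Z`;
    if (line.includes(DISCONNECTED)) {
      // Ignore repeated disconnect messages while an outage is already open.
      if (open === null) {
        open = { start: stamp, end: null, seconds: null };
        outages.push(open);
      }
    } else if (line.includes(RECONNECTED) && open !== null) {
      open.end = stamp;
      open.seconds = (Date.parse(stamp) - Date.parse(open.start)) / 1000;
      open = null;
    }
  }
  return outages;
}
```

Any entry whose `end` is still `null` is a disconnection that was never followed by a reconnection in that log, which is the case worth looking at first.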

Yes, that’s normal: when your instance disconnects, socket.io (the WebSocket library we use) buffers all messages to send to Gladys Plus until Gladys Plus reconnects (see the "Offline behavior" page of the Socket.IO documentation).

Which makes a bunch of messages accumulate in RAM.

PS: I’m thinking it might be a good idea to set a timeout on messages sent over WebSocket to avoid them piling up in RAM — after 30 seconds there’s no point in keeping a WebSocket request ^^
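
To make the idea concrete, here is a minimal sketch of a queue that drops messages older than a maximum age. It is not how socket.io or Gladys handle buffering today; the `ExpiringQueue` class is hypothetical, and the 30-second limit is simply the value mentioned above.

```typescript
interface Pending<T> {
  payload: T;
  queuedAt: number; // ms timestamp at which the message was queued
}

// Hypothetical buffer: messages older than maxAgeMs are discarded
// instead of piling up in RAM while the connection is down.
class ExpiringQueue<T> {
  private readonly maxAgeMs: number;
  private items: Pending<T>[] = [];

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  push(payload: T, nowMs: number): void {
    this.dropExpired(nowMs);
    this.items.push({ payload, queuedAt: nowMs });
  }

  // Called on reconnection: returns the messages still worth sending
  // and empties the queue.
  drain(nowMs: number): T[] {
    this.dropExpired(nowMs);
    const fresh = this.items.map((item) => item.payload);
    this.items = [];
    return fresh;
  }

  get size(): number {
    return this.items.length;
  }

  private dropExpired(nowMs: number): void {
    this.items = this.items.filter((item) => nowMs - item.queuedAt < this.maxAgeMs);
  }
}
```

With `new ExpiringQueue<string>(30_000)`, memory use stays bounded by roughly 30 seconds of traffic, however long the outage lasts. The Socket.IO offline-behavior documentation also describes volatile events, which are not buffered while disconnected and might be a simpler route.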

But yeah, I don’t think that’s the solution to your problem

Ah, while reading all the changelogs of the socket.io library, I came across some interesting stuff:

Previously, getting disconnected while waiting for an acknowledgement
would create a memory leak, as the acknowledgement was never received
and the handler would stay in memory forever.

This could explain why RAM usage increases a lot, and maybe this bug would prevent socket.io from reconnecting if you accumulate too many requests during disconnection.

I think I’ll make a note to update socket.io on the Gladys Plus front-end and on the Gladys server; that could solve this reconnection issue!

(that won’t fix your internet connection drop problems though, so if I were you I’d keep investigating anyway ^^)

Thanks @pierre-gilles for all this information.
On my side, I think I’ve found what was disrupting my network!!

I recently added a

Until yesterday everything was better: no disconnections… and then, bam, my instance is disconnected again.
I restarted Gladys — disconnected again this morning.

Here are other logs that don’t seem to tell me much more, I think.

I’ll investigate this problem with Telegram which I don’t understand (but I’m not being spammed with errors like before… maybe it’s normal to have occasional connection errors?).

Right now my instance is disconnected, but the RAM is only around 600 MB, not too extreme I think.

https://pastey.nasdoury.ov

You probably checked, but did your second DHCP server happen to reactivate by any chance?

I straight-up kicked it off my network :wink:

@guim31 I see lots of errors:

2024-08-06T01:12:21.980588980Z 2024-08-06T03:12:21+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.309015994Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.640530705Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.969901142Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:23.300108257Z 2024-08-06T03:12:23+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:23.630625049Z 2024-08-06T03:12:23+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway

And also this:

2024-08-06T01:05:51.211667212Z 2024-08-06T03:05:51+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = EFATAL, message = EFATAL: Error: read ECONNRESET

Are you sure your network issues are resolved? If Telegram is having trouble reconnecting, I think you still have issues.