Gladys Plus problem

pierre-gilles · July 19, 2024, 11:38am

Ah yeah, that’s rough, especially since you told me the data comes from Node-RED, so you’re the one controlling the sending frequency?

You really need to reduce it — you’re needlessly hammering Gladys!
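
For what it’s worth, here is a minimal sketch of that kind of rate limiting, assuming the values pass through some JavaScript of yours (a Node-RED function node, for example) before reaching Gladys. `createThrottle`, `minIntervalMs` and the sensor name are made up for the illustration, and the 60-second interval is only an example value, not an official Gladys recommendation:

```javascript
// Hypothetical helper: lets a reading through at most once per `minIntervalMs`
// for each sensor. `now` is injectable so the logic can be tested without waiting.
function createThrottle(minIntervalMs, now = () => Date.now()) {
  const lastSentAt = new Map(); // sensor id -> timestamp of the last forwarded reading

  return function shouldSend(sensorId) {
    const t = now();
    const last = lastSentAt.get(sensorId);
    if (last !== undefined && t - last < minIntervalMs) {
      return false; // too soon since the last forwarded reading: drop this one
    }
    lastSentAt.set(sensorId, t);
    return true;
  };
}

// Example: forward at most one reading per minute for each sensor.
const shouldSend = createThrottle(60 * 1000);
if (shouldSend("water-heater-power")) {
  // send the value to Gladys here (MQTT, HTTP, ... depending on your setup)
}
```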

guim31 · July 19, 2024, 11:45am

So… after taking a closer look, I had misled myself when reading the consumption measurement: it’s actually a device that’s supposed to measure the consumption of my water heater.
So in the end it has nothing to do with the SolarEdge API (I checked: I poll it every minute, so nothing excessive).

And for quite a while I couldn’t figure out why it had stopped working… It turns out that’s exactly where the spam is coming from!

So I’m simply going to remove it and see what happens without it.

We’ll see later about reintegrating it into the network

Thanks a lot for your help, I’ll wait to see how it goes before marking the thread as resolved!!

pierre-gilles · July 19, 2024, 12:00pm

Good thing we found it!

Also check the other sensors to be sure it’s really the only one causing a problem — from the values you posted, it did look like it was the only one.

guim31 · July 19, 2024, 12:02pm

Yes, all the more so because what tipped me off is that I usually disable the history of the signal strength and it’s spamming me with that too!

PS: It’s been 10 minutes since I started the device deletion… and I’m only at 30% of the states deleted ^^

pierre-gilles · July 19, 2024, 12:06pm

It’s purging, it’s purging

guim31 · July 19, 2024, 12:43pm

All right, now I’ll monitor and retest my DB if I have the slightest doubt

guim31 · July 20, 2024, 11:27am

False alarm… Gladys Plus was disconnected again this morning. I will only be able to investigate tonight.

However, I don’t actually know what to look for.

GBoulvin · July 20, 2024, 2:19pm

Extremely radical proposal: manually stop the ZigBee2MQTT container for 24 hours…

guim31 · July 20, 2024, 3:26pm

I can consider it even though it’s quite inconvenient.
The problem: it sometimes happens that everything works for

pierre-gilles · July 22, 2024, 7:15am

Oh that’s a shame, can you

guim31 · July 23, 2024, 11:26am

Since I’m on vacation I’ll wait until I’m back to dive back into it.
In the meantime, luckily I have a VPN so I can restart my container from here!

guim31 · July 31, 2024, 2:02pm

Here is a long log file; I left it intact on purpose in case it can help with debugging:

https://pastey.nasdoury.ovh/view/9zbRDyE/raw

I see several disconnection issues, including one that isn’t followed by a reconnection.
However, I have absolutely no idea where this could be coming from.

Also, is it normal that right now, my Gladys Plus having been disconnected for about 24 hours, the RAM usage on my NUC is over 1 GB?
When I restart Gladys it drops back to around 250 MB.

I’m asking just in case!

pierre-gilles · July 31, 2024, 3:55pm

Hi @guim31, thanks for the additional info

First, I notice that all the disconnections are due to internet outages; we even see the ping to healthcheck fail:

2024-07-30T11:05:05.614060982Z 2024-07-30T13:05:05+0200 <warn> index.js:914 (Socket.<anonymous>) Socket disconnected client side. Trying to reconnect...
2024-07-30T11:05:22.432354725Z 2024-07-30T13:05:22+0200 <warn> scene.executeActions.js:37 (executeAction) AxiosError: getaddrinfo EAI_AGAIN hc-ping.com
2024-07-30T11:05:22.432399450Z     at Function.AxiosError.from (/src/server/node_modules/axios/lib/core/AxiosError.js:89:14)
2024-07-30T11:05:22.432406693Z     at RedirectableRequest.handleRequestError (/src/server/node_modules/axios/lib/adapters/http.js:518:25)
2024-07-30T11:05:22.432411380Z     at RedirectableRequest.emit (node:events:529:35)
2024-07-30T11:05:22.432415768Z     at ClientRequest.eventHandlers.<computed> (/src/server/node_modules/follow-redirects/index.js:14:24)
2024-07-30T11:05:22.432420731Z     at ClientRequest.emit (node:events:517:28)
2024-07-30T11:05:22.432457075Z     at TLSSocket.socketErrorListener (node:_http_client:501:9)
2024-07-30T11:05:22.432463594Z     at TLSSocket.emit (node:events:517:28)
2024-07-30T11:05:22.432468263Z     at emitErrorNT (node:internal/streams/destroy:151:8)
2024-07-30T11:05:22.432472552Z     at emitErrorCloseNT (node:internal/streams/destroy:116:3)
2024-07-30T11:05:22.432476961Z     at processTicksAndRejections (node:internal/process/task_queues:82:21) {
2024-07-30T11:05:22.432481377Z   hostname: 'hc-ping.com',
2024-07-30T11:05:22.432485603Z   syscall: 'getaddrinfo',
2024-07-30T11:05:22.432489776Z   code: 'EAI_AGAIN',
2024-07-30T11:05:22.432493912Z   errno: -3001,

Telegram errors everywhere:

2024-07-30T11:07:03.225446331Z 2024-07-30T13:07:03+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = EFATAL, message = EFATAL: Error: connect ETIMEDOUT 212.27.38.252:443

I see the errors last about 1 minute generally, and theoretically the instance reconnects afterwards:

2024-07-30T11:06:04.150166443Z 2024-07-30T13:06:04+0200 <info> index.js:884 (Socket.<anonymous>) Gladys Gateway: connected in websockets

Except that in this case, what’s strange is that Telegram still hasn’t regained the connection after Gladys Plus reconnected

What we need to understand is what’s wrong with your network. Why these outages? And does it really come back?

As for the RAM usage: yes, that’s normal. When your instance disconnects, socket.io (the websocket library we use) buffers all messages to send to Gladys Plus until Gladys Plus reconnects (see “Offline behavior” in the Socket.IO documentation).

As a result, a bunch of messages accumulate in RAM.

pierre-gilles · July 31, 2024, 4:21pm

PS: I’m thinking it might be a good idea to set a timeout on messages sent over WebSocket to avoid them piling up in RAM — after 30 seconds there’s no point in keeping a WebSocket request ^^

But yeah, I don’t think that’s the solution to your problem
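
To illustrate the timeout idea, here is a rough sketch of an outgoing queue that drops messages older than 30 seconds instead of keeping them until reconnection. This is not how socket.io or Gladys actually implement their buffer; `ExpiringQueue`, `maxAgeMs` and `flush` are names invented for the example:

```javascript
// Simplified model of an outgoing buffer with a maximum message age.
// `now` is injectable so the behavior can be checked without real waiting.
class ExpiringQueue {
  constructor(maxAgeMs, now = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    this.items = []; // { message, queuedAt }, oldest first
  }

  // Queue a message while offline, discarding expired ones along the way.
  push(message) {
    this.prune();
    this.items.push({ message, queuedAt: this.now() });
  }

  // Remove every message that has waited longer than maxAgeMs.
  prune() {
    const cutoff = this.now() - this.maxAgeMs;
    while (this.items.length > 0 && this.items[0].queuedAt < cutoff) {
      this.items.shift();
    }
  }

  // To call on reconnection: returns the still-fresh messages and empties the queue.
  flush() {
    this.prune();
    const fresh = this.items.map((item) => item.message);
    this.items = [];
    return fresh;
  }
}

// Example: keep messages for at most 30 seconds.
const queue = new ExpiringQueue(30 * 1000);
queue.push({ type: "state", value: 42 });
```

With something like this, the memory used during an outage is bounded by roughly 30 seconds of traffic (as long as messages keep being pushed, since pruning happens on `push` and `flush`), rather than growing for the whole duration of the disconnection.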

pierre-gilles · July 31, 2024, 4:33pm

Ah, while reading all the changelogs of the socket.io library, I came across some interesting stuff:

Previously, getting disconnected while waiting for an acknowledgement
would create a memory leak, as the acknowledgement was never received
and the handler would stay in memory forever.

This could explain why RAM usage increases a lot, and maybe this bug would prevent socket.io from reconnecting if you accumulate too many requests during disconnection.
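
To make the mechanism concrete, here is a simplified model of that kind of leak and of its fix. It is only an illustration, not socket.io’s real code; `AckRegistry`, `register`, `resolve` and `rejectAll` are hypothetical names:

```javascript
// Simplified model of acknowledgement handlers waiting for a server reply.
class AckRegistry {
  constructor() {
    this.pending = new Map(); // request id -> callback(err, response)
    this.nextId = 0;
  }

  // Remember the callback until the acknowledgement arrives.
  register(callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    return id;
  }

  // The acknowledgement arrived: call the handler and forget it.
  resolve(id, response) {
    const callback = this.pending.get(id);
    if (!callback) return false;
    this.pending.delete(id);
    callback(null, response);
    return true;
  }

  // To call on disconnection. Without this step, handlers registered before
  // the disconnection would stay in `pending` forever, which is the leak.
  rejectAll(reason) {
    const callbacks = [...this.pending.values()];
    this.pending.clear();
    for (const callback of callbacks) {
      callback(new Error(reason));
    }
  }
}
```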

I think I’ll make a note to update socket.io on the Gladys Plus front-end and on the Gladys server; that could solve this reconnection issue!

(that won’t fix your internet connection drop problems though, so if I were you I’d keep investigating anyway ^^)

guim31 · August 1, 2024, 1:11pm

Thanks @pierre-gilles for all this information.
On my side, I think I’ve found what was disrupting my network!!

I recently added a

guim31 · August 6, 2024, 7:05am

Until yesterday everything was better: no disconnections… and then, bam, my instance is disconnected again.
I restarted Gladys — disconnected again this morning.

Here are other logs that don’t seem to tell me much more, I think.

I’ll investigate this problem with Telegram which I don’t understand (but I’m not being spammed with errors like before… maybe it’s normal to have occasional connection errors?).

Right now my instance is disconnected, but the RAM is only around 600 MB, not too extreme I think.

https://pastey.nasdoury.ov

GBoulvin · August 6, 2024, 8:25am

You probably checked, but did your second DHCP server happen to reactivate by any chance?

guim31 · August 6, 2024, 8:53am

I straight-up kicked it off my network

pierre-gilles · August 6, 2024, 6:04pm

@guim31 I see lots of errors:

2024-08-06T01:12:21.980588980Z 2024-08-06T03:12:21+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.309015994Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.640530705Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:22.969901142Z 2024-08-06T03:12:22+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:23.300108257Z 2024-08-06T03:12:23+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway
2024-08-06T01:12:23.630625049Z 2024-08-06T03:12:23+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway

And also this:

2024-08-06T01:05:51.211667212Z 2024-08-06T03:05:51+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = EFATAL, message = EFATAL: Error: read ECONNRESET

Are you sure your network issues are resolved? If Telegram is having trouble reconnecting, I think you still have issues.
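
If it helps with the investigation, a small script along these lines can summarize how often each Telegram polling error shows up in a log dump. It is only a sketch: `countPollingErrors` is a hypothetical helper, and it assumes the log lines keep the format shown above (the sample lines are shortened versions of them):

```javascript
// Counts Telegram polling errors per message, assuming log lines that contain
// "Telegram polling error, code = <CODE>, message = <MESSAGE>".
function countPollingErrors(logText) {
  const pattern = /Telegram polling error, code = \w+, message = (.*)$/;
  const counts = {};
  for (const rawLine of logText.split("\n")) {
    const match = rawLine.trim().match(pattern);
    if (!match) continue; // not a Telegram polling error line
    const message = match[1];
    counts[message] = (counts[message] || 0) + 1;
  }
  return counts;
}

// Example with two shortened lines from the logs above.
const sample = [
  "2024-08-06T03:12:21+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = ETELEGRAM, message = ETELEGRAM: 502 Bad Gateway",
  "2024-08-06T03:05:51+0200 <warn> message.connect.js:19 (TelegramBot.<anonymous>) Telegram polling error, code = EFATAL, message = EFATAL: Error: read ECONNRESET",
].join("\n");
console.log(countPollingErrors(sample));
```

A burst of `502 Bad Gateway` responses points to an HTTP error returned on the path to Telegram, while `ECONNRESET`, `ETIMEDOUT` or `EAI_AGAIN` mean a connection was reset, timed out, or a DNS lookup temporarily failed, which is more typical of a local network problem.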
