Help! Problem with aggregation?

pierre-gilles · October 29, 2021, 3:35am

@cicoub13 I can reproduce your "Maximum call stack" error by trying to chunk an array of 150k sensor values :o That means you have a sensor that has sent more than 150k values, haha. We hadn’t seen this case before, not even with @Terdious. Interesting!

pierre-gilles · October 29, 2021, 3:51am

I found a faster implementation than the one we currently have. It uses "slice" instead of "splice", so the array is not mutated and you don’t have to clone it beforehand.

I based it on this:

A performance test confirms that the old implementation is 69.6% slower than the new one, and most importantly, the new one doesn’t have the Maximum Call stack issue since the array is not cloned.
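
As a rough illustration of the idea, here is a minimal sketch of a slice-based chunk helper. It is not the code from the PR: the function name, the size check, and the sample sizes are assumptions made for the example.

```javascript
// Hypothetical sketch of a non-mutating chunk helper (not the actual Gladys code).
function chunk(array, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  // slice() copies only the requested window and leaves `array` untouched,
  // so the input never has to be cloned first.
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// 150k values, as in the case reproduced above.
const values = Array.from({ length: 150000 }, (_, i) => i);
const parts = chunk(values, 1000);
console.log(parts.length); // 150
console.log(values.length); // 150000 (the input is unchanged)
```

Because the loop only reads from the input, the same array can be reused afterwards without a defensive copy.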

pierre-gilles · October 29, 2021, 3:59am

Here’s the PR for those interested:

https://github.com/GladysAssistant/Gladys/pull/1340/files

pierre-gilles · October 29, 2021, 4:35am

I merged the PR and launched a build on the "dev" tag to see if that fixes the issue on your end @cicoub13 (on Docker it will be available as gladysassistant/gladys:dev, as usual).

The build is here:

https://github.com/GladysAssistant/Gladys/actions/runs/1397530669

It will take about an hour

cicoub13 · October 29, 2021, 5:59am

792K deviceFeatureState
5 features with 124K records each. They are all linked to a smart plug (it’s this model: Tuya TS0121_plug control via MQTT | Zigbee2MQTT)
5 features with 15K records - Linked to a motion detector (which also measures light)

The number of records per day is quite regular and I only have 3 months of rolling data saved.

I’ll test this today

cicoub13 · October 29, 2021, 6:07am

Here is what the API returns for this issue:

[
{"id":"c5bac8c2-5a49-4f7c-9ad8-0e367b427df8","type":"hourly-device-state-aggregate","status":"failed","progress":52,"data":{"error_type":"purged-when-restarted"},"created_at":"2021-10-28 09:14:48.379 +00:00","updated_at":"2021-10-29 06:03:07.825 +00:00"},
{"id":"83bca935-43d3-4177-ba38-e51a16c36ff6","type":"monthly-device-state-aggregate","status":"success","progress":97,"data":{},"created_at":"2021-10-28 09:04:14.845 +00:00","updated_at":"2021-10-28 09:04:16.830 +00:00"},
{"id":"6f15c4b3-5acb-4747-88b3-730af6fdecc4","type":"daily-device-state-aggregate","status":"success","progress":97,"data":{},"created_at":"2021-10-28 09:04:12.809 +00:00","updated_at":"2021-10-28 09:04:14.837 +00:00"},
{"id":"2e0b5f3e-bf82-4a8c-a4c5-d601a209a10b","type":"hourly-device-state-aggregate","status":"failed","progress":52,"data":{"error_type":"unknown-error","error":"Error: RangeError: Maximum call stack size exceeded\n at chunk (/src/server/utils/chunks.js:11:26)\n at /src/server/lib/device/device.calculcateAggregateChildProcess.js:147:22\n"},"created_at":"2021-10-28 09:03:22.151 +00:00","updated_at":"2021-10-28 09:04:12.802 +00:00"}
]
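
For anyone who wants to check their own instance, here is a small sketch that lists the failed jobs from a response shaped like the one above. The helper name `summarizeFailures` is hypothetical, and the sample data is a shortened copy of that response.

```javascript
// Illustrative sketch: list failed jobs from a parsed jobs response.
// `summarizeFailures` is a hypothetical helper, not a Gladys API.
function summarizeFailures(jobs) {
  return jobs
    .filter((job) => job.status === 'failed')
    .map((job) => ({
      type: job.type,
      progress: job.progress,
      // `data` is empty for successful jobs, so guard the lookup.
      errorType: (job.data && job.data.error_type) || 'none',
    }));
}

// Shortened copy of the response above (ids, error text, and timestamps omitted).
const jobs = [
  { type: 'hourly-device-state-aggregate', status: 'failed', progress: 52, data: { error_type: 'purged-when-restarted' } },
  { type: 'monthly-device-state-aggregate', status: 'success', progress: 97, data: {} },
  { type: 'daily-device-state-aggregate', status: 'success', progress: 97, data: {} },
  { type: 'hourly-device-state-aggregate', status: 'failed', progress: 52, data: { error_type: 'unknown-error' } },
];

console.log(summarizeFailures(jobs));
```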

cicoub13 · October 29, 2021, 6:31am

Should we modify the Zigbee2Mqtt integration to not record the value if it hasn’t changed? It would be a shame, as we could lose information (the lack of variation is information).

pierre-gilles · November 1, 2021, 1:29am

Were you able to test it in the end?

Very strange, it seems fine!

I don’t think we should change the Zigbee2mqtt integration: it’s indeed important to keep the values even if they haven’t changed.

However, what we could do is expose the "keep_history" attribute of each feature in the UI (as a toggle), so that users can choose not to record history for devices that are too verbose or whose history doesn’t interest them:

github.com/GladysAssistant/Gladys

server/models/device_feature.js (master)

          },
          external_id: {
            allowNull: false,
            unique: true,
            type: DataTypes.STRING,
          },
          category: {
            allowNull: false,
            type: DataTypes.ENUM(DEVICE_FEATURE_CATEGORIES_LIST),
          },
          type: {
            allowNull: false,
            type: DataTypes.ENUM(DEVICE_FEATURE_TYPES_LIST),
          },
          read_only: {
            allowNull: false,
            type: DataTypes.BOOLEAN,
          },
          keep_history: {
            allowNull: false,
            type: DataTypes.BOOLEAN,
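
To make the idea concrete, here is a rough sketch of how such a flag could gate history recording. It is not the Gladys implementation: `saveState`, the in-memory `history` array, and the `last_value` field are assumptions made for the example. Only `keep_history` comes from the model excerpt above.

```javascript
// Hypothetical sketch: `saveState` and `history` are illustrative names, not Gladys APIs.
function saveState(history, feature, value) {
  // Always track the latest value so the current state stays up to date.
  feature.last_value = value;
  // Only append to the history when the feature has keep_history enabled.
  if (feature.keep_history) {
    history.push({ device_feature_id: feature.id, value, created_at: new Date() });
    return true;
  }
  return false;
}

const history = [];
const plugPower = { id: 'plug-power', keep_history: false };
const temperature = { id: 'temperature', keep_history: true };

saveState(history, plugPower, 12.4); // latest value updated, nothing stored
saveState(history, temperature, 21.5); // latest value updated and stored
console.log(history.length); // 1
```

With this shape, a verbose device still reports its current value, but it no longer adds rows that the aggregation has to process later.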

cicoub13 · November 1, 2021, 10:41am

Yes, but the aggregation stays stuck at 52% (after that, I lose control of my RPI 3B+).
The CPU is at 100% and the RAM is full, and it never finishes.

I tried again after removing the states of the 2 verbose devices (which I don’t use for graphs), and it works. I think you can leave your optimization in master.
I really need to invest in an RPI4 and an SSD.
Thanks anyway for the investigation and the fix.

I created an issue to keep track of the keep_history option:
https://github.com/GladysAssistant/Gladys/issues/1344

pierre-gilles · November 2, 2021, 1:21am

Oh crap, indeed: if you have 1 GB of RAM and part of it is already used by other things, then when Gladys loads 130k sensor values into RAM and tries to aggregate them hour by hour, it can exhaust the RAM, so the job never completes…
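
To show what hour-by-hour grouping looks like, here is a simplified sketch that keeps only one small accumulator per hour instead of holding every value. It swaps in plain count/sum/min/max statistics and is not what Gladys does. The function names and sample data are assumptions made for the example.

```javascript
// Hypothetical sketch: running hourly statistics, not the Gladys implementation.
function hourKey(date) {
  // Truncate an ISO timestamp to the hour, e.g. "2021-10-28T09".
  return new Date(date).toISOString().slice(0, 13);
}

// `states` can be any iterable (an array or a generator that reads rows lazily).
function aggregateByHour(states) {
  const buckets = new Map();
  for (const { created_at, value } of states) {
    const key = hourKey(created_at);
    const bucket = buckets.get(key) || { count: 0, sum: 0, min: Infinity, max: -Infinity };
    bucket.count += 1;
    bucket.sum += value;
    bucket.min = Math.min(bucket.min, value);
    bucket.max = Math.max(bucket.max, value);
    buckets.set(key, bucket);
  }
  return buckets;
}

const states = [
  { created_at: '2021-10-28T09:03:22.151Z', value: 10 },
  { created_at: '2021-10-28T09:45:00.000Z', value: 20 },
  { created_at: '2021-10-28T10:01:00.000Z', value: 5 },
];
const hourly = aggregateByHour(states);
console.log(hourly.get('2021-10-28T09')); // { count: 2, sum: 30, min: 10, max: 20 }
```

The memory used by the accumulators grows with the number of hours rather than the number of values, which is the property that matters on a 1 GB board.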

In the UI, do you have a clear error when it’s a RAM issue?

Ok great! Otherwise, you can also tell Gladys to keep only the last 3 months of sensor values in the settings

cicoub13 · November 2, 2021, 10:45am

No, it never completes and there’s no error.

I’ll do that while waiting for the feature that will let me stop recording values for these two devices.

VonOx · November 2, 2021, 5:40pm

If it’s too much of a blocker, change keep_history directly in the DB.

lmilcent · November 25, 2021, 11:48am

I restarted Gladys to check, and I still have the same errors:

2021-11-25T12:46:18+0100 <warn> device.calculateAggregate.js:95 (Socket.<anonymous>) device.calculateAggregate stderr: TypeError: Cannot read property '0' of undefined
    at calculateTriangleArea (/src/server/node_modules/downsample/index.js:69:26)
    at LTTBIndexesForBuckets (/src/server/node_modules/downsample/index.js:662:18)
    at /src/server/node_modules/downsample/index.js:690:34
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:119:27
    at Map.forEach (<anonymous>)
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:111:35

2021-11-25T12:46:18+0100 <warn> device.calculateAggregate.js:101 (ChildProcess.<anonymous>) device.calculateAggregate: Exiting child process with code 1
2021-11-25T12:46:18+0100 <error> device.onHourlyDeviceAggregateEvent.js:22 (DeviceManager.onHourlyDeviceAggregateEvent) Error: TypeError: Cannot read property '0' of undefined
    at calculateTriangleArea (/src/server/node_modules/downsample/index.js:69:26)
    at LTTBIndexesForBuckets (/src/server/node_modules/downsample/index.js:662:18)
    at /src/server/node_modules/downsample/index.js:690:34
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:119:27
    at Map.forEach (<anonymous>)
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:111:35

    at ChildProcess.<anonymous> (/src/server/lib/device/device.calculateAggregate.js:102:23)
    at ChildProcess.emit (events.js:400:28)
    at maybeClose (internal/child_process.js:1058:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:293:5)
2021-11-25T12:46:18+0100 <info> device.calculateAggregate.js:38 (DeviceManager.calculateAggregate) Calculating aggregates device feature state for interval daily
2021-11-25T12:46:58+0100 <warn> device.calculateAggregate.js:95 (Socket.<anonymous>) device.calculateAggregate stderr: TypeError: Cannot read property '0' of undefined
    at calculateTriangleArea (/src/server/node_modules/downsample/index.js:69:26)
    at LTTBIndexesForBuckets (/src/server/node_modules/downsample/index.js:662:18)
    at /src/server/node_modules/downsample/index.js:690:34
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:119:27
    at Map.forEach (<anonymous>)
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:111:35

2021-11-25T12:46:58+0100 <warn> device.calculateAggregate.js:101 (ChildProcess.<anonymous>) device.calculateAggregate: Exiting child process with code 1
2021-11-25T12:46:58+0100 <error> device.onHourlyDeviceAggregateEvent.js:27 (DeviceManager.onHourlyDeviceAggregateEvent) Error: TypeError: Cannot read property '0' of undefined
    at calculateTriangleArea (/src/server/node_modules/downsample/index.js:69:26)
    at LTTBIndexesForBuckets (/src/server/node_modules/downsample/index.js:662:18)
    at /src/server/node_modules/downsample/index.js:690:34
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:119:27
    at Map.forEach (<anonymous>)
    at /src/server/lib/device/device.calculcateAggregateChildProcess.js:111:35

    at ChildProcess.<anonymous> (/src/server/lib/device/device.calculateAggregate.js:102:23)
    at ChildProcess.emit (events.js:400:28)
    at maybeClose (internal/child_process.js:1058:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:293:5)
2021-11-25T12:46:58+0100 <info> device.calculateAggregate.js:38 (DeviceManager.calculateAggregate) Calculating aggregates device feature state for interval monthly

If it can help with the investigation, I can send you my DB or perform specific analyses.

pierre-gilles · November 25, 2021, 12:08pm

@lmilcent I admit I completely forgot about this bug! I can’t reproduce it on my end, but based on your logs, it looks like a specific dataset that’s crashing the downsampling library we’re using (downsample on npm).

Could you create a GitHub issue on the Gladys repo (otherwise, in 10 minutes I’ll forget :p), and send me a DB where the issue actually occurs (that would be great for debugging)?

Thanks!

lmilcent · November 25, 2021, 4:16pm

github.com/GladysAssistant/Gladys

Aggregate: Error NaN

opened 04:16PM - 25 Nov 21 UTC

LM1LC3N7

As discussed on the forum: https://community.gladysassistant.com/t/help-probleme…-avec-lagregation/6630/33?u=lmilcent

**Describe the bug**
Aggregation sometimes does not seem to work correctly (my graphs seem to work, though). The logs are the same as in my forum post above: the hourly aggregation fails with "TypeError: Cannot read property '0' of undefined" at calculateTriangleArea (/src/server/node_modules/downsample/index.js:69:26), and the child process exits with code 1.

**To Reproduce**
I really don't know; it could be caused by my DB (sent directly by email).

**Your Gladys installation (please complete the following information):**
- Raspberry Pi installation: Pi 4

**Desktop (please complete the following information):**
- OS: W10
- Browser: Chrome & Firefox

pierre-gilles · November 25, 2021, 4:19pm

@lmilcent Thanks! I’ve edited the issue title (the NaN bug is a different bug, nothing to do with this one).

Albenss · December 1, 2021, 2:26pm

Same error on my end with the aggregation. In my case, it seems to come from my cryptocurrency data (Shiba), whose value is always below 1. I wonder whether it’s due to the number of decimals handled by the calculation the library uses for its aggregation:

It works for the last hour as I retrieve the info every 5 minutes (so probably no aggregation) but not for the day or the month.

lmilcent · December 1, 2021, 4:33pm

Thank you for your message @Albenss, I think some aggregations may have failed due to a similar case.
For example, my MQTT presence device used to receive "present", whereas it now receives "1".

But I made the change several months ago, and there are still failures.

In general, the error message should be more precise.

pierre-gilles · December 3, 2021, 10:00am

@lmilcent I just looked at the database you sent me, and I’m very surprised by its content.

You have device_feature_state values that are "NaN" (Not a Number). It’s very strange that these were inserted, given that the field is a "double precision" and "NaN" is not a double precision value…

Of course, this crashes the aggregation that expects numerical values, not a string.
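
As an illustration of the kind of guard that avoids this, here is a minimal sketch that drops anything that is not a finite number before the values are handed to the aggregation. The helper name `isValidState` and the sample values are assumptions. It is not the fix that was applied in Gladys.

```javascript
// Hypothetical guard: keep only finite numeric values.
function isValidState(value) {
  // Rejects the string 'NaN', the number NaN, Infinity, null, and numeric strings.
  return typeof value === 'number' && Number.isFinite(value);
}

const rawValues = [21.5, 'NaN', NaN, 22, null, '23'];
const cleanValues = rawValues.filter(isValidState);
console.log(cleanValues); // [ 21.5, 22 ]
```

The same check could run when a value arrives from an integration, so that an invalid value is rejected before it reaches the database.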

After investigation, these values come from the Zigbee2mqtt integration.

I created a GitHub issue to fix the problem:

- Add a DB migration to clean up these rows that shouldn’t be there
- Find out why the validation and the DB accepted these values that are not numbers

In the meantime, @lmilcent and @Albenss, the SQL query to fix your DBs is:

DELETE FROM t_device_feature_state WHERE value = 'NaN';

@lmilcent More generally, I don’t know whether it’s intentional, but you keep all your historical sensor data. I would advise you to clean up some very verbose deviceFeatures: when I ran the aggregation on my Mac, some sensors contained so many values that the full aggregation took more than 1.5 GB of RAM (since it’s the first aggregation, it starts from the beginning; this won’t be the case for later runs).

Otherwise, by removing the NaN, it works very well for me on your DB:

lmilcent · December 3, 2021, 11:16am

I’ve started removing the incorrect values. I’ll check the other verbose devices… Should a feature be planned for this?
