Improved recalculation of energy tracking

@pierre-gilles ,

I’m following up on this topic — I’ve finished the verification for the PR that improves the energy tracking recalculation. I think you can review:

This will be followed by the next PR which adds clearer logging for the user, notably to see where the last launched tasks were at if a Gladys restart or an issue occurred during a recalculation (avoiding having to redo the entire recalculation …):

Could you describe in more detail what this PR does?

Since this touches the core, these are fairly sensitive changes.

I’d like to understand precisely the scope to be able to properly assess the impact.
During development I spent several months improving the reliability of the calculations, and it’s vital to me that the calculation isn’t affected :slight_smile:

You’re right to ask for the details, I completely understand. And if possible, I’d prefer that you also test it by running it alongside your production environment to properly verify. That’s what I did to properly assess correct operation. I didn’t find any scenarios where it crashes. But for example, I don’t have any Zigbee energy. Even though there’s no reason it should be different, that would cover everything.

The overall diff is large (+3203 / -428), but the functional scope I’m targeting in this PR is mainly the following:

  1. Targeted recalculation (backend) over a date range / feature selection
  • Added recalculation by date range for:
    • cost (calculate-cost-range)
    • consumption from index (calculate-consumption-from-index-range)
  • The “from beginning” endpoints now also accept a feature selection (feature_selectors).
  • Validation of YYYY-MM-DD dates on the controller side (with BadParameters if the format is invalid).
  1. Partial recalculation without breaking the rest
  • When I recalculate a period, I only purge the states within the period (destroyStatesBetween) before recomputation.
  • When there is no end date, I keep the “from start → now” behavior (destroyStatesFrom).
  • For consumption from index, I restore ENERGY_INDEX_LAST_PROCESSED at the end of processing on partial/selected recalculations to avoid damaging the global cursor.
  1. Execution as a background task
  • Switched to wrapperDetached for long recalculations (from beginning / range), to immediately return a job_id.
  • The frontend only considers the action successful if a job_id is returned.
  1. Frontend (energy monitoring screen)
  • Added start date / end date fields (each field is optional).
  • Added multiple selection of features to recalculate.
  • If no feature is selected, explicit confirmation before a global recalculation.
  • Automatically call the range endpoints if a date is provided, otherwise the from-beginning endpoints.
  1. What I did not touch in the calculation core
  • I did not change the contract calculation formulas (the engine contracts.calculateCost remains the same).
  • I did not change the basic logic of the index delta (the formula remains identical); I mainly constrained the scope (selectors, dates, targeted purge, job handling).

I have:

  • neither changed the business formulas of contract calculation
  • nor the principle of the index delta
  • nor the unit conversion we recently put in place.

I have « only » modified the input scope (selected features, date range) and the technical framing of the recalculations (targeted purge, jobs, API/front).

And to tell you the truth, at the start of the week I almost closed the PRs for this topic and wrote to tell you that I was leaving it in your hands.

I was afraid it would take you longer to review it than to do it yourself. That on top of that visually it might not suit you. And that as a result you’d probably be better off doing it the way you think.

But I’m torn between that and the fact that there are still a few of us waiting. And the fact that I can see you’re busy with many other things.

So I decided to take the approach of proposing it and leaving it up to you. And you’ll give me your feedback / impressions. :wink:

Thanks for the details, I’ll let you know when I’ve had a chance to take a look :wink:

I had a quick look at the PR and it seems to touch more things than just the date re-calculation.

For example, the first four files appear to be leftover from another PR:

For the business code, the PR is quite large and touches so many elements that it’s hard to be sure it doesn’t change the calculation algorithm.

There’s even a new logic introduced with new concepts (like shouldRestoreLastProcessed), so it’s clearly not a trivial PR.

Yet intuitively, if I think about the idea for two minutes, I struggle to understand why it’s so complex.

I wonder, unfortunately, if your intuition is right:

It’s really a shame, because I actually agree with the result and the UX seems good to me. The problem is that the code is huge: I can’t merge without reading it entirely, understanding the new concepts, and running all the necessary tests… :sweat_smile:

What do you think?

Hello @pierre-gilles

Thanks for starting to look.

The PR branch started from the Tasmota branch, which has had quite a few changes following our discussions.

  • After updating the PR once Tasmota was merged, I went over it again, but I missed the file ‹ front/src/components/device/index.js › which still includes the creation of device conso/cost => I don’t understand how I missed it; it’s the first file and it’s a big one. It’s removed

  • front/src/components/device/UpdateDevice.jsx and front/src/components/device/UpdateDeviceFeature.jsx were adding a shortcut from features that had associated consos/cost to go directly to the energy tracking page. That can be proposed later; I removed it.

  • front/src/components/drag-and-drop/DeviceListWithDragAndDrop.jsx: the component is used in the selection of devices to be recalculated. There was a need to add an inability to change the name for this use.

I agree, this is not a trivial PR on that point.

Originally, I considered duplicating the logic into a separate recalculation engine to avoid depending on ENERGY_INDEX_LAST_PROCESSED.
But I deliberately avoided that: we’d have had two parallel implementations (incremental vs recalculation), so more complexity with a risk of divergence over time.

Today I added calculateConsumptionFromIndexRange as an orchestrator (date bounds, cleanup, window loop).
I wanted to keep the actual calculation of each window by always going through calculateConsumptionFromIndex, which itself uses ENERGY_INDEX_LAST_PROCESSED.

So, shouldRestoreLastProcessed is used to protect the live incremental cursor during partial/retrospective recalculations. Because before, the calculation always went « to the end ». But here the goal is to be able to add an old contract price and launch the recalculation on that range only. If that range was a year ago, without this the next 30-minute calculation would have resumed over a complete year instead of restarting at the last real tracking run.

Concrete example

  • It’s 07/03 at 09:25.
  • Manual recalculation of a device for 01/03 → 06/03.
  • The automatic 09:30 job is queued (queue at 1).

Without restoration:

  • the cursor can remain at an old value (e.g., 06/03 23:30),
  • then the 09:30 job can aggregate too broadly up to 09:30.

With restoration:

  • we restore at the end of the run the cursor’s initial value,
  • the 09:30 job restarts on a coherent base.

Because this cursor is used by the index consumption calculation.

In short, it’s a safeguard to keep a single maintainable engine, without side effects on subsequent incremental runs.

Of course I may be wrong about the logic, but in particular on this point I worked with the AI and tried to turn it every which way to be sure it’s a good way to do it (not necessarily the best; that’s what the review is for!)

I’m well aware of that, and I told you from the start of this PR. It touches a sensitive element, and I was hoping that you could run it on an instance next to your prod to be sure that the calculations (which have not been changed) are correct even after range recalculations, etc.
For the code, for your information these are the largest tests:

Have a good weekend and a good run :wink:

Hi @pierre-gilles,

Not knowing if you’re in class or when you’ll get to reviewing this PR, I thought it would be good to relaunch a full review by @coderabbitai because I notice he hasn’t gone over it again after a few passes.

And given the number of commits made since his first review, I thought it would be good to do a full one on the complete PR: https://github.com/GladysAssistant/Gladys/pull/2413#pullrequestreview-3943430667

I think he raised some good points. If you haven’t started your review, may I push the changes? Otherwise, what do you prefer?

I haven’t looked yet, you can modify the PR :slight_smile:

Ok, it’s done.

Hi @pierre-gilles,

I’m getting back to you about PR #2413 (energy recalculation).

Still waiting for this recalculation to fully use the energy tracking integration, I’ve cleaned up the PR (merged master, CodeRabbit fixes, retested on my prod for nearly 2 months with no calculation discrepancies with the official version). I’ve reopened it.

As a reminder from the discussion thread: you mentioned at the beginning of April that you preferred to redo the subject yourself because the PR seemed too large and too sensitive. I had agreed and put the PR on hold.
To move forward concretely, I propose to split the PR into several small independent PRs, mergeable one by one. You could review in small pieces, test each step on your prod, and easily roll back if needed:

  • PR1: destroyStatesBetween + its tests (pure utility, isolated, ~100-200 lines).
  • PR2: recalculation by feature selection from the beginning (without the date range).
  • PR3: recalculation by date range (builds on PR1 and PR2).
  • PR4: UI adjustments.

You could merge PR1 and PR2 without committing to the most sensitive concepts (shouldRestoreLastProcessed only appears in PR3).

If you validate this breakdown, I’ll get started this week. If you prefer to take it over yourself with a clear timeline, let me know and I’ll close the PR. What bothers me is the lack of decision and not being able to fully use this integration; I think other users were also affected.

Thanks for your feedback.

Hi,

The main issue here is one of trust :slightly_smiling_face:

What concerns me about the PR in its current state is that it fundamentally changes the calculation logic. Last year, this part took me several months to make reliable for the three types of contracts managed (base, peak/off-peak hours, and Tempo), with extensive testing on real data from dozens of users.

Normally, unit tests are meant to detect regressions. But today, with AI, the tests themselves can be massively modified automatically, which is the case here. As a result, I can no longer consider the tests a reliable guarantee. The AI could have easily introduced regressions and then adapted the tests to validate them.

To merge this PR as is, I would therefore need to review and understand the entire modified code, then go through the entire validation phase I did last year. And unfortunately, I no longer have the test data sets that some users had sent me at the time. So, it’s a project almost equivalent to the initial development, not just a simple iteration.

On the other hand, if you propose a « front only » PR that simply adds the ability to recalculate for the last month / last 3 months / last 6 months / last 12 months, it could probably be merged in less than an hour, without a heavy validation phase, since the business logic would not be modified.

Breaking it down into small PRs can help make the changes more readable, but as soon as we touch the business logic here, there will inevitably be a significant, non-compressible validation phase.

What do you think?

And sorry if this seems harsh, it’s really not against your work. It’s just that, with my current schedule, projects of this magnitude are very difficult to absorb :sweat_smile: I would like to have more time for this kind of in-depth review, but the project isn’t at that stage yet.

@pierre-gilles,

First of all, thank you for this response, which has the merit of being complete and honest about your constraints.

To be able to answer you fully, and not just « I’m sure of my code », I took the time to conduct a rigorous audit of the PR this afternoon. I framed the request with an explicit instruction: to verify that each change is necessary, solid, well thought out, without regression, and above all that the tests are not distorted — including by deliberately breaking the code to verify that the tests detect regressions (real mutation testing). This is precisely the legitimate doubt you raise about AI and tests, and I wanted to provide a verifiable answer, not just a statement.

I am providing you below with the result of this audit, factual and reproducible on your side.

1. On your central objection: « the PR deeply modifies the calculation logic »

I maintain my previous answer, but this time with line-by-line verification. The two files where the business logic lives are:

  • server/services/energy-monitoring/lib/energy-monitoring.calculateConsumptionFromIndex.js
  • server/services/energy-monitoring/lib/energy-monitoring.calculateCostFrom.js

I have read every line modified. What is strictly unchanged:

  • convertEnergyUnit(...) — unit conversions
  • contracts[contract](...) — all cost formulas (base, HP/HC, Tempo)
  • Index delta calculation and counter reset management
  • Price filtering by date + EDF Tempo + subtract(30, 'minutes')
  • saveMultipleHistoricalStates / saveHistoricalState

The only additions in calculateConsumptionFromIndex.js are encapsulated in an if (selectorSet.size > 0) (whitelist filtering): disabled in the absence of a whitelist, so legacy behavior is strictly preserved for the live incremental calculation that runs every 30 minutes.

In calculateCostFrom.js, the additions are:

  • Parsing of start/end dates (new entry for range mode)
  • Filtering by selectors (encapsulated in if (selectorSet.size > 0))
  • destroyStatesBetween if endAt provided, otherwise destroyStatesFrom (legacy behavior)
  • A try/catch per cost feature (robustness: a feature that crashes does not block the others)

The cost formula itself — the one that took you months to make reliable on base/HP/HC/Tempo — has not moved a character. This is verifiable in a few minutes with git diff master HEAD -- server/services/energy-monitoring/lib/energy-monitoring.calculateCostFrom.js and a Ctrl+F on contracts[contract], convertEnergyUnit, energyPricesForDate.

2. On your objection about the tests: « AI can introduce regressions and adapt the tests to validate them »

This is a perfectly legitimate concern, and that’s precisely why I did real mutation testing: I temporarily broke 3 critical points in the code and reran the entire suite. The goal is to verify that the tests detect the regression, not that they pass with the regression.

Here are the 3 mutations tested and the result:

Applied Mutation Tests that Break
Date validation removed from the controller reject invalid start date format + reject invalid end date format (2 tests)
Whitelist filter of the live engine calculateConsumptionFromIndex removed skip consumption features not in whitelist selectors (1 test)
finally block for restoring the cursor ENERGY_INDEX_LAST_PROCESSED removed restore last processed value on selector-based recalculation + restore last processed value on window error (2 tests)

Total: 5 tests break precisely on the right assertions. The tests are not AI green-washing, they really detect regressions on critical protections. Reproducible on your side: just comment out these 3 blocks and rerun npm run test-service --service=energy-monitoring. Files restored after verification, I confirmed with git status that there are no residual modifications.

I also verified the non-regression of existing tests:

  • The modified legacy tests (on calculateConsumptionFromIndex.test.js) have only an addition of undefined as the second parameter (to align with the new signature). No assertion has been weakened or removed.
  • calculateCostFromYesterday.test.js is actually strengthened: before it only checked calledOnce, now it checks the exact signature of the passed arguments.
  • Controller Tests: the pattern try/catch + expect.fail is replaced by next(error) — this is the correct Express pattern.

3. On your proposal « front-only PR with presets 1/3/6/12 months »

I understand the logic: to avoid the heavy validation phase on the business by not touching the backend. But this proposal does not cover the real needs of users, and it’s important that I explain this:

Use Case #1 — Historical recalculation on existing devices added to tracking a posteriori

This is the promise of energy tracking in Gladys. Many users (including me) have Tasmota or Zigbee2mqtt outlets for several years in their Gladys instance, sometimes with 4 years of index data stored. When we add the energy tracking integration to these devices, we must be able to recalculate the consumption and cost over all available history, and not just the last 12 months.

Today, the only way is the global from-beginning, which:

  • recalculates for all devices at the same time (and not just the one just added)
  • does not complete on substantial installations (case I am experiencing and others report)
  • blocks the HTTP request on the front end for the entire duration of the calculation (probably the silent regression that makes us say « it doesn’t work » for large installs)

A preset « last 12 months » does not solve any of these three problems. A selection by feature + date range solves all three.

Use Case #2 — Retroactive tariff

A user who realizes that they changed their tariff 18 months ago and wants to recalculate only that range is blocked with a 12-month preset.

Use Case #3 — Granularity

When you have 30 devices tracked and you just want to correct data on a single device for a week, launching a global recalculation on 1/3/6/12 months is disproportionate and risky for the other devices.

4. On the robustness improvements brought by the PR

I also want to mention two points where the PR makes the code safer than master:

  • wrapperDetached (new job primitive) decouples the HTTP request from the long calculation. Before, the from-beginning blocked the HTTP request until the end of the calculation, which caused timeouts on large installs. This is probably the silent cause of the « it doesn’t work » that we regularly report.
  • The try/finally around the restoration of the cursor ENERGY_INDEX_LAST_PROCESSED protects the instance from a corrupted state if a window fails during recalculation. On master, if a window crashes, the cursor remains destroyed and the live incremental calculation starts from zero at the next iteration.

5. Honest points that I will correct before push

To not tell you that everything is perfect, the audit also raised 3 minor cleanups (non-blocking, no risk of regression):

  1. A backward-compat fragment in calculateCostFrom that is no longer useful (all internal callers already pass the new signature).
  2. A ~70% duplication between FromBeginning and Range that I can factorize.
  3. 2-3 tests « passing by chance » in calculateCostFrom.test.js that do not really test what they claim (to be hardened or removed — the real business tests remain in the original unmodified tests).

I commit to correcting these 3 points before the split.

6. My concrete proposal

Given the above, I still propose the split into sub-PRs, but by accommodating your concern about the « uncompressible validation phase »:

  • PR1: destroyStatesBetween + wrapperDetached + their tests. ~150-200 lines. No business logic, just utilities. Can be merged quickly, low-risk.
  • PR2: Recalculation by feature selection from the start (without date range). The whitelist filter is encapsulated in if (selectorSet.size > 0), so the legacy behavior is strictly preserved.
  • PR3: Recalculation by date range + shouldRestoreLastProcessed. This is where the « real » business review is focused.
  • PR4: UI adjustments.

On PR3 — the one that requires the most validation from you — I’m ready to:

  • Provide reproducible test datasets from my production (anonymized SQL output) that you can replay at your end
  • Add code documentation on shouldRestoreLastProcessed
  • Give you a video demonstration of the before/after behavior if it helps

If you validate this breakdown, I’ll start with PR1 this week. If you prefer I stay on hold, let me know with a timeline — even a broad one — and I’ll close the PR again calmly.

Thanks for your time, and sorry if the response is dense — I preferred to give you facts rather than assertions.

@pierre-gilles, additional to my previous message.

To concretely support what I wrote about the audit, I’ve made the cleanups I announced.

Cleanup applied to calculateCostFrom.test.js

By auditing the tests added by this PR on this file (the original tests on master were not touched), I found 5 problematic tests out of 6:

  • 2 tests removed outright:
    • a duplicate with a test already present on master (should handle case where no energy price is found for the device at the given date)
    • a test « should return null when start date is invalid type » that passed for the wrong reason (early exit because there was no device in the DB, not because it was actually testing the invalid type). The controller already filters this case upstream, so testing this fallback at a level where it never happens in production = over-testing.
  • 3 tests rewritten with the real DB pattern (device.create() + db.duckDbBatchInsertState() + energyPrice.create()), as all the original tests in the file on master do. The previous versions used stubs otherwise, which made the tests less reliable and inconsistent with the rest of the file.

A point on coverage

During the audit, I also discovered that the test should handle case where no energy price is found for the device at the given date which is on master does not actually cover the branch it claims to test. Technical reason: its energy_parent_id on the cost feature points to the index feature instead of the consumption feature, so the matching fails earlier in the code and the branch if (energyPricesForDate.length === 0) is never reached. This branch was actually only covered by one of my stubbed tests that I wanted to remove.

I therefore added a clean test in real DB to preserve this coverage. I did not modify the master test that has the bug — I forbade myself to touch the legacy.

Balanced score

  • 167 → 166 tests passing (-1 net: -2 removals, +1 addition for the orphaned branch)
  • Line coverage maintained at 99.61 % (codecov target 98.80 %)
  • Diff vs master: -49 lines on this file
  • Test pattern unified: 100 % real DB on additions as well as on legacy
  • No original master test modified

I did a review of PR 1!

My feedback: core: Energy monitoring - add destroyStatesBetween (device) and wrapperDetached (job) utilities - PR1 by Terdious · Pull Request #2528 · GladysAssistant/Gladys · GitHub

Thanks to you,

I replied and fixed it.

I also opened PR2 (adding recalculations on features only, without date ranges that will come in PR3) for reference.

I haven’t retested everything yet. I’ll do that tomorrow evening.

In any case, at my level, I’m quite impressed by the rigor of Terdious in describing what has been prepared. I hope you find a way together to bring this evolution to production, because it will really be useful to be able to perform these targeted recalculations.

Hi @StephaneB,

Thanks for your message, it means a lot to me.

However, out of honesty and transparency towards Pierre-Gilles, I feel I should clarify one point:

I’m not a professional developer, I’m self-taught. I have clear ideas and usually manage to code what I need, but when it comes to architectural rigor, I rely heavily on feedback from reviews to improve — and it’s completely normal that this requires more cycles for Pierre-Gilles with me than with other contributors.

In fact, as early as PR1, I had chosen to duplicate two functions rather than extend the existing ones — out of excessive caution regarding Gladys’s core, which I know is sensitive. It turned out to be the wrong choice in this context, and Pierre-Gilles was right to point that out to me. I’ve refactored it in the right direction.

So yes, I want to drive this development because I believe it serves a real shared need — but without claiming qualities I don’t have. The seriousness I can bring to the description and testing comes mostly from the time I spend on it and the tools I use, not from a professional developer level.

Thanks anyway for your support.

Hi @pierre-gilles,
For info, I posted a test feedback on PR1 (#2528). I ran a Docker image terdious/gladys:energy-monitoring-pr1 on an isolated copy of my prod DB since last night, launched a full cost recalculation (1h30, clean logs), and compared the recalculated values to my prod month by month and year by year — it matches.

I can let it run for one or two more days to validate the incremental over time, or move directly to a PR2 test image if you prefer. Let me know.

Thanks a lot, I’ll let you know as soon as I’ve had a chance to look!