Looking for DuckDB query optimization

Following the discussions about RAM being locked by DuckDB, it seems worth checking whether we can optimize the heaviest queries.

Pagination is one of the options to consider.

Below is an example of one of my heaviest energy-monitoring pages: each chart contains 8 series (it can be more elsewhere), and everything is broken down per phase (L1/L2/L3) plus the total.

Time-based visualizations up to 3 months are smooth. But the yearly view is slower, around 10s.

Initial query when displaying the dashboard with charts for the last hour:


Start of loading when switching to year on the 1st chart

End of loading

Query details

Temporal detail of the query

Following the 4.66.3 update, I confirm that the yearly view now takes « only » about 1 s more, and no difference is noticeable for the rest:

@pierre-gilles, feel free to tell me if you want more details

No need for pagination: we do sampling, and we return at most 300 rows per chart :slight_smile:

I’ll take a look with this information, thanks for the detailed post!


Can you run tests on your database?

I’d like to see the result of this query:

SELECT
    TIME_BUCKET(INTERVAL ? MINUTES, created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM
    t_device_feature_state
WHERE device_feature_id = ?
AND created_at > ?
GROUP BY 1
ORDER BY created_at;

You need to replace the ? placeholders:

  • Replace the first with 1752
  • Replace the second with the device_feature_id that has the most history in your database
  • Replace the third with the date one year ago, e.g. '2024-12-14T10:04:11.478Z'
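If it's easier, the same test can also be run directly in the DuckDB CLI with a prepared statement instead of editing the query by hand. This is only a hedged sketch: bucket_test is an illustrative name, the device_feature_id is a placeholder, and to_minutes() stands in for the INTERVAL ? MINUTES placeholder of the query above; .timer on makes the CLI print the execution time of each statement.

-- Enable the CLI timer so each statement prints its execution time
.timer on

-- Prepare the aggregation once, with positional parameters
PREPARE bucket_test AS
SELECT
    TIME_BUCKET(to_minutes(CAST($1 AS INTEGER)), created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM
    t_device_feature_state
WHERE device_feature_id = $2
AND created_at > CAST($3 AS TIMESTAMP)
GROUP BY 1
ORDER BY created_at;

-- Run it with the three values described above (placeholder id shown here)
EXECUTE bucket_test(1752, 'your-device-feature-id', '2024-12-14 10:04:11.478');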

If you want to compare, here’s the current query:

  WITH intervals AS (
        SELECT
            created_at,
            value,
            NTILE(300) OVER (ORDER BY created_at) AS interval
        FROM
            t_device_feature_state
        WHERE device_feature_id = ?
        AND created_at > ?
    )
    SELECT
        MIN(created_at) AS created_at,
        AVG(value) AS value,
        MAX(value) AS max_value,
        MIN(value) AS min_value,
        SUM(value) AS sum_value,
        COUNT(value) AS count_value
    FROM
        intervals
    GROUP BY
        interval
    ORDER BY
        created_at;

One question, why 1752? That makes buckets of 29h12??

Otherwise, here are the results:

  1. Aggregation by time bucket - TimeBucket Query
SELECT
    TIME_BUCKET(INTERVAL 1752 MINUTES, created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM
    t_device_feature_state
WHERE
    device_feature_id = 'e9079170-4654-4fad-b726-23a36c793905'
    AND created_at > TIMESTAMP '2024-12-14 10:04:11.478'
GROUP BY 1
ORDER BY created_at;

Result (screenshot):

  • 301 rows returned
  • Execution time ≈ 0.486 s
  • Fixed time buckets (~29h)
  2. Quantile partitioning - NTILE(300) Query
WITH intervals AS (
    SELECT
        created_at,
        value,
        NTILE(300) OVER (ORDER BY created_at) AS interval
    FROM
        t_device_feature_state
    WHERE
        device_feature_id = 'e9079170-4654-4fad-b726-23a36c793905'
        AND created_at > TIMESTAMP '2024-12-14 10:04:11.478'
)
SELECT
    MIN(created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM
    intervals
GROUP BY
    interval
ORDER BY
    created_at;

Result (screenshot):

  • 300 rows returned
  • Execution time ≈ 0.370 s
  • Intervals balanced by number of points

Exactly! It’s 525,600 minutes in a year divided by 300 buckets, i.e. 1752 minutes ≈ 29 h 12 min :smiley:
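For what it’s worth, here is a minimal sketch (hypothetical, not the current Gladys query) of how that bucket width could be derived from the requested range instead of being hard-coded, using DuckDB’s date_diff() and to_minutes(); the range bounds below are just examples and the feature id is the one from your test:

WITH params AS (
    -- Example bounds: one year; any range would yield ~300 buckets
    SELECT
        TIMESTAMP '2024-12-14 10:04:11.478' AS range_start,
        TIMESTAMP '2025-12-14 10:04:11.478' AS range_end
)
SELECT
    TIME_BUCKET(
        -- 525,600 minutes / 300 = 1752-minute buckets for a one-year range
        to_minutes(CAST(date_diff('minute', range_start, range_end) / 300 AS INTEGER)),
        created_at
    ) AS created_at,
    AVG(value) AS value,
    COUNT(value) AS count_value
FROM t_device_feature_state, params
WHERE device_feature_id = 'e9079170-4654-4fad-b726-23a36c793905'
AND created_at > range_start
GROUP BY 1
ORDER BY created_at;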

However, the performance isn’t better, and both are actually pretty good performance-wise


Also, in your graph, you have 10 features, so you have to multiply by 10, which explains your performance.

The solution might not be there, but maybe in a single common query for the 10 features


Well, I just ran some tests, and from that I would say the problem is most likely not the DuckDB queries (I believe we had already run this kind of test directly in dev in Gladys ^^). It must simply be the rendering, or rather the construction of the series:

  1. Aggregation by time bucket - TimeBucket Query
SELECT
    device_feature_id,
    TIME_BUCKET(INTERVAL 1752 MINUTES, created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM t_device_feature_state
WHERE device_feature_id IN (
	'eaa982b7-a3c8-4b4d-b2d0-5a5cf5245938',
    'e9079170-4654-4fad-b726-23a36c793905',
    'c59fca37-54f3-418b-b27c-857e08e2a0fb',
    'da79df66-dcff-438e-859c-39b641fd15e9',
    '5eba10ac-4c3a-479c-aba7-a1e74a00d914',
    '4b5e90f0-a86b-405b-8ca1-953705ef79a3',
    'c92f8fa3-d841-4692-bd3a-e5addb760a73',
    'b5445f10-7aa9-41bf-8cae-d2f30498652f',
    '08e90478-4bb2-4a08-986f-3d6f212c95f1',
    '85311f8c-334f-4b7e-8294-2b841c0b8fe8'
)
AND created_at > TIMESTAMP '2024-12-14 10:04:11.478'
GROUP BY
    device_feature_id,
    created_at
ORDER BY
    device_feature_id,
    created_at;

(result screenshot)

  2. Partitioning by quantiles - NTILE(300) Query
WITH intervals AS (
    SELECT
        device_feature_id,
        created_at,
        value,
        NTILE(300) OVER (
            PARTITION BY device_feature_id
            ORDER BY created_at
        ) AS interval
    FROM t_device_feature_state
    WHERE device_feature_id IN (
		'eaa982b7-a3c8-4b4d-b2d0-5a5cf5245938',
	    'e9079170-4654-4fad-b726-23a36c793905',
	    'c59fca37-54f3-418b-b27c-857e08e2a0fb',
	    'da79df66-dcff-438e-859c-39b641fd15e9',
	    '5eba10ac-4c3a-479c-aba7-a1e74a00d914',
	    '4b5e90f0-a86b-405b-8ca1-953705ef79a3',
	    'c92f8fa3-d841-4692-bd3a-e5addb760a73',
	    'b5445f10-7aa9-41bf-8cae-d2f30498652f',
	    '08e90478-4bb2-4a08-986f-3d6f212c95f1',
	    '85311f8c-334f-4b7e-8294-2b841c0b8fe8'
    )
    AND created_at > TIMESTAMP '2024-12-14 10:04:11.478'
)
SELECT
    device_feature_id,
    MIN(created_at) AS created_at,
    AVG(value) AS value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    SUM(value) AS sum_value,
    COUNT(value) AS count_value
FROM intervals
GROUP BY
    device_feature_id,
    interval
ORDER BY
    device_feature_id,
    created_at;

(result screenshot)
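As a side note (a hedged suggestion, not something from the thread), a quick sanity check can confirm that each feature ends up with at most 300 buckets after the NTILE step, for example:

WITH intervals AS (
    SELECT
        device_feature_id,
        NTILE(300) OVER (
            PARTITION BY device_feature_id
            ORDER BY created_at
        ) AS interval
    FROM t_device_feature_state
    WHERE device_feature_id IN (
        'eaa982b7-a3c8-4b4d-b2d0-5a5cf5245938',
        'e9079170-4654-4fad-b726-23a36c793905'
        -- ... same list of 10 ids as above
    )
    AND created_at > TIMESTAMP '2024-12-14 10:04:11.478'
)
SELECT
    device_feature_id,
    COUNT(DISTINCT interval) AS buckets,  -- should be <= 300 per feature
    COUNT(*) AS states                    -- raw states feeding those buckets
FROM intervals
GROUP BY device_feature_id
ORDER BY device_feature_id;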

So I re-ran a performance test on the dashboard:
(screenshot)

ChatGPT Study ^^ :

:magnifying_glass_tilted_left: Direct reading of your Performance recording

:stopwatch: Total duration

  • Total ≈ 13,190 ms (13.2 s)

:bar_chart: Breakdown (left panel)

  • Rendering (Affichage): 2,330 ms
  • Paint / Layout (Rendu): 791 ms
  • Script (JS): 563 ms
  • Network loading (Chargement réseau): 27 ms
  • System / unassigned (Système / non attribué): ~5,200 ms
  • Total “useful” visible: ~4 s
  • Big gap “unassigned / idle / waiting”: ~9 s

:backhand_index_pointing_right: Immediate conclusion:
:cross_mark: it’s neither the network,
:cross_mark: nor DuckDB,
:cross_mark: nor pure JavaScript,
:backhand_index_pointing_right: it’s mostly waiting time before the useful rendering starts.


:magnifying_glass_tilted_right: Focus on the Network part (key for Gladys)

In your table on the right:

  • Request: aggregated_states (10.5.0.227)
  • Transferred size: 213 KB
  • Execution time: ~260 ms

:backhand_index_pointing_right: That confirms what we suspected:

  • the backend query is fast
  • the payload is reasonable
  • < 300 ms, so not the problem



:brain: So where do the ~9 seconds go?

According to the flame chart and timeline:

:one: “Waiting / unassigned” phase (~9 s)

It’s typically:

  • waiting for a complete React state
  • waiting for all series to be ready
  • synchronization before mounting the chart
  • or logic like:
    • “waiting for all promises”
    • “waiting for X to be computed before rendering”

:backhand_index_pointing_right: Classic anti-pattern on complex dashboards.


:two: “Rendering + Layout” phase (~3 s)

Here, we are clearly seeing:

  • many points
  • many series
  • probably:
    • recalculation of scales
    • recalculation of legends
    • tooltips
    • min/max
    • animations

:backhand_index_pointing_right: This is consistent with an annual multi-curve graph.

:receipt: Conclusion (which you can already state)

You can already say, with evidence:

Chrome Performance measurements show that:

  • the backend query is < 300 ms
  • the payload is ~213 KB
  • pure JS < 600 ms

The ~13 s experienced are mainly due to a waiting phase + front-end chart rendering.
Optimization should therefore be done mostly on the React / chart rendering pipeline, not on DuckDB.

:compass: Next concrete steps (if you want to go further)

Without even seeing the code, the obvious levers are:

  • render the chart progressively (series one by one)
  • limit the number of points before rendering (hard cap)
  • disable animations/transitions on long views
  • virtualize / pure canvas if not already the case

I don’t agree with ChatGPT: we can clearly see in your screenshots that it’s the backend that takes 11 seconds to respond:

Which is not surprising, you showed me that one request took 400-500ms, so 500 ms * 10 = 5 seconds at best :slight_smile:

Exactly, you prove my point: for now, we’re not making one request with 10 IDs, we’re making 10 separate requests!

OK, indeed, I understand.
Here is the result for the 10 separate requests:
(screenshot)

That actually proves the point, since you have to take into account that when you display your dashboard, you don’t have just this widget, but potentially 5 other widgets, so we’re more like 50 requests :slight_smile:

Yes yes, I agree with what you’re saying!!

Mmmmh I don’t agree with that, because the yearly display is sent after the first load (hourly display on page refresh). When I select the yearly graph, I only activate that one. We can also clearly see the other « live » requests being added afterwards and queued.

But I’m OK with saying that currently we’re more in the range of 3 to 5 s of waiting on DuckDB requests.
