AI-written message describing movement detected by a camera

Feature description:
Allow Gladys to send an AI-written message that describes movement detected by a camera

An evolution of the existing ChatGPT integration?

Following the post « AI-written message describing movement detected by a camera », I’m creating this feature request so that interested people can vote.

Hi everyone :slight_smile:

This is a topic I’ve wanted to tackle for a long time, especially the « Ask the AI a question from scenes » part (even without an image), to be able to receive proactive notifications from the AI (futuristic!)

Since I was already working on the ChatGPT part this morning, I developed this feature; it wasn’t very complicated with the new GPT-4o mini vision API :slight_smile:

Ask the AI to describe a camera image

In a scene, it becomes possible to ask the AI to react to a camera image:

Which then sends me a notification to my phone; for example, here there was nothing on the camera:

Ask the AI to take an action based on a camera

If we go further, it’s possible to take the garage camera and react based on the car that just arrived:

I took an image from the internet of a garage with this exact car, and I get:

Now, let’s change the car description:

This time, the AI understands what’s happening and warns me:

This is pretty crazy!!!

The possibilities are endless

This feature is quite futuristic, and since it’s possible to inject variables into the message, it enables a lot of things (see the sketch after the list below):

  • Analyze the day’s weather report and write me a short summary sent to my phone every morning at 8 AM
  • If the alarm goes off at my house, create a message that summarizes what’s happening: sensor states, image analysis, etc. — all combined into a single message!
  • Analysis of any image, whether the image comes from a camera at my home or from an online stream — everything is possible
  • and many more!!
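
To make the variable-injection idea concrete, here is a simplified sketch of the principle (the « {{…}} » placeholder syntax and the lookup function are illustrative, not the actual Gladys implementation):

```ts
// Simplified sketch: fill {{placeholder}} variables in a scene message
// before sending it to the AI. The placeholder syntax and the lookup
// function are illustrative, not the actual Gladys implementation.

type VariableLookup = (name: string) => string | undefined;

function injectVariables(template: string, lookup: VariableLookup): string {
  return template.replace(/\{\{(.+?)\}\}/g, (match, name) => {
    const value = lookup(name.trim());
    return value !== undefined ? value : match; // keep placeholder if unknown
  });
}

// Example: a morning weather-summary prompt
const sensorValues: Record<string, string> = {
  'outdoor.temperature': '12 °C',
  'outdoor.humidity': '78 %',
};

const prompt = injectVariables(
  'Write me a short weather summary. Current readings: ' +
    'temperature {{outdoor.temperature}}, humidity {{outdoor.humidity}}.',
  (name) => sensorValues[name],
);
```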

What do you think?

Edit: Posted on Twitter here

11 Likes

Just one word: Can’t wait to try it!! (oh well no, that makes four ^^)

It’s a dream!! Could this already be the beginnings of a Jarvis?? ^^ Not much left to do to make Jarvis speak through Sonos, right?? ^^

1 Like

Ah well, we’re squarely in Jarvis territory :grin:

Edit: it’s quite easy to add an option for a voice response on a speaker instead of a message; we have all the building blocks :slight_smile:

1 Like

Incredible! The possibilities with this kind of integration are enormous, I think!!

2 Likes

I love the concept, because at home it would allow me to do without some of…

2 Likes

All that’s missing are triggers for camera-related events (a car entering the field of view, then…)

1 Like

Hello everyone :slight_smile:

I continued working on this topic this morning, and I think I’ve finished.

Here is a Docker build with the feature:

gladysassistant/gladys:ask-ai-in-scene

@Terdious (and others) if you want to test it and tell me what you think!

How to use it?

In scenes, there is a new action « Ask the AI ».

This action asks Gladys AI (ChatGPT) a question, as if you were sending a message to Gladys, and you can optionally attach a camera image that will be analyzed by the AI.

In your message, you can inject sensor values:

And request actions from the list of 7 actions currently supported by Gladys, for example:

For camera images, you can ask anything:

Note: To keep processing time as fast as possible, and to keep this feature economically affordable, images are sent to the OpenAI API in « low » resolution, i.e. 512px maximum.

If some information is very small, it won’t be visible to the AI, but honestly at 512px you can already see quite a lot!
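
For the curious, the request looks roughly like this with the official openai Node package (a simplified sketch; the model name and the prompt are just examples):

```ts
import OpenAI from 'openai';

// Sketch of a low-detail vision request with the official openai package.
// The actual Gladys server-side code may differ.
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function describeCameraImage(imageBase64: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe what you see on this camera image.' },
          {
            type: 'image_url',
            image_url: {
              url: `data:image/jpeg;base64,${imageBase64}`,
              // "low" caps the image at 512px and a fixed token cost,
              // which keeps the request fast and cheap.
              detail: 'low',
            },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? '';
}
```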

2 Likes

Great, really looking forward to some test feedback!!!

1 Like

Hi @pierre-gilles,

I was able to test a bit this morning. It looks very interesting, even if I couldn’t test it fully. For example, given where my camera is placed, I’m not convinced that at 512px it can read a license plate in my setup ^^
Here is my context and the interesting answers it gave me. Unfortunately, even with it set to 10 s, I couldn’t get a car centered in the image (by the way, tell me if I owe you money; it ran like that for about ten minutes), so I couldn’t test whether it actually turned off the light:


Given that it responds to the bottom image, I’m impressed that it can spot a Skoda at that distance.
You can see that « Show me the barrier camera » has no effect — I have to send it as a separate action.

One thing is certain: this will motivate me to add camera support to the Netatmo integration, and even more so the webhooks.

Feedback:

  • Integrate retrieval of switch states like for lights so there is no need to run the « Retrieve … » action and inject variables
  • Implement actions on switches the way they are done for lights
  • Support retrieval of all types of functionality
  • Support multiple retrievals (example: « Check the status of all my temperature sensors and give me a full report »)

But honestly, otherwise it works very well and it’s very fast.

4 Likes

Hello! What are the events linked to the camera? :slight_smile:
Can’t wait for camera integration

Thanks for testing!

It’s a bit too far off to the side indeed; if you move the camera or the car, I think it could work :wink:

No, don’t worry, it’s fine :slight_smile:

That’s really impressive!

The current development only handles the 7 actions that are in the documentation, and has no knowledge of your connected home.

So asking « if the office is on », or « Retrieve the state of the air purifier switch », doesn’t work; the answer you got there is a ChatGPT hallucination.

I agree that it would be cool if that were possible! I think we just need to give the AI a view of the state of the entire connected home at the time of the request.

I also see that you request multiple actions in your block; currently that’s not possible. Right now, 1 request = 1 action/response. If you ask for several things, the AI will pick whichever action it considers most important.

But we can implement multi-action without any problem.
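
For the record, here is one possible way multi-action could work (just a sketch of the idea, not the final design; the action fields are illustrative): ask the model to answer with a JSON object containing an array of actions, then parse and execute them one by one.

```ts
// Sketch of one possible multi-action approach: the model is instructed
// to reply only with JSON like { "actions": [ { "type": ..., "target": ... } ] }.
// The field names below are illustrative, not real Gladys identifiers.

interface AiAction {
  type: string;    // e.g. 'turn-on-light' (illustrative)
  target?: string; // e.g. 'living-room' (illustrative)
}

function parseActions(aiReply: string): AiAction[] {
  try {
    const parsed = JSON.parse(aiReply);
    return Array.isArray(parsed.actions) ? parsed.actions : [];
  } catch {
    return []; // model did not return valid JSON: fall back to no action
  }
}
```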

It’s clear that for you, who use Netatmo a lot, the combo will be nice :stuck_out_tongue:

1 Like

Hi @spenceur,

You can visit the Netatmo dev site: Netatmo Connect | Security API Documentation.

Search in there for « webhooks » or « List of events ».
@pierre-gilles already created the webhook for us in Gladys a few years ago now. I think the longest part now is finishing the PR for the cameras. That’s all that’s left to do ^^

Hi @Terdious :slight_smile:

I’ve updated the image with what we discussed: the ability for the AI to know everything about the state of the sensors in your home.

Still the same Docker image:

gladysassistant/gladys:ask-ai-in-scene

But now you can ask:

(The CO2 is at 1200 ppm)

The answer:

You can therefore create conditions based on sensor values, for example « If the brightness is low, then turn on the living room light ».
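
For the curious, the idea is simply to serialize the relevant sensor states into the prompt before your question. A simplified sketch (device names and values are illustrative):

```ts
// Sketch: expose the home's sensor state to the AI by serializing it
// into the system prompt. Names and values here are illustrative.

interface FeatureState {
  name: string;  // e.g. 'Living room CO2'
  value: number;
  unit: string;  // e.g. 'ppm'
}

function buildSystemPrompt(features: FeatureState[]): string {
  const stateLines = features
    .map((f) => `- ${f.name}: ${f.value} ${f.unit}`)
    .join('\n');
  return `You are the home assistant. Current sensor state:\n${stateLines}`;
}

// Example
const systemPrompt = buildSystemPrompt([
  { name: 'Living room CO2', value: 1200, unit: 'ppm' },
  { name: 'Living room brightness', value: 30, unit: 'lux' },
]);
// The model can then answer « Is the air quality OK? » or let a scene
// react when the brightness is low.
```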

:warning: Be careful though: when a camera image is attached, I find that ChatGPT hallucinates more easily. I tried asking it to describe the camera image and give me a recap of the home’s sensors, and it completely hallucinated, giving me sensor values that I don’t even have!

Tell me how it goes on your side; I’d be curious to see whether, on such a large installation, it slows execution down too much.

Next step for me: Multi-action!

2 Likes

Oops!! I think it doesn’t like it ^^

I’ll send you the logs privately!!

Oh dear! Thanks :slight_smile: I’ll look into it next week — I’m returning to France on Monday; I’ll check on Friday!

1 Like

@Terdious I found the cause: it was server-side. The payload was being blocked because it was too large, and the server limit was really very low, so I increased it!
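
For anyone curious, with an Express-style JSON body parser this kind of limit is a one-line change (a sketch; the actual server configuration may differ):

```ts
import express from 'express';

const app = express();

// A too-small body-parser limit rejects large payloads (e.g. a base64
// camera image) with a "413 Payload Too Large" error. Raising the limit
// is a one-line change; the '10mb' value here is only an example.
app.use(express.json({ limit: '10mb' }));
```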

Can you retest? :slight_smile:

1 Like

Sniff!!

But no more logs…

Aaaah!! I’m following up. I’ve changed my sentence:

and there are responses:

Are these values correct?