Let's talk about Gladys V4

I must say I’m surprised by this kind of reaction!

It’s extremely rare to talk about this publicly (I think software projects avoid the subject so as not to scare users, especially in open source), but I don’t know of a single open-source project that doesn’t know its version fragmentation statistics, usage by country, and number of installations. Sorry for throwing a stone in the pond :smiley:

Have you ever developed with VSCode? Used Firefox (or another browser)? Or even written messages on a Discourse forum like this one? :smiley: All of these programs collect statistics.

I’m not saying « because they do it, we should do it ». I’m not even talking about collecting in-depth statistics the way these programs do, and I’m not even talking about keeping IPs. I’m just talking about keeping, « server-side », a log of update calls, just to know the installed fleet.

Are you shocked that there are analytics on the Gladys site for example? :slight_smile:

Exactly! It’s very important to know which versions are running in the wild, just for security.

We think of France as a small country, but for example in the US, just knowing « United States » gives little information, right? Having a truncated latitude/longitude to know if it’s more west coast/east coast seems good to know where the community is.

I may have used a word a bit too strong when talking about telemetry… :smiley:

I’m just proposing to count the API update calls on the server side and make statistics on the user-agent of the program that calls.

In short, on each update call, if the calling program’s user-agent is « gladys v4.0.1 », I record +1 for Gladys v4.0.1, and I do a GeoIP lookup to get the country + the region + a very imprecise latitude/longitude.
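To make the idea concrete, here is a minimal sketch of that server-side counting, not the actual Gladys server code. It assumes the user-agent has exactly the form shown above, and that a GeoIP lookup has already been done elsewhere; the `geo` dictionary and its keys, as well as the function names, are hypothetical.

```python
import re
from collections import Counter

# Assumption: the user-agent looks exactly like the example "gladys v4.0.1".
USER_AGENT_RE = re.compile(r"^gladys (v\d+\.\d+\.\d+)$")


def parse_version(user_agent):
    """Return the version (e.g. "v4.0.1"), or None for a non-Gladys caller."""
    match = USER_AGENT_RE.match(user_agent.strip())
    return match.group(1) if match else None


def record_update_call(counts, user_agent, geo):
    """Count one update call and build the row to store.

    `geo` is the result of a GeoIP lookup done elsewhere (hypothetical keys).
    The IP address itself is never passed in, so it cannot be stored.
    """
    version = parse_version(user_agent)
    if version is None:
        return None  # not a Gladys client, nothing to count
    counts[version] += 1
    return {
        "version": version,
        "country": geo["country"],
        "region": geo["region"],
        # One decimal of latitude is roughly 11 km: deliberately imprecise.
        "region_latitude": round(geo["latitude"], 1),
        "region_longitude": round(geo["longitude"], 1),
    }


counts = Counter()
row = record_update_call(
    counts,
    "gladys v4.0.1",
    {"country": "US", "region": "CA", "latitude": 37.7749, "longitude": -122.4194},
)
```

The point of the design is that the only inputs are things the server sees anyway on an update call; the rounding is what keeps the location imprecise.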

I’m thinking of this kind of data model:

CREATE TABLE t_gladys_usage (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    version character varying(255), -- Ex: v4.0.0
    country character varying(255), -- Ex: US
    region character varying(255), -- Ex: CA
    timezone character varying(255), -- Ex: America/Los_Angeles
    region_latitude character varying(255), -- ex: 44.2
    region_longitude character varying(255), -- ex: 2.3
    created_at timestamptz NOT NULL default now()
);

I don’t see the point except for the timezone.

As long as we have the choice, it’s not a problem @pierre-gilles

Hello,
I’m one of those who downloaded and installed it just to try it out.
I don’t use Gladys on a daily basis because I don’t have enough devices.
I would like to install voice recognition, but I don’t want to give my bank details and stats to Google.
It seems normal to me that Gladys collects a minimum of information to help the versions evolve.
But it must remain coarse, or otherwise leave us the choice of what we send.
The version, the country, and the region seem acceptable.

@pierre-gilles arff maybe I should have added a little smiley to help you understand my cheap humor… It was a joke, though maybe it wasn’t appropriate here, sorry!

As for the multiple trackers on the Web, I think I’m pretty well protected, ublock, eset smart security with its firewall in strict and interactive mode, then I choose whether to block or not… I’m not a developer, I’m a bricklayer.
Again, sorry you were offended, that wasn’t the point of my message.

Ps: I’m encountering some nginx 502 errors when sending messages.

How does knowing this help? How does it improve security?

OK, say you know, for example, that a certain version of Gladys contains a vulnerability. Great, but if your data is anonymous, having that information doesn’t help you any further. It’s still just statistics.

Same here, it’s having trouble this morning :grin:

I’m not talking about telemetry sent by the client or where you can « opt-out », I’m just talking about counting server calls. There won’t even be an opt-in, because Gladys won’t do anything special! We don’t even collect personal information, everything is done server-side anonymously.

I’ll give a few comparison examples, because I think the reactions are disproportionate compared to the issue ^^ I must have used words that were too strong.

When you do:

docker pull gladysassistant/gladys
npm install

We agree that docker-cli and npm-cli make calls to Docker and NPM servers?

Docker keeps track of these calls, first because they go through its infrastructure (it needs to know the volume of traffic on its own servers; that’s the minimum if it wants to size its API), and also because it provides download statistics for the different repositories.

This is what allows them to calculate those download counts.

Docker also probably wants to know which version of the « docker-cli » is running on clients, to know if they can deprecate API routes from old versions, or if these versions are still too widely used.

I just want to have these same stats « server-side ».

It’s important to know the fragmentation of the fleet because you can make decisions based on these stats.

Example: the Android application is based on the Gladys API. If we add a new route in version 4.2.1 of Gladys, we want to know whether we can use this route in the Android application.

If we see that 99% of the fleet is running Gladys 4.3.0, which is later than 4.2.1, then we can say « ok we can implement the feature ».
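As a rough illustration of that decision, here is a small sketch that computes the share of the fleet at or above a given version. The per-version counts are invented for the example, and the function names are hypothetical. Note that counting update calls only approximates the number of installations.

```python
def parse_semver(version):
    """Turn "v4.2.1" or "4.2.1" into (4, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def share_at_or_above(counts, minimum):
    """Fraction of counted calls coming from versions >= `minimum`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no data yet, so nothing can be assumed about the fleet
    floor = parse_semver(minimum)
    recent = sum(n for version, n in counts.items() if parse_semver(version) >= floor)
    return recent / total


# Invented numbers, matching the 99% example above.
fleet = {"v4.3.0": 99, "v4.1.0": 1}
share = share_at_or_above(fleet, "v4.2.1")
can_use_new_route = share >= 0.99
```

Comparing tuples of integers rather than raw strings matters here: as strings, "v4.10.0" would sort before "v4.2.1".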

It’s just an example but it’s not the only one. It’s also a good way to see if the update system is working correctly and that the fleet is migrating regularly.

What a weird one! I’ll check if the server is okay.

Edit: Everything seems to be fine, although in terms of RAM I have the impression that we’re cutting it close… 200 MB available out of 2 GB of RAM. I’ll keep an eye on it.

We can imagine the following scenario:

Gladys V4.1.2 is very widely deployed and used. But it is affected by a vulnerability that allows an attacker to take remote control (and therefore data theft, espionage, infections, zombie PCs, ransomware and more). A version v4.1.3 is released to quickly patch all this, but after a few weeks, thanks to the statistics, we see that barely 20% of users have applied the patch. In that case, we could send a second announcement or think of new actions to protect everyone.

This is also why many solutions require an email address before downloading the software. In case of a problem, you can quickly contact everyone.

Hello,
I don’t mind at all if you store this data (even the city, as long as it’s not the exact address, but that’s not possible).

As soon as a user browses the web, they leave traces on the servers they visit.

It’s not the open-source client-side software that sends this data, but the server that uses its logs, so for me, no problem.

Hey, I found the Gatsby (open-source static site project) documentation on the anonymous stats they collect, it’s interesting:

Ok ok let’s go :slight_smile:

@pierre-gilles and the others hello :slight_smile:

Question about Gladys 4. We talk about security etc … Do you think it would be interesting to install on Gladys a tool like for example Pi-hole which would allow managing the requests that leave our network?

We have many connected objects, and in fact we don’t really know what they do. The idea is to prevent them from going out to the outside but only to interact with our gladys. I am thinking in particular of the Xiaomi range which is great but it remains Chinese … And we don’t know what they do with it.

What do you think? Maybe you have other solutions?

I want to create a dedicated home automation network separate from the « public » network used by phones and computers. In this case, blocking and monitoring requests initiated to the outside from a connected object is interesting.
But would Pi-hole be an application completely external to Gladys that each user can choose to install (even if installed on the same Raspberry Pi), or a plugin that can be enabled in Gladys, which someone would develop?

In the end, it’s the word « telemetry » that is scary, even though what you want to do is not at all of the same kind: when I hear (or read :slight_smile:) this word, I immediately think of Windows (among others) and the enormous amount of data that the OS sends on its own!
Meanwhile, Gladys will not send anything, it will only ask if an update exists, specifying its current version. It is on the server side that you can make statistics on the versions used and the approximate locations.

If this can help evolve Gladys and the user community, I am in favor of collecting statistics.

  • Gladys version used
  • Country

On my side, I will have 2 Gladys (home and work), so one download for 2 instances, and your statistics are wrong.

Well yeah @pierre-gilles lit the match this morning :collision: :grin:

So there are several ways to approach the problem. You can obviously say that the application is for people to install themselves. The second idea could be to integrate it into the Gladys Raspbian image and thus enable it by default. Then we would have Gladys communicating with the Pi-hole API to change the rules or just activate/deactivate Pi-hole.

This is an idea that might not be bad. In addition, we could even run it in a Docker container right next to Gladys.

Great! Glad you understand the need :slight_smile: Nothing malicious on my part, I just want to better understand usage and have the tools to see the evolution of the fleet, but without compromising user privacy.

In my current implementation, fully server-side as planned, I store the following in a table:

CREATE TABLE t_gladys_usage (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    client_id uuid NULL,
    event_type character varying(255),
    user_agent character varying(255),
    system character varying(255),
    node_version character varying(255),
    is_docker boolean,
    country character varying(255),
    region character varying(255),
    timezone character varying(255),
    region_latitude double precision,
    region_longitude double precision,
    created_at timestamptz NOT NULL default now()
);

A few additions:

  • system: (linux, macos, windows). This will be useful to know the share of each OS.
  • node_version: (v10.0.0) the Node.js version used
  • is_docker (true/false): is Gladys running in Docker?

All anonymous information, but very useful for the project.
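For instance, the share of each OS could be computed from stored rows along these lines. This is only a sketch: the rows are invented, and `share_by` is a hypothetical helper, not something that exists in Gladys. Only the column names come from the table above.

```python
from collections import Counter


def share_by(rows, column):
    """Return {value: fraction of rows} for one column, ignoring missing values."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Invented rows using the column names of t_gladys_usage.
rows = [
    {"system": "linux", "node_version": "v10.0.0", "is_docker": True},
    {"system": "linux", "node_version": "v10.0.0", "is_docker": True},
    {"system": "linux", "node_version": "v10.0.0", "is_docker": False},
    {"system": "macos", "node_version": "v10.0.0", "is_docker": False},
]

os_share = share_by(rows, "system")
docker_share = share_by(rows, "is_docker")
```

The same helper works for `node_version`, which is the kind of figure that would help decide when an old Node.js version can be dropped.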

Why not a service that will launch the pi-hole container + the configuration :slight_smile:

For me it’s great to explain how to do it and to set everything up to make it easy to configure alongside Gladys, but it should not be installed by default.

The image must be minimal.

Okay for some stats but not at the city level.

That’s why I didn’t sign up for the forum map: imagine a flaw and I haven’t updated…

Even approximately, it’s not complicated to find me by cross-referencing data on the web (even now I wonder if it’s not already the case…) and come and play with the flaw.

It’s a bit paranoid, though not extreme, since I don’t filter everything that leaves my home (Xiaomi), but I like to hope I have control over this type of data.

However, indeed: Country, OS, Version, Hardware?, and other joys, if it helps Gladys… :wink: