YSK: Your Lemmy activities (e.g. downvotes) are far from private

Muddybulldog@mylemmy.win · edit-2 3 years ago

booty_flexx@lemmy.world · 3 years ago

To illustrate OP’s point, I’m going to spin up an instance, federate with everyone, and not tell anyone what that instance is.

Then I’m going to feed all that data into my new website, called Open Lemmy Stats, where anyone can query the user data I’ve accumulated. The homepage will be rife with insights, leaderboards and all kinds of data on prolific users.

Additionally, I’ll display a snapshot/profile of a random user by feeding that user’s data to GPT-4 to make inferences about the user’s political affiliations and display the results.

Worst of all, I’m not going to out my instance for everyone to know it as the one to defederate. In fact I’m spinning up a few instances that will host innocuous communities that I plan to mod and support to give my instances cover for their true purpose: redundant fediverse datastreams for my site, Open Lemmy Stats.

I’ll also have a store where anyone can buy my collected fediverse data for a handsome sum.

Just kidding, I’m not doing any of this. But someone absolutely will, or already is.

agoramachina@lemmy.world · 3 years ago

You know, I came in here with the mindset that the topic of discussion here isn’t a bad thing; I’m largely pro information-should-be-open-and-available. But you’ve argued a very solid point, and I’ve changed my mind on the issue. I appreciate you sharing this perspective!

stevedidWHAT@lemmy.world · 3 years ago

With all due respect, figuring out who you are based off what you say in a public setting is already what people do irl

Reliant1087@lemmy.world · 3 years ago

I think your comment clearly illustrates what might go wrong with it. If they absolutely need this data for sorting or something else, then I would be happy if they just hashed the usernames/instances or used some other form of UID.
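To make the hashing idea concrete, here is a minimal Python sketch. It is only an illustration, not how Lemmy works today: `INSTANCE_SECRET` and `pseudonymize` are hypothetical names, and the sample votes are made up. It uses a keyed hash (HMAC) rather than a bare hash, because usernames are public and a plain hash of a known username can simply be recomputed by anyone.

```python
import hashlib
import hmac

# Hypothetical per-instance secret (assumption): generated once, never federated.
INSTANCE_SECRET = b"replace-with-a-random-secret"

def pseudonymize(actor: str, secret: bytes = INSTANCE_SECRET) -> str:
    """Return a stable, opaque ID for an actor such as 'user@instance'."""
    digest = hmac.new(secret, actor.lower().encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]

# Made-up votes: (actor, score)
votes = [("alice@lemmy.world", -1), ("bob@sopuli.xyz", 1), ("alice@lemmy.world", 1)]

# The same actor always maps to the same ID, so duplicate votes can still be
# detected and counted without storing the username next to the vote.
anonymized = [(pseudonymize(actor), score) for actor, score in votes]
print(anonymized)
```

The catch is that whoever holds the secret can still link IDs back to users, so this only hides identities from parties that never see the key.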

stevedidWHAT@lemmy.world · 3 years ago

Lmao the internet finally realizing what companies and the govt have been doing for decades on the internet

pfr@lemmy.sdf.org · edit-2 3 years ago

I’m almost willing to bet that big tech companies are already doing this. They got the motive and the means. No doubt Meta or Google have dedicated some of their servers to mining our Lemmy data in this way.

Zackyist@sopuli.xyz · 3 years ago

With only around 100k users, and most people using anonymous usernames that cannot be connected to their identity, it would hardly be worth the effort, time or money.

Quinnel@lemmy.world · 3 years ago

You’re looking at this from the wrong point of view. The fediverse is not just lemmy: Threads, Tumblr, even BlueSky (albeit with their own protocol, but anyone could just modify their fediverse-enabled app to convert their data to be applicable to BlueSky’s protocol) are quickly setting the stage for a new norm. The more websites integrate the fediverse into their stack, the more data outside the immediate sphere of influence of these major corporations can be harvested. To what ends they’ll use it, I don’t know – but I don’t trust them with it.

Smk@lemmy.ca · 3 years ago

They will know the user but not the person in real life. Even if you know that my user is more conservative on some points or more liberal on others, how can you use that for nefarious action? Unless you know where I live and who I am, the data is useless.

People need to be aware that sharing your personal information on the internet is never a good idea.

GenderNeutralBro@lemmy.sdf.org · 3 years ago

It’s very difficult to both A) have meaningful conversations in a public space, and B) conceal your identity from a dedicated adversary. Once a person has a long post history, it’s likely that an observer could narrow down their identity to a very small group, if not a single person. Every post you make reveals something.

Even if you don’t ever explicitly state it, your age range and gender can likely be guessed with high probability by your writing style and/or little tidbits of info you leak without thinking about it. Same for political leanings. You might casually mention the brand of car you drive, or your favorite foods, or just reference something you experienced as a child that is not universal. All of these things leak information, and while each one seems insignificant, in aggregate they can tell a detailed story. Just knowing that you’re a Canadian who speaks both French and English eliminates about 99.8% of the world’s population as possibilities.

Back on Reddit I used to create fresh accounts all the time, but then I’d go and join the same subs, post with the same writing style, and generally express the same worldview. If anybody cared, had a good grasp of statistics, bothered to collect the data, and put a stupid amount of time into it, they could likely match all of my accounts together. I was never too worried about this because…well I just didn’t care. But I did have a cyberstalker at one point and it made me think.

I wouldn’t be shocked if someone could match me to one or more of my Reddit accounts just from this one comment, tbh. I’m leaking information here like a sieve! Not many people have the skills to do that, and the few who do are unlikely to give a rat’s ass about me. HOWEVER, as AI becomes more advanced, anyone with computer literacy will be able to do analysis in minutes that might currently take an expert days or weeks.

Smk@lemmy.ca · 3 years ago

I get what you’re saying. I’m not sure if it’s something that is fixable given that we participate in a public forum. Maybe the federation isn’t a great idea after all, or maybe we overthink it. I don’t know.

Cuz :twit:@twit.social · 3 years ago

@booty_flexx @muddybulldog do we ever see these fediverse products employing a plugin system where such a bot could be added easily by instances that wanted to?

kolorafa@lemmy.world · 3 years ago

Red*it can do that too (if they aren’t doing it already), but they also have your personal details linked, especially when you pay for premium :)

deweydecibel@lemmy.world · edit-2 3 years ago

And just think how much data you can gather by sending out puppet accounts on various instances, accounts that will serve only to publicly state an opinion, such as “I support this candidate”, so the data on the people who upvote it can be harvested and categorized more easily. There is so much data harvesting potential here with a little imagination, and with a little more, a lot of ways to use that data to influence the way average users engage with the fediverse.

That site would also be a great advertisement for Lemmy. Come here to our decentralized platform, where you can vote…but you better not, lest you end up on the site. What social network wouldn’t grow when users are peer pressured into not using one of its basic underlying mechanics that makes the whole thing work?

HamSwagwich@showeq.com · 3 years ago

Lemmy is not a decentralized platform. It’s a federated one. Lemmy is very much centralized.

We need a decentralized system. Lemmy isn’t it.

SendMePhotos@lemmy.world · 3 years ago

That was pretty interesting. I want to see graphs.

deegeese@sopuli.xyz · 3 years ago

Can your instance secretly run a fork that doesn’t respect deletes?

irelephant [he/him]🍭@lemm.ee · 1 year ago

Yes.

2 years ago

GPT has too small a context window to fit someone’s entire post history. You have to embed everything they have said; then you can make queries against their knowledge base, or group user content embeddings and compare them to known data points. Not that I have the compute to do this at any kind of scale.
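A rough Python sketch of the embed-then-query idea, for anyone who wants to see its shape. Big caveat: the "embedding" here is just a word-count vector standing in for a real learned embedding model, and the posts, `embed`, `cosine` and `top_matches` are all made up for illustration. The flow is the same either way: embed each post once, embed the query, and rank posts by similarity so only the closest few need to go into a small context window.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_matches(query: str, posts: list[str], k: int = 2) -> list[str]:
    """Return the k posts most similar to the query."""
    q = embed(query)
    ranked = sorted(posts, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

# Made-up post history for a single hypothetical user.
posts = [
    "i drive an old pickup truck to work",
    "best poutine is in montreal",
    "my truck needs new tires again",
]

print(top_matches("what truck does this user drive", posts, k=1))
```

Grouping users would work on the same vectors: average each user's post embeddings and compare the averages instead of individual posts.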

EurekaStockade@lemmy.world · 3 years ago

Honestly, why not? The data is already being recorded. At least this way it’s public and the rest of us get to interact with it. It might even scare a few people into paying attention to the information that they disclose about themselves and increase their digital hygiene.

okamiueru@lemmy.world · 3 years ago

If I’m reading it correctly, and please help me out if not: recorded data by the nature of being stored somewhere, should be made public?

That doesn’t make all that much sense. Data retention and access levels should always be tied to a use case that requires it. And there is no “if anything is stored, it should all be public”.

EurekaStockade@lemmy.world · 3 years ago

recorded data by the nature of being stored somewhere, should be made public?

The difference is that this data can already be surfaced by anyone; all they need to do is spin up a federated instance. So someone could do all the stuff outlined in the parent comment but keep the results for themselves, or monetise it, build advertising profiles, doxx people, etc.

The data already exists, and it can already be extracted and made public (or used privately). I’m not saying throw open every database to the world, I am saying the world can already access this database, so pretending that it’s not available doesn’t stop bad actors from using it. Might as well make a public tool (that actually sounds kinda cool?) and bring awareness to it.
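For a sense of how little work the extraction is, here is a small Python sketch. It assumes votes reach a federated instance as ActivityPub-style `Like`/`Dislike` activities carrying the voter's actor URL, which is my understanding of how Lemmy federates them. The sample payloads, the `example.*` URLs and `tally_votes` are made up, and a real inbox would carry more fields plus signatures.

```python
import json
from collections import defaultdict

# Made-up activities in the general shape of ActivityPub Like/Dislike objects.
SAMPLE_INBOX = [
    '{"type": "Like", "actor": "https://example.one/u/alice", "object": "https://example.two/post/1"}',
    '{"type": "Dislike", "actor": "https://example.one/u/alice", "object": "https://example.two/post/2"}',
    '{"type": "Like", "actor": "https://example.three/u/bob", "object": "https://example.two/post/1"}',
    '{"type": "Follow", "actor": "https://example.three/u/bob", "object": "https://example.two/c/news"}',
]

def tally_votes(raw_activities):
    """Group votes by actor: {actor_url: [(object_url, +1 or -1), ...]}."""
    votes = defaultdict(list)
    for raw in raw_activities:
        activity = json.loads(raw)
        kind = activity.get("type")
        if kind not in ("Like", "Dislike"):
            continue  # ignore anything that is not a vote
        score = 1 if kind == "Like" else -1
        votes[activity["actor"]].append((activity["object"], score))
    return dict(votes)

for actor, cast in tally_votes(SAMPLE_INBOX).items():
    print(actor, cast)
```

Anything that receives these activities can build the same per-user table, whether it publishes the results or not.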

okamiueru@lemmy.world · 3 years ago

Ah, gotcha. I don’t think anyone was saying that the solution was to try to make the problem less visible.