• @mp04610@lemm.ee
      70 points, 3 months ago

      While that’s the correct thing to do in my opinion, it would be a mistake to assume that Reddit didn’t store your original comments.

      By corrupting their dataset, you may actually be helping them recognize maliciously edited comments.

      • @khannie@lemmy.world
        English
        35 points, 3 months ago

        it would be a mistake to assume that Reddit didn’t store your original comments.

        They were fairly specific about not doing that (I’d imagine largely because of GDPR).

        I deleted 10 years of “content” before I left and checked their policies. They apparently actually do properly delete from their servers.

          • @joenforcer@midwest.social
            9 points, edited, 3 months ago

            GDPR is no joke. Storing a handful of comments is not worth the penalty if they get caught.

            Note that I speak from experience as part of a company that needs to comply with the regulations. We do it because the risk of violation is 10000000% not worth it no matter how annoying and arduous it is to comply.

          • @khannie@lemmy.world
            English
            11 points, 3 months ago

            That’s true but it’s far easier to globally implement rather than trying to segment. Very difficult to accurately prove a user isn’t EU resident across an entire userbase.

            • @ItsAFake@lemmus.org
              English
              5 points, 3 months ago

              That’s probably why they don’t let you access Reddit with a VPN, so they can have some idea of location.

      • @TropicalDingdong@lemmy.world
        18 points, 3 months ago

        Yeah, I mean I knew that when I was doing it.

        Sometimes all you can do is make a symbolic gesture that really does nothing, and even if it does nothing, you should still do it.

        Leaving and supporting Lemmy by paying for some developer fees (I’m on the Patreon), posting and commenting is probably 100x more damaging to Reddit.

        • @FeelThePower@lemmy.dbzer0.com
          10 points, 3 months ago

          FWIW, I requested an old Reddit account’s data the other day under CCPA and all the contamination was in there. My guess is their backend updates every so often. I guess I made a good call to edit my comments and leave them there to simmer before I deleted them along with the account. Perhaps this is the way?

      • @flambonkscious@sh.itjust.works
        English
        5 points, 3 months ago

        Mass edits made rapidly are obviously suspect, too… If the same user edits more than a dozen comments in, say, a minute, you have to ask what’s going on.
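To make the "dozen edits in a minute" idea concrete, here is a minimal sketch of a sliding-window check. Nothing here is known about how Reddit actually detects this: the function name `find_edit_bursts`, the threshold of 12 edits, and the one-minute window are all assumptions taken from the comment above.

```python
from collections import deque
from datetime import datetime, timedelta


def find_edit_bursts(edit_times, max_edits=12, window=timedelta(minutes=1)):
    """Return the timestamps at which one user exceeded max_edits within window.

    Hypothetical helper: the threshold and window are illustrative guesses,
    not anything Reddit has documented.
    """
    recent = deque()   # edit timestamps still inside the current window
    flagged = []
    for ts in sorted(edit_times):
        recent.append(ts)
        # Drop edits that have fallen out of the window ending at ts.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_edits:
            flagged.append(ts)
    return flagged
```

A script that rewrites a comment every couple of seconds would trip this almost immediately, while someone fixing the odd typo never would. Spacing the edits out would presumably dodge a check this simple.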

    • @ElCanut (OP)
      62 points, 3 months ago

      Can’t post a genius idea like this one without posting the links of the tools

          • @KnightontheSun@lemmy.world
            8 points, 3 months ago

            Not necessarily true. I overwrote several thousand comments with a different tool and used three different quotes on greed. I have periodically checked and about two dozen came back. I just manually changed them at that point.

        • @RecallMadness@lemmy.nz
          8 points, edited, 3 months ago

          This would be better if it fed the parent comment into ChatGPT prefixed with “create a plausible but factually incorrect aggressive response to <comment>”

          Feed the machine to the machine!

      • @Sabin10@lemmy.world
        8 points, 3 months ago

        A tool like that would almost definitely require api access to function. If that was still possible, most of us wouldn’t be here having this conversation.

        • @TropicalDingdong@lemmy.world
          14 points, 3 months ago

          A tool like that would almost definitely require api access to function. If that was still possible, most of us wouldn’t be here having this conversation.

          No it didn’t use the API. You had to run it in browser and be logged in to reddit.

        • The tool I used had an extension for Firefox. You then used that Reddit extension so you could get more scrolling on your post history. Then you pressed a button and it would insert gibberish for all comments and posts. Then you’d go next page and do it again.

      • BolexForSoup
        1 point, edited, 3 months ago

        Ever since they locked down their API pretty much every tool broke. I made sure to run everything a few days before leaving Reddit for good during the API fiasco last summer.

        That doesn’t mean nobody’s made anything that will work now. But yeah, all the old tools are broken.

    • @Ragnarok314159@sopuli.xyz
      21 points, 3 months ago

      I think Reddit caught on to this. I tried destroying my comment history (~7 years with 600k karma) with a few of the available tools on GitHub.

      Found my account permabanned the next time I tried to log in. People should attempt to eliminate/poison as much as possible, but Reddit has all the comments and modifications in a database somewhere to sell it all to whatever AI is the highest bidder.

      They have to do something to make money after taking away awards. The advertising is absolute shit and not worth the $100 entry fee.

    • VaultBoyNewVegas
      13 points, 3 months ago

      I edited mine via a tool to say fuck Reddit and Steve Huffman is a greedy pig boy.

      • @PlasmaDistortion@lemm.ee
        English
        24 points, 3 months ago

        I used a tool that edited my comments to replace them with gibberish. Supposedly Reddit still retains deleted comments, but if you edit them, it only keeps the latest version. So by editing them you make the comments worthless.

        • @Octopus1348@lemy.lol
          17 points, 3 months ago

          I also edited my comments to be basically a Lemmy ad and completely deleted the posts except in a few communities where it could be helpful in the future.

            • citrusface
              English
              3 points, edited, 3 months ago

              Thank you

              Edit - This worked great thank you. Was able to scrub my Twitter as well.

            • @PerogiBoi@lemmy.ca
              1 point, 3 months ago

              Just redacted 5 years worth of comments with this. Now to let my account sit for a few months so their backups have only my latest masterpieces. Thanks!!!

      • @TropicalDingdong@lemmy.world
        12 points, 3 months ago

        I ran a script over all of my comments (through my browser) to edit them into something about how spez had backstabbed the community. I had tens? hundreds of thousands? of comments.

        It took several hours to run, but I did a forward pass (newest to oldest) and a backwards pass (oldest to newest). It bugged out because it had to run so long but I think I got it all.

        I’m not sure this will really do anything because you could pretty easily statistically isolate anyone who did what I did, and roll their account history back to a prior state in the training data.

        Regardless, it was the least I could do on the way out the door.

      • Jo Miran
        5 points, edited, 3 months ago

        It replaces them with gibberish. I did the same for my 12+ years worth.

    • Kbin_space_program
      34 points, 3 months ago

      Time to make a lot of wandering dwarf bots on reddit to make variations of various game phrases all over, so the LLM based bots just spout Rock And Stone and This is my favourite store on the Citadel?

    • @Ilovethebomb@lemm.ee
      21 points, 3 months ago

      Thing is, you could use a bot to do nothing but post pop culture references, and it would be indistinguishable from a garden variety Redditor. Reddit is one of the worst places to train an AI.

  • @Adalast@lemmy.world
    62 points, 3 months ago

    OpenAI team after including the data: why is the model suddenly even more horny, abusive, and discriminatory?

  • alphacyberranger
    English
    50 points, 3 months ago

    If it takes Reddit data to train a model, instead of Artificial Intelligence we will end up with Artificial Idiocy, and a horny one at that.

  • Eager Eagle
    English
    25 points, 3 months ago

    Good move, but anyone using public data already applies a simple spam filter to reject “dumb” data poisoning. Also, hatred and other negative comments as responses will be penalized in language model training, so effective data poisoning takes effort. I’ll just throw out some ideas here on how poisoning could hypothetically have a tangible negative impact on their results.

    The best one can do in terms of data poisoning is make comments that are not easily discernible from usual comments - both for humans and machines - but are either unhelpful or misleading. This is an “in-distribution” data poisoning attack. To have any real impact on training, they need to be mass applied using different user accounts that also upvote each other’s comments in a way that mimics real user interaction: if applied in a simplistic way, a simple graph analysis of these interactions can light up the fake accounts like a Christmas tree.
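As a rough illustration of the "simple graph analysis" mentioned above, here is a sketch that treats upvotes as a directed graph and flags accounts whose votes are almost entirely returned by the accounts they vote for. This is only one naive heuristic, not a description of any real platform's tooling: the function name `flag_reciprocal_accounts`, the `(voter, author)` input format, and both thresholds are assumptions made up for the example.

```python
from collections import defaultdict


def flag_reciprocal_accounts(votes, min_targets=2, threshold=0.9):
    """Flag accounts whose upvotes are nearly all reciprocated.

    votes: iterable of (voter, author) pairs, one per upvote (assumed format).
    min_targets and threshold are illustrative values, not tuned ones.
    """
    out_edges = defaultdict(set)   # voter  -> authors they upvoted
    in_edges = defaultdict(set)    # author -> voters who upvoted them
    for voter, author in votes:
        if voter == author:
            continue               # ignore self-votes
        out_edges[voter].add(author)
        in_edges[author].add(voter)

    flagged = set()
    for account, targets in out_edges.items():
        if len(targets) < min_targets:
            continue               # too little activity to judge
        # Targets that also upvoted this account back.
        mutual = targets & in_edges.get(account, set())
        if len(mutual) / len(targets) >= threshold:
            flagged.add(account)
    return flagged
```

A tight ring of accounts that only upvote each other scores 1.0 and stands out, whereas ordinary users mostly vote for people who never vote back. Real friend groups would produce false positives here, which is part of why mimicking real interaction takes effort on both sides.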

    • @greenskye@lemm.ee
      English
      22 points, edited, 3 months ago

      but are either unhelpful or misleading

      Honestly that just sounds like a lot of Reddit users in general

      • Darth_Mew
        7 points, 3 months ago

        yea we know that’s why he said that because that’s “real” reddit content

    • @Adalast@lemmy.world
      3 points, 3 months ago

      I was contemplating the merits of botting with the current model with slight vectorization offsets so the data becomes prone to overfitting.

      I would think it would also work to post using valid but non-standard syntax so it muddies the n-gram searches.

    • Armok: God of Blood
      22 points, 3 months ago

      We should have started an all-out attack on Reddit once they started forcing open subs by removing mods. People folded like soggy tortillas.

      • @madcaesar@lemmy.world
        15 points, 3 months ago

        I just left and came here after 10+ years on Reddit. No point wasting time and energy trying to take Reddit down. They are fucked anyway. Anytime I check back for something, the quality of posts / comments is just pure garbage.

      • @PerogiBoi@lemmy.ca
        2 points, 3 months ago

        Just like when Netflix and Disney plus and every other streaming service colluded to all raise their prices and remove account sharing.

      • @Daxtron2@startrek.website
        1 point, 3 months ago

        My account got locked out after I lost all my authenticators with an old phone. Reddit is one of only a few sites that would not let me change it.

  • Set up a bot that just constantly posts blatantly wrong information, like “the earth is flat according to Encyclopedia Britannica”, “the sky is green because it’s full of chlorophyll according to the UK foundation of science”

    • @jkrtn@lemmy.ml
      2 points, 3 months ago

      You won’t poison the data if the bot is on there just doing the same things as the redditors.

    • @Vilian@lemmy.ca
      1 point, 3 months ago

      we need to make a repository just for that and spam reddit with it, everyone is welcome to contribute, open-source fake news

      • @Bombyk0l@sh.itjust.works
        1 point, 3 months ago

        That should be super easy. Just make a massive database of random stuff and put them in a sentence structured “XX is YY because ZZ” with no other explanation.

  • Beefalo
    English
    19 points, 3 months ago

    This announcement is just “oh by the way, the horse is now out of the barn. He left like 10 years ago but this is the announcement.”

    Shout out to whoever dismissed the first AI writings with “It’s like a perfect Redditor. Totally confident and completely full of shit, doesn’t even know that it’s lying.”

    That doesn’t happen by accident. That happens when everyone was already scraping the shit out of the site, at the very least.

  • @boatsnhos931@lemmy.world
    17 points, 3 months ago

    Dear God, I’ve posted a lot of nonsense and untrue things over the years. You guys want to do a candle light vigil tonight for ai?

  • @Flumpkin@slrpnk.net
    16 points, 3 months ago

    I’m pissed at reddit but I still hate searching for something and finding a post on reddit discussing it, only to find some of the posts being deleted or overwritten.

  • @EmperorHenry@discuss.tchncs.de
    11 points, edited, 3 months ago

    After they announced it would’ve been the time to start poisoning the comments. Then it would’ve been completely justified and moral.

    Honestly, keep up the good fight. Start poisoning all open sources being scraped by any type of AI.

    And I use the term “ai” very, very loosely. Because what’s called ai now isn’t real ai. It’s just an automated data collection tool.

    It doesn’t create anything, it plagiarizes real artists.

    • @FIST_FILLET@lemmy.ml
      3 points, 3 months ago

      exactly, ”ai” right now is just a computer parrot. why settle for blurry generic versions of the art that it is digesting and shitting back out?

      • @mods_are_assholes@lemmy.world
        1 point, edited, 3 months ago

        Nailed it. The whole essence of AI is that it can make images with a variety of colors and styles, but it’s not creative or artistic by definition. At the end of the day, it’s just a bunch of numbers and equations being translated into pixels on a screen.

        (This comment pasted from NovelAI with this prompt:

        Please write a reply to this interrnet comment: exactly, ”ai” right now is just a computer parrot. why settle for blurry generic versions of the art that it is digesting and shitting back out?)

        • @Fades@lemmy.world
          2 points, edited, 3 months ago

          That is not the “whole essence” of it all… You are summarizing the whole piece of tech off a single use-case (image generation).

          AI is MUCH more than just a picture generator. As a software engineer I use AI for things like debugging or quickly automating some tasks.

  • @byroon@lemmy.world
    9 points, 3 months ago

    So you’ve contaminated the training data for an LLM by spamming a public forum? Seems like everyone loses

      • @TwanHE@lemmy.world
        5 points, 3 months ago

        There were some scripts for it. But I can still find my comments and posts through Google after deleting them.

        Don’t think Reddit will let you take “their” (your) content away.

        • @apemint@lemmy.world
          English
          11 points, 3 months ago

          Oh, shit you’re right.

          I wiped my whole profile years ago (with a script that overwrites your comments before deleting) but they’re still visible everywhere except in my profile.

          Isn’t this bullshit illegal?

          • Beefalo
            English
            2 points, 3 months ago

            Try out Shreddit, it’s a web app for exactly this. It even lets you filter by post karma so you can keep your hits. I’ve never used it but that’s the name that came up over on Reddit from everyone talking about the announcement.

            • @apemint@lemmy.world
              English
              1 point, 3 months ago

              Well, here’s the problem: my profile is completely empty. There’s nothing left to delete.
              There are no comments, submissions or anything there, but when I go to a thread via Google (or a link), all my comments are still present.

        • @MBM@lemmings.world
          0 points, 3 months ago

          Don’t think Reddit will let you take “their” (your) content away.

          There should be a way to do it under GDPR

          • @TwanHE@lemmy.world
            2 points, 3 months ago

            I filed a right to erasure request as well. Never got a response and nothing happened, but I’m currently not in a position to take them to court over it.