fiat_lux

Relocated from: @fiat_lux@lemmy.world (04-2026)

  • 1 Post
  • 16 Comments
Joined 4 days ago
Cake day: April 24, 2026



  • My suspicion is they just pick up data indicating that a real person is considering an attempt, and then allow the least risky ones to get closest to success. Their base will cast whoever tries anything as a leftist regardless of the reality, or conveniently forget they’re right-wing, but it’s not really about making left-wing people look violent. It’s about dominating the media airtime and controlling people’s attention. It’s the same tactic Trump successfully uses on social media or on TV - throw a bunch of shit out there and let the media pick at it while doing the actually heinous shit.

    There’s just no other reason it makes sense for this event to have had no security, two months after someone with a shotgun and a gas can went into Mar-a-Lago.


  • When I was about 12, I got into a discussion about the environment with another kid at school. She told me that it didn’t matter if we ruined the environment of the countries we all live in now, because we could all just move to the Arctic or Antarctica.

    I was so surprised by the absurdity of that statement that it stuck with me vividly. To her credit, some years later she asked if I remembered her saying that and then admitted that it was a dumb thing to say. I occasionally remember this as an amusing childhood experience.

    Credit aside, I remembered it again today for a different reason: a conversation about model collapse.

    [Model collapse is] a solved problem. We can see that it’s solved by the fact that AI models continue to get better, despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.

    AI models are never going to get worse than they are now because if they did get worse we’d just throw them out and go back to the earlier ones that worked better, perhaps re-training with the same data but better training techniques or model architectures.

    This is my fault for letting myself get into a discussion about model collapse on the fediverse.

    I’m not sure why model collapse isn’t a big topic anymore, but maybe that’s just because the environmental catastrophes are a more pressing concern. To be clear, I’m not concerned about the models themselves, just our increasing inability to verify the authenticity or accuracy of any information we encounter, including search engines just not turning up any useful results.

    On a slightly different topic, if anyone has suggestions for how a person could acquire money to live, which can’t involve physical labor, is probably remote-only, and possibly allows part-time flexibility, while unable to move from an expensive location for at least the next couple of years: I’m open to ideas. Because scamming people on Polymarket with a hairdryer sounded far more appealing than it ought to have.


  • We can see that it’s solved by the fact that AI models continue to get better despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.

    Even if it logically followed that model improvement means model collapse is a solved problem (which it absolutely doesn’t), the premise that models are still improving to a significant degree is itself up for debate.

    [Figure: line graph of Massive Multitask Language Understanding (MMLU) Pro benchmark scores over time, 07-2023 to 01-2026, showing plateauing values]

    A lot of people really want to believe that AI is going to just “go away” somehow, and this notion of model collapse is a convenient way to support that belief.

    Model collapse may, for some people, be an argument used to support a hope that AI will go away, but that hope has no bearing on whether the model collapse problem is real.

    You can tell it’s not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.

    Some choice excerpts from the abstracts, for those who don’t want to click the links:

    Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse

    …we establish … that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions … are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set.


  • Model collapse can’t only come from data from previous generations, even if the initial demonstration used that, because that would mean a single piece of human-generated text would be sufficient to avoid collapse.

    The loss of data from generation to generation is one way model collapse can occur, but it’s only one way. The actual issues that cause collapse are replication of errors and increasing data homogeneity. In a world where an unknown quantity of new data is AI generated, it is not possible to ensure only a certain quantity is used as future training data.

    Additionally, as new human-generated content comes to be based on information provided by AI, even when AI output isn’t intentionally used to construct the text itself, the error-replication and data-diversity issues cross over from being only an AI-generated-content problem to an all-content problem. You can see examples of this happening now in the media, where a journalist relies on AI output to fact check, and then the article with the error gets republished by other media outlets.

    Real AI training methods may stave off some model collapse, if we ignore existing issues around the cultural homogeneity of training data from across all time periods, or assume the models are sufficiently weighted to mitigate those issues, but it’s by no means settled that collapse is a non-problem.

    You’ve mentioned using data mixing to prevent collapse, but some of the research suggests that even iterative mixing isn’t sufficient, depending on the quantities of real vs synthetic data. Strong Model Collapse (Dohmatob, Feng, Subramonian, Kempe, 2024) goes into that, and since then there’s been When Models Don’t Collapse: On the Consistency of Iterative MLE (Barzilai, Shamir, 2025), which presents one theoretical case where collapse won’t occur provided some assumptions hold, but the math is beyond me. They also note multiple situations where near-instant collapse can occur.

    How much data poisoning might affect any of that is not at all clear; it would need to be present in sufficient quantity relative to a given model to have an effect, but it certainly wouldn’t help. The recent Bixonimania scandal suggests it’s feasible.
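    The diversity-loss mechanism above can be sketched with a toy simulation (my own illustration, not from any of the cited papers): repeatedly fit a Gaussian "model" to a dataset, then replace the dataset entirely with samples from the fitted model. With no real data mixed back in, the spread of the data shrinks generation over generation.

    ```python
    import random
    import statistics

    def next_generation(data, n):
        """Fit a Gaussian 'model' to the data, then produce a new
        dataset consisting only of samples from that fitted model."""
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood spread estimate
        return [random.gauss(mu, sigma) for _ in range(n)]

    random.seed(42)
    data = [random.gauss(0.0, 1.0) for _ in range(10)]  # the "real" data
    start = statistics.pstdev(data)

    # Each generation is trained only on the previous generation's output.
    for _ in range(100):
        data = next_generation(data, 10)

    end = statistics.pstdev(data)
    print(f"spread: {start:.3f} -> {end:.6f}")  # diversity shrinks toward zero
    ```

    Mixing some fraction of real data back in at every step slows this shrinkage, which is exactly the real-vs-synthetic-fraction question the research is trying to settle.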



  • “model collapse” was demonstrated by repeatedly training generation after generation of models on the output of previous generations

    the best models these days are trained largely on synthetic data - data that’s been pre-processed by other AIs to turn it into stuff that makes for better training material

    You can prevent model collapse simply by enriching the training data with good data - stuff that is already archived, that can’t be “contaminated.”

    This feels like an odd juxtaposition.

    If model collapse can be avoided by enriching with uncontaminated data, and model collapse comes from using training data generated by previous generations, doesn’t that imply that:

    1. Either the best models are headed towards model collapse, or,
    2. Models can’t be updated because modern data isn’t usable?




  • Panels 3 and 4 aren’t quite right.

    The guy in Panel 3 didn’t just remix it, he cherry picked the parts that would be most likely to rank for either a short or long tail keyword strategy depending on the size and business of his client or employer.

    And that guy doesn’t have his paper taken away in Panel 4. He’s feeding as many papers as he can to the AI which are tailored for “Answer Engine Optimization” or “Generative Engine Optimization” (they haven’t settled on a catchy name yet for what is largely the same thing, even if some claim they’re different).

    The techniques have changed slightly but SEO has been a filthy game for much longer than AI. Google made sure of that with their auction house, “featured snippet” sections and backlink authority ranking systems.



  • I laughed when the song kicked in after the intro. Your description of the whole thing was completely accurate.

    That drummer was definitely way too good, and that’s probably why he featured more heavily in the clip than I think I’ve ever seen a drummer feature. I hope he’s doing something fun now.


  • There are two surprising aspects of this to me. Firstly that the employees feel confident enough to express concern about Palantir’s actions in official channels. I would have thought that the nature of their work was obvious enough that this would be a cultural taboo and therefore self-censored. I guess some of them have limits to suspending disbelief for what they had likely internally framed as “work for the benefit of national security” or “job pays too well to care”.

    The second part is that not all of this official channel discussion was immediately wiped by Palantir, but perhaps they also relied on the premise of self-censorship in preventing these conversations at scale.

    Either way, I’m somewhat relieved there’s someone at Palantir worried about this at all. The more of them who are worried by this, the more leaks we’ll see.