• themusicman@lemmy.world
    link
    fedilink
    English
    arrow-up
    119
    ·
    9 months ago

    Is it possible the first response is simply due to the date being after the AI’s training data cutoff?

    • Linkerbaan@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      66
      arrow-down
      1
      ·
      9 months ago

      It seems like Gemini has the ability to do web searches, compile information from it and then produce a result.

      “Nakba 2.0” is a relatively new term as well, which it was able to answer. Likely because google didn’t include it in their censored terms.

      • GenEcon@lemm.ee
        link
        fedilink
        English
        arrow-up
        20
        ·
        9 months ago

        I just double checked, because I couldn’t believe this, but you are right. If you ask about estimates of the Sudanese war (starting in 2023) it reports estimates between 5.000–15.000.

        Its seems like Gemini is highly politically biased.

        • Linkerbaan@lemmy.worldOP
          link
          fedilink
          English
          arrow-up
          5
          arrow-down
          1
          ·
          9 months ago

          Another fun fact: according to NYT America claims that Ukrainian KIA are 70.000 not 30.000

          U.S. officials said Ukraine had suffered close to 70,000 killed and 100,000 to 120,000 wounded.

    • casual_turtle_stew_enjoyer@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      7
      ·
      9 months ago

      This is not the direct result of a knowledge cutoff date, but could be the result of mis-prompting or fine-tuning to enforce cut off dates to discourage hallucinations about future events.

      But, Gemini/Bard has access to a massive index built from Google’s web crawling-- if it shows up in a Google search, Gemini/Bard can see it. So unless the model weights do not contain any features that correlate Gaza to being a geographic location, there should be no technical reason that it is unable to retrieve this information.

      My speculation is that Google has set up “misinformation guardrails” that instruct the model not to present retrieved information that is deemed “dubious”-- it may decide for instance that information from an AP article are more reputable than sparse, potentially conflicting references to numbers given by the Gaza Health Ministry, since it is ran by the Palestinian Authority. I haven’t read too far into Gemini’s docs to know what all Google said they’ve done for misinformation guardrailing, but I expect they don’t tell us much besides that they obviously see a need to do it since misinformation is a thing, LLMs are gullible and prone to hallucinations and their model has access to literally all the information, disinformation, and misinformation on the surface web and then some.

      TL;DR someone on the Ethics team is being lazy as usual and taking the simplest route to misinformation guardrailing because “move fast”. This guardrailing is necessary, but fucks up quite easily (ex. the accidentally racist image generator incident)