We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.
Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

  • Jilanico@lemmy.world
    8 months ago

    Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

    Is it tho? Honest question.

    • ryannathans@aussie.zone
      8 months ago

      Sure, but so is your memory. You could study the originals and redraw them in a similar way.

      • Jilanico@lemmy.world
        8 months ago

        I agree, but I don’t think these generative AIs actually store image files off the Internet in a massive database. I could be wrong.

        • ryannathans@aussie.zone
          8 months ago

          That’s correct. The structure of the information isn’t remotely similar to a file or database. Nothing is stored pixel by pixel; the model more loosely remembers correlations, similarities and facts about the content, as opposed to storing and copying it.
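
          A rough back-of-the-envelope sketch of why a pixel-by-pixel store is implausible: divide the size of the model file by the number of training images and compare the result with the size of one compressed image. The figures below (a checkpoint of roughly 4 GB, about 2 billion training images, a 100 KB image) are illustrative assumptions, not numbers from this thread, and the function name is made up for the example.

```python
def bytes_per_training_image(model_size_bytes: float, num_training_images: float) -> float:
    """Average number of model bytes available per training image."""
    if num_training_images <= 0:
        raise ValueError("num_training_images must be positive")
    return model_size_bytes / num_training_images


# All three numbers below are illustrative assumptions, not measured values.
MODEL_SIZE_BYTES = 4e9         # assumed: a checkpoint of roughly 4 GB
NUM_TRAINING_IMAGES = 2e9      # assumed: roughly 2 billion training images
TYPICAL_IMAGE_BYTES = 100_000  # assumed: a ~100 KB compressed image

budget = bytes_per_training_image(MODEL_SIZE_BYTES, NUM_TRAINING_IMAGES)
ratio = TYPICAL_IMAGE_BYTES / budget

print(f"Average budget per training image: {budget:.1f} bytes")
print(f"A typical image is about {ratio:,.0f}x larger than that budget")
```

          Under these assumptions the average budget is far too small to hold a copy of every image. It is only an average, though: it does not rule out that a small number of images that appear many times in the training data are reproduced very closely, which is what the article's examples suggest.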

      • Jilanico@lemmy.world
        8 months ago

        It’s too hard to type up how generative AIs work, but look up a video on “how stable diffusion works” or something like that. I seriously doubt they have a massive database with every image from the Internet inside it, with the AI just spitting those pics out, but I’m no expert.

      • Jilanico@lemmy.world
        8 months ago

        So Stable Diffusion, Midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I’m skeptical, but I’m no expert.

        • QubaXR@lemmy.world
          8 months ago

          These models were trained on datasets that, without compensating the authors, used their work as training material. It’s not every picture on the net, but a lot of it comes from scraping websites, portfolios and social networks wholesale.

          A similar situation exists with large language models. Recently Meta admitted to using pirated books (the Books3 dataset, to be precise) to train its LLM, with no plans to compensate the authors or even pay for a single copy of each book used.

          • Jilanico@lemmy.world
            8 months ago

            Most of the stuff that inspires me probably wasn’t paid for. I just randomly saw it online or on the street, much like an AI.

            AI using straight up pirated content does give me pause tho.

            • QubaXR@lemmy.world
              8 months ago

              I was on the same page as you for the longest time. I cringed at the whole “No AI” movement and artists’ protest. I made the very same argument: generations of artists honed their skills by observing the masters, copying their techniques and only then developing their own unique style. Why should AI be any different? Surely AI will not just copy works wholesale and will instead learn color, composition, texture and other aspects of various works to find its own identity.

              It was only when my very own prompts started producing results I recognized as “homages” at best and “rip-offs” at worst that it gave me pause.

              I suspect that earlier generations of text-to-image models had better moderation of training data. As the arms race heated up and the pace of development picked up, companies running these services started rapidly incorporating whatever training data they could get their hands on, ethics, copyright or artists’ rights be damned.

              I remember when Midjourney introduced Niji (their anime model) and I could often identify the manga and characters used to train it. The imagery Niji produced kept certain distinct and unique elements of character designs from that training data. As a result, a lot of characters exhibited the pointy teeth and stuck-out tongue of “Chainsaw Man”, without so much as a mention of the source material or even its themes.

          • archomrade [he/him]@midwest.social
            8 months ago

            These models were trained on datasets that, without compensating the authors, used their work as training material.

            Couple things:

            • This doesn’t answer OP’s question about how the information is stored. In fact, OP is right: the images and source material are NOT stored in a database within the model. It basically just stores metadata about the source material as a whole in order to construct new material from text descriptions.

            • The use of copyrighted works in training isn’t necessarily infringing if the model is found to be fair use, and there is a very strong fair use argument here.