• Stovetop@lemmy.world
    English
    arrow-up
    103
    ·
    9 months ago

    This is kind of a dumb argument, isn’t it?

    I have to imagine someone centuries ago probably complained about inventors wasting their time on some dumb printing presses so smart people could write books and newspapers better when they could have been building better farm tools. But could we have developed the tractor when we did if we were still handwriting everything?

    Progress supports progress. Teaching computers to recognize and reproduce pictures might seem like a waste to some people, but how do you suppose a computer will someday disassemble a ship if it is not capable of recognizing what the ship is and what holds it together? Modern AI is primitive, but it will eventually lead to autonomous machines that can actually do that work intelligently without blindly following an instruction set, oblivious to whatever might actually be happening around them.

    • Zorque@kbin.social
      arrow-up
      36
      ·
      9 months ago

      The argument isn’t against the technology, it’s against the application of that technology.

      • hansl@lemmy.world
        English
        arrow-up
        31
        ·
        9 months ago

        Path of least resistance. It is harder to build a robot that can disassemble ships with its hands than it is to pattern-match pictures together.

        This XKCD comes to mind: https://xkcd.com/1425/

    • EldritchFeminity@lemmy.blahaj.zone
      English
      arrow-up
      23
      ·
      9 months ago

      This isn’t even close to what they’re saying. It’s closer to complaining about how the Yankees replaced their star pitcher with a modified howitzer.

      It’s not about people “wasting their time on some dumb invention,” it’s about how that useful invention is being used to replace jobs that people actually like doing because it’ll save their bosses money. It’s not even like when photography was invented or Photoshop came out and people freaked out about artists being put out of work, because those require different skill sets and opened up entirely new fields of art while also helping optimize other fields. This stuff could improve the fields that they’re created for by helping people optimize their workflow to make the act of creating things easier. But that’s not what they’re doing. It’s being used to mimic the skills of the people who enjoy doing these things so that they don’t have to pay people to do it.

      Even ignoring the ethical/moral aspect of this stuff being trained without permission on the work of the people it’s designed to replace, the end goal isn’t to increase the quality of life of people, allowing us more time to do the things we love - things like, you know, art and writing - it’s to make the rich even richer and push people out of well-paying jobs.

      The closest example I can think of is when Disney fired all their 2D animators and switched to 3D. They didn’t do it because 3D was better. In many ways, the quality was much worse at the time. But 2D animators are unionized and 3D animators aren’t, so they could get away with paying them much less. The same exact thing happened with the practical effects vs. digital effects guys in Hollywood right around the same time.

      • Grimy@lemmy.world
        English
        arrow-up
        14
        ·
        edit-2
        9 months ago

        Society has always been losing jobs; the population just pivots to other specialisations. The only reason we fear it is because of our economic system that preys on it and turns it into profit, but that’s another conversation entirely.

        On the subject of losing creative venues, both your examples (photography and Photoshop) show how technology didn’t detract from the arts but added to them, letting the average person do much more. The same will be true for AI: I can see an inevitable boom happening in the filmmaking and animation industry, not to mention comic books and most of all indie gaming. In the long run, it’s empowering for the individual imo.

        • EldritchFeminity@lemmy.blahaj.zone
          English
          arrow-up
          12
          ·
          9 months ago

          The economic system is what he’s talking about here. That was my point. The entire conversation from the side against this stuff has always been about the economic situation of it. Without that factor, I think the only thing people would care about is whether or not their work is being used without their permission/maliciously.

          As for Photoshop and photography, that’s actually why I brought those up specifically: because they were feared as things that would destroy artists’ jobs and actually brought about entirely new fields of art, and also because they’re the two examples people bring up when others argue against LLMs replacing people’s jobs, acting like those critics are just some Luddites afraid of science.

          Right now, the way I see it with AI is that there are 2 distinct groups benefiting from it: those whose workflow has been improved by the use of AI, and those who think AI can get them the result of work without having to either do the work themselves or pay somebody else to do it. And thanks to the economic issues that are at the heart of this whole thing, that second group is set to reduce the number of people who can spend time creating things, simply because those people now have to work a job that isn’t creating things and no longer have the time to put towards that. So I can see AI creating a whole new art boom or a bust in equal measure. That second group is of concern to the art communities as well because they only see the destination and don’t see that the journey is just as important to the act of creation, and that is already causing schisms between artists and “prompters” who think that they’re just as skilled because they used a generator to make some cool stuff. People are already submitting unedited, prompted work to art and writing competitions.

    • Barbarian@sh.itjust.works
      English
      arrow-up
      10
      ·
      edit-2
      9 months ago

      I get the sentiment, but it’s a bad example. Transformer models don’t recognize images in any useful way that could be fed to other systems. They also don’t have any capability of actual understanding or context. Heavily simplifying here, tokenisation of inputs allows them to group clusters of letters together into tokens, so when it receives tokens it can spit out whatever the training data says it should.

      The only actual things that are improving greatly here which could be used in different systems are natural language processing, natural language output and visual output.

      EDIT: Crossed out stuff that is wrong.

      • MrConfusion@lemmy.world
        English
        arrow-up
        12
        ·
        9 months ago

        Well, this is simply incorrect. And confidently incorrect at that.

        Vision transformers (ViTs) are an important branch of computer vision models that apply transformers to image analysis and detection tasks. They perform very well. The main idea is the same: by tokenizing the input image into smaller chunks, you can apply the same attention mechanism as in NLP transformer models.
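
        To make the "tokenizing the input image into smaller chunks" step concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper’s code: the function name `image_to_patches` and the toy 4x4 image are made up for this example, and a real model would work on tensors with 16x16 patches.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grid of pixel values into flattened, non-overlapping patches.

    Hypothetical helper for illustration; each returned patch plays the role
    of one "token" that a transformer would attend over.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image in patch-sized steps, row of patches by row of patches.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch in row-major order.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" whose pixel values are 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patches(toy_image, patch_size=2)

print(len(tokens))  # 4 patches
print(tokens[0])    # [0, 1, 4, 5]
```

        In an actual ViT, each flattened patch is then linearly projected to an embedding and combined with a position embedding before the attention layers see it; that part is left out here.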

        ViT models were introduced in 2020 by Dosovitskiy et al. in the landmark paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” (https://arxiv.org/abs/2010.11929), a work that has received almost 30,000 academic citations since its publication.

        So claiming transformers only improve natural language and vision output is straight up wrong. They are also widely used in visual analysis, including classification and detection.

        • Barbarian@sh.itjust.works
          English
          arrow-up
          1
          ·
          9 months ago

          Thank you for the correction. So hypothetically, with millions of hours of GoPro footage from the scuttle crew, and if we had some futuristic supercomputer that could crunch live data from a standard definition camera and output decisions, we could hook that up to a Boston Dynamics style robot and have it replace one member of the crew?

      • GBU_28@lemm.ee
        English
        arrow-up
        9
        ·
        edit-2
        9 months ago

        Huh? Image AI to semantic formatting, then consumption, is trivial now.

        • Barbarian@sh.itjust.works
          English
          arrow-up
          2
          ·
          edit-2
          9 months ago

          Could you give me an example that uses live feeds of video data, or feeds the output to another system? As far as I’m aware (I could be very wrong! Not an expert), the only things that come close to that are things like OCR systems and character recognition. Describing in machine-readable actionable terms what’s happening in an image isn’t a thing, as far as I know.

          • GBU_28@lemm.ee
            English
            arrow-up
            8
            ·
            edit-2
            9 months ago

            No live video, no; that didn’t seem to be the topic.

            But if you had the horsepower, I don’t think it’s impossible based on what I’ve worked with. It’s just about snipping and distributing the images, from a bottleneck standpoint

            • Barbarian@sh.itjust.works
              English
              arrow-up
              1
              ·
              edit-2
              9 months ago

              No live videos

              Well, that’d be a prerequisite to a transformer model making decisions for a ship scuttling robot, hence why I brought it up.

          • FooBarrington@lemmy.world
            English
            arrow-up
            3
            ·
            9 months ago

            Describing in machine-readable actionable terms what’s happening in an image isn’t a thing, as far as I know.

            It is. That’s actually the basis of multimodal transformers - they have a shared embedding space for multiple modes of data (e.g. text and images). If you encode data and take those embeddings, you suddenly have a vector describing the contents of your input.
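
            A rough sketch of how another system could consume such vectors, in plain Python. The embeddings below are hand-written, hypothetical numbers, not the output of any real model; in practice they would come from a multimodal encoder. The point is only that once text and images share one space, "what is in this image?" becomes a similarity lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embedding of an input image (made-up values for illustration).
image_embedding = [0.9, 0.1, 0.3]

# Hypothetical embeddings of candidate text descriptions in the same space.
text_embeddings = {
    "a ship hull": [0.8, 0.2, 0.4],
    "a bowl of soup": [0.1, 0.9, 0.2],
}

# Pick the description whose vector points in the most similar direction.
best_label = max(
    text_embeddings,
    key=lambda label: cosine_similarity(image_embedding, text_embeddings[label]),
)

print(best_label)  # a ship hull
```

            A downstream controller could branch on `best_label`, or use the raw vector directly as its input.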