• queermunist she/her
    7 months ago

    The main argument behind the article is that only Big AI can afford to pay for licensed materials to train their AI, so this will hurt the small developers.

    Does small/indie AI even exist? I was under the impression that it requires massive amounts of processing power even to run these AIs.

    • @keepthepace@slrpnk.net
      7 months ago

      I was under the impression that it requires massive amounts of processing power even to run these AIs.

      It was true a year ago, but things change quickly. People run LLMs on a Raspberry Pi nowadays.

      Training requires more GPU power, but this too is becoming more and more accessible. Fine-tuning is easy; training a base model still takes some serious hardware, but small teams manage to do it. Mistral, the French company that trained the latest good 7B model from scratch, has just a few dozen people.
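
      To put rough numbers on why fine-tuning is so much cheaper than training from scratch: with LoRA you only train small low-rank adapter matrices instead of all ~7B weights. The shapes below are illustrative assumptions (32 layers, hidden size 4096, rank-8 adapters on the four attention projections), not the exact Mistral-7B architecture.

```python
# Back-of-the-envelope count of trainable parameters for LoRA
# fine-tuning of a ~7B-parameter transformer. All shapes here are
# illustrative assumptions, not an exact architecture.

layers = 32
hidden = 4096
rank = 8
adapted_matrices_per_layer = 4  # q, k, v, o attention projections

# Each adapted weight matrix W (hidden x hidden) stays frozen; instead
# we train two small matrices A (hidden x rank) and B (rank x hidden).
lora_params = layers * adapted_matrices_per_layer * 2 * hidden * rank
total_params = 7_000_000_000

print(f"LoRA trainable params: {lora_params:,}")                    # 8,388,608
print(f"Fraction of full model: {lora_params / total_params:.4%}")  # ~0.12%
```

      Training roughly 0.1% of the weights is what puts fine-tuning within reach of a single consumer GPU.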

      The main argument behind the article is that only Big AI can afford to pay for licensed materials to train their AI, so this will hurt the small developers.

      That is not the main problem. The main problem is that these big companies don’t disclose which (possibly illegal) datasets they train on, unlike the open-source efforts, which can’t hide much.

      And they are not facing the “Creators”; they are facing the big copyright holders, who convinced a handful of artists that it is in their interest. If the copyright holders win, that’s a huge step backward that buys just a few more years for a dying business model. A lot of their arguments are misguided or downright deceitful.

    • @olicvb@lemmy.ca
      7 months ago

      Yeah, they exist. I think they are mostly merges, optimizations, derivatives, or fine-tunes of the models released by Facebook, OpenAI, etc.

      Based on what I know from image-generation models, they take a lot of compute to make/tune, but not as much to run.

      I was able to run this Mistral-7B model made by the Mistral team. And there are many others available here (these are re-released models that have been tuned by third parties for use with GPT4All).

      While this 7B model runs, it definitely doesn’t give the same results as one from Big AI. I did manage to run (albeit very slowly, at 1 token/s) a model that is said to surpass GPT-4 (https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0), made by the WizardLM team.

      AI chatbots aside, there are also image-generation models created by small/indie dev teams; almost everything on civit.ai is made by third parties (so a LOT of models). Stable Diffusion has been keeping up with, and in some cases surpassing, the big guys, mostly thanks to indie dev work.

    • @Mahlzeit@feddit.de
      7 months ago

      It’s possible to run small AIs on gaming PCs. For Stable Diffusion and small LLMs (7B, maybe 13B), a GPU with 4GB (or even 2GB?) of VRAM is sufficient. A high-end gaming PC can also be used to modify them (i.e. make LoRAs, etc.). Cloud computing is quite affordable, too.
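
      As a sanity check on those VRAM numbers (my arithmetic, not from the comment above): a 7B model quantized to 4 bits per weight needs about 3.5 GB just for the weights, which is why it can squeeze onto a 4 GB card, while full 16-bit weights would need around 14 GB.

```python
# Approximate VRAM needed just to hold a 7B model's weights at various
# precisions. Real usage adds activations and the KV cache on top.

params = 7_000_000_000  # 7B-parameter model

for bits in (16, 8, 4):
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB (decimal)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```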

      Stable Diffusion, which had such an impact, reportedly cost only 600k USD to train. It should be possible to make a new one for a fraction of that today. Training MPT-7B reportedly cost MosaicML 200k USD. Far from hobbyist money, but not big business, either.
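
      For a sense of scale (assuming roughly 4 USD per A100 GPU-hour at cloud prices, which is an illustrative number, not a quoted rate), a 600k USD budget corresponds to on the order of 150,000 GPU-hours:

```python
# Rough conversion of a reported training budget into GPU-hours.
# The cloud price per GPU-hour is an illustrative assumption.

budget_usd = 600_000     # reported Stable Diffusion training cost
usd_per_gpu_hour = 4.0   # assumed price for one A100 in the cloud

gpu_hours = budget_usd / usd_per_gpu_hour
print(f"~{gpu_hours:,.0f} GPU-hours")

# Spread across 256 GPUs, that is roughly 24 days of wall-clock time.
days = gpu_hours / 256 / 24
print(f"~{days:.0f} days on 256 GPUs")
```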

    • @Mahlzeit@feddit.de
      7 months ago

      I think a big picture view makes the problem clearer.

      Licensing material means that you must pay the owner of some intellectual property. If we expand copyright to require licensing for AI training, then that means that the owners can demand more money for no additional work.

      Where does the wealth come from that flows to the owners? It comes from the people who work. There is nowhere else it could possibly come from.

      That has some implications.

      Research and development progress more slowly because we not only have to work on improving things, but also have to pay off property owners who contribute nothing. If you zoom in from the big-picture view, you find that this is where small devs and open source suffer: they have to pay, or create their own new datasets; extra work for no extra benefit.

      It also means that inequality increases. The extra cash flow means that more income goes to certain property owners.