And I imagine that must be true for neural networks too, as that layer of language processing on top of any task naturally can’t be as efficient/performatative as specialist software/networks made for the job.
Oh yeah definitely, a specialized model for each task would be more efficient on the inference side but can you imagine the cost of training a million specialized models ? For example you could think of natural language processing as it was done before : one model for sentiment analysis, one model for chronological analysis, one model for identifying legal terms etc… need to classify color descriptions in natural language ? Well here you go train another model. A small model (comparatively) but also one you’ll have to re-train if you want to change the task even slightly.
A LLM has the advantage of being able to generalize a lot of different tasks on the same model, including some that are wildly out of distribution (meaning you hadn’t even thought of them and they are not explicitly stated in the training data). So yeah, you pay a big training tax to train one large model, but then it pays off because that same model can perform on a million different tasks.
At least that’s the thesis. I’m not qualified to judge whether it is proving worth it, but that’s the reason why the industry massively shifted towards LLMs.
Oh yeah definitely, a specialized model for each task would be more efficient on the inference side but can you imagine the cost of training a million specialized models ? For example you could think of natural language processing as it was done before : one model for sentiment analysis, one model for chronological analysis, one model for identifying legal terms etc… need to classify color descriptions in natural language ? Well here you go train another model. A small model (comparatively) but also one you’ll have to re-train if you want to change the task even slightly.
A LLM has the advantage of being able to generalize a lot of different tasks on the same model, including some that are wildly out of distribution (meaning you hadn’t even thought of them and they are not explicitly stated in the training data). So yeah, you pay a big training tax to train one large model, but then it pays off because that same model can perform on a million different tasks.
At least that’s the thesis. I’m not qualified to judge whether it is proving worth it, but that’s the reason why the industry massively shifted towards LLMs.