It's all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to around 128 GB of RAM, and run a low quant of the full DeepSeek. It won't be fast, but it will work. Now, if you want fast, you need to get the model into GPU VRAM, ideally all of it. That's where the high-end Nvidia stuff comes in: 24 GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data center cards. You can use AMD cards too, it's just not as well supported.
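If you want a concrete picture of the RAM vs VRAM tradeoff, here's a minimal sketch with llama-cpp-python (the model filename is just a placeholder, swap in whatever GGUF you downloaded). The n_gpu_layers knob decides how much of the model sits in VRAM versus system RAM:

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
# and you have a low-quant GGUF on disk. The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-r1-q2_k.gguf",  # placeholder path to your quantized model
    n_gpu_layers=0,   # 0 = run entirely from system RAM; -1 pushes everything that fits onto the GPU
    n_ctx=4096,       # context window; bigger contexts eat more memory
)

out = llm("Explain VRAM vs RAM in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

CPU-only is the "slow but it works" mode; the more layers you can push onto the card, the faster it gets.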
LocalLlama users tend to use smaller models than the full DeepSeek R1, ones that fit on older cards. A 32B model partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their existing home hardware. Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills. Those who want more have the money to burn on multiple Nvidia GPUs and a server rack.
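To get a feel for why 32B is roughly the ceiling on an 8 GB card plus RAM, here's some back-of-the-envelope math. The bits-per-weight figure and layer count are rough assumptions, not exact specs for any particular model:

```python
# Back-of-the-envelope sketch: how much of a Q4-ish 32B model fits in 8 GB of VRAM.
# 4.5 bits/weight and 64 layers are ballpark assumptions, not exact numbers.
params_b = 32            # billions of parameters
bits_per_weight = 4.5    # roughly what a Q4_K_M quant works out to
n_layers = 64            # assumed layer count for a 32B-class model

model_gb = params_b * bits_per_weight / 8     # ~18 GB of weights
gb_per_layer = model_gb / n_layers
vram_budget_gb = 8 - 1.5                      # leave headroom for KV cache and overhead

print(f"weights ~ {model_gb:.0f} GB, about {int(vram_budget_gb / gb_per_layer)} of {n_layers} layers fit on the GPU")
```

The remaining layers spill into system RAM, which is exactly why it runs but isn't fast.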
LLM-wise: your phone can run 1-4B models, your laptop 4-8B, and an older gaming desktop with a 4-8 GB VRAM card can run around 8-32B. Beyond that you need the big, expensive 24 GB cards, and going further still needs multiples of them.
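If you want to sanity-check those ranges, a rough size estimate for common GGUF quants looks like this (the bits-per-weight values are ballpark figures, not exact):

```python
# Rough sizing sketch: approximate on-disk/in-memory size for a few model scales and quants.
# Bits-per-weight numbers are ballpark figures for common llama.cpp quants.
quants = {"Q8_0": 8.5, "Q4_K_M": 4.5, "Q2_K": 2.6}

for params_b in (4, 8, 32, 70):
    sizes = ", ".join(f"{name}: {params_b * bpw / 8:.1f} GB" for name, bpw in quants.items())
    print(f"{params_b:>2}B  ->  {sizes}")
```

That's why 8B fits on a laptop, 32B wants a real GPU plus RAM spillover, and 70B-class models start demanding 24 GB cards or several of them.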
Stable Diffusion models are, in my experience, very compute intensive. Quantization degradation is much more apparent, so you want enough VRAM, a high-quant model, and a canvas size kept as low as you can tolerate.
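As a concrete example, here's roughly what that looks like with the diffusers library. The checkpoint name and settings are illustrative; the point is keeping the resolution down and using fp16 rather than leaning on aggressive quantization:

```python
# Minimal sketch with Hugging Face diffusers; checkpoint name and settings are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a commonly used SD 1.5 checkpoint
    torch_dtype=torch.float16,         # half precision keeps quality while halving VRAM vs fp32
).to("cuda")
pipe.enable_attention_slicing()        # trades a bit of speed for lower peak VRAM

# Keep the canvas small; 512x512 is native for SD 1.5 and far cheaper than 1024x1024.
image = pipe("a watercolor fox in a forest", height=512, width=512).images[0]
image.save("fox.png")
```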
Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and DIGITS.