• hoshikarakitaridia@lemmy.world · 6 days ago

    This is a good time to ask: I want to run AI on a local server (DeepSeek maybe, image generators like Flux, …). Is there a cheaper alternative to flagship Nvidia cards that can do it?

    • kata1yst@sh.itjust.works · 6 days ago

      Depends on your goals. For raw tokens per second, yeah you want an Nvidia card with enough memory for your target model(s).

      But if you don’t care so much for speed beyond a certain amount, or you’re okay sacrificing some speed for economy, an AMD RX 7900 XT/XTX or a 9070 works pretty well for small to mid-sized local models.

      Otherwise you can look at SoC-type solutions like AMD Strix Halo or Nvidia DGX for more model capacity at the cost of speed, but always look for reputable benchmarks showing ‘enough’ speed for your use case.
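
      Before buying anything, it is worth a rough fit check of your target model against a card’s VRAM. A minimal sketch in Python, assuming PyTorch is installed; the 32B size and ~4.5 bits per weight are just illustrative assumptions:

      ```python
      # Rough fit check: do the quantized weights fit in this card's VRAM?
      import torch

      def model_size_gb(params_billion, bits_per_weight):
          """Approximate size of the weights alone, ignoring the KV cache."""
          return params_billion * 1e9 * bits_per_weight / 8 / 1e9

      if torch.cuda.is_available():  # also true on ROCm builds for AMD cards
          vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
          need_gb = model_size_gb(32, 4.5)  # e.g. a 32B model at a Q4-ish quant
          print(f"VRAM: {vram_gb:.1f} GB, weights: ~{need_gb:.1f} GB "
                "(leave a few GB spare for context)")
      else:
          print("No CUDA/ROCm device visible; the model would run from system RAM")
      ```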

    • cm0002@lemmy.world (OP) · 6 days ago

      From my reading, if you don’t mind sacrificing speed (tokens/sec), you can run models in system RAM. To be usable, though, you’d need at minimum a dual-processor server/workstation for multi-channel RAM, plus enough RAM to fit the model.

      So for something like DS R1, you’d need like >512GB of RAM.
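
      Speed is mostly a memory-bandwidth question: each generated token has to read the active weights out of RAM, so a back-of-envelope like the one below gives an optimistic ceiling. All numbers here are rough assumptions for illustration (DeepSeek R1 is a mixture-of-experts model, so it reads roughly its ~37B active parameters per token rather than the full 671B):

      ```python
      # Back-of-envelope: tokens/sec is roughly bandwidth / GB of weights read per token.
      def est_tokens_per_sec(bandwidth_gb_s, weights_read_gb):
          return bandwidth_gb_s / weights_read_gb

      dual_socket_ddr4 = 16 * 25.6  # ~16 channels of DDR4-3200, theoretical peak
      high_end_gpu = 1000.0         # ~1 TB/s of VRAM bandwidth

      # A dense 70B model at a ~4.5 bit/weight quant reads ~40 GB per token.
      print(est_tokens_per_sec(dual_socket_ddr4, 40))  # ~10 tok/s, optimistic
      print(est_tokens_per_sec(high_end_gpu, 40))      # ~25 tok/s, if it fit in VRAM
      ```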

      • SmokeyDope@lemmy.world (mod) · 4 days ago

        You are correct in your understanding. However, the last part of your comment needs a big asterisk: it’s important to consider quantization.

        The full F16 DeepSeek R1 GGUF from Unsloth requires 1.34 TB of RAM. Good luck getting the RAM sticks and channels for that.

        The Q4_K_M mid-range quant is 404 GB, which would theoretically fit inside 512 GB of RAM with room left over for context.

        512 GB of RAM is still a lot; theoretically you could run a lower quant of R1 with 256 GB. Not super desirable, but totally doable.
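
        For anyone who wants to redo the math, here is a quick sketch; the effective bits-per-weight values for each GGUF quant are my own approximations:

        ```python
        # Approximate weight sizes for DeepSeek R1's ~671B parameters at various quants.
        # Bits-per-weight figures are rough effective values, not exact.
        quants = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}
        params = 671e9

        for name, bits in quants.items():
            gb = params * bits / 8 / 1e9
            print(f"{name}: ~{gb:.0f} GB of weights, plus room for KV cache/context")
        ```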

    • SmokeyDope@lemmy.world (mod) · 6 days ago

      It’s all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to like 128 GB of RAM, and run a low quant of the full DeepSeek. It won’t be fast, but it will work. Now if you want fast, you need to get the model into graphics card VRAM, ideally all of it. That’s where the high-end Nvidia stuff comes in: getting 24 GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data center cards. You can use AMD cards too, it’s just not as well supported.

      LocalLLaMA users tend to use smaller models than the full DeepSeek R1, ones that fit on older cards. A 32B model partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their existing home hardware. Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills. Those who want more have the money to burn on multiple Nvidia GPUs and a server rack.
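
      Partial offload is straightforward with llama.cpp-based tooling. A minimal sketch using llama-cpp-python, where the model path and layer count are placeholders you would tune to your own card:

      ```python
      # Split a quantized model between VRAM and system RAM with llama-cpp-python
      # (needs a build with CUDA or ROCm support). Path and layer count are placeholders.
      from llama_cpp import Llama

      llm = Llama(
          model_path="models/qwen-32b-q4_k_m.gguf",  # hypothetical local GGUF file
          n_gpu_layers=30,  # however many layers fit in VRAM; the rest stay in RAM
          n_ctx=8192,       # context window; the KV cache costs memory too
      )

      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "Give me one fun fact about RAM."}],
          max_tokens=128,
      )
      print(out["choices"][0]["message"]["content"])
      ```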

      LLM-wise, your phone can run 1-4B models, your laptop 4-8B, and an older gaming desktop with a 4-8 GB VRAM card can run around 8-32B. Beyond that you need the big expensive 24 GB cards, and further beyond needs multiples of them.

      Stable Diffusion models, in my experience, are very compute intensive. Quantization degradation is much more apparent, so you should have VRAM, use a high quant of the model, and keep the canvas size as low as tolerable.
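
      With the diffusers library that mostly means keeping fp16 weights and a modest resolution. A minimal sketch, assuming a CUDA/ROCm card; the SD 1.5 checkpoint is just a common example:

      ```python
      # Stable Diffusion with fp16 weights and a small canvas to keep VRAM use down.
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5",  # example checkpoint
          torch_dtype=torch.float16,
      )
      pipe.to("cuda")
      pipe.enable_attention_slicing()  # trades a little speed for lower VRAM use

      image = pipe("a watercolor fox", height=512, width=512).images[0]
      image.save("fox.png")
      ```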

      Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and DIGITS.

    • icecreamtaco@lemmy.world · 6 days ago

      Assuming you haven’t ruled this out already, test your plans out now using whatever computer you already own. At the hobbyist level you can do a lot with 8 GB of RAM and no graphics card. 7B LLMs are really good now and they’re only going to get better.
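
      A quick way to run that test is to time a small quantized model on CPU with llama-cpp-python; a minimal sketch, where the GGUF path is a placeholder for whatever 7B model you download:

      ```python
      # Time a small quantized model on CPU to see what your current machine manages.
      import time
      from llama_cpp import Llama

      llm = Llama(model_path="models/mistral-7b-q4_k_m.gguf",  # placeholder path
                  n_gpu_layers=0, n_ctx=2048)                  # CPU only

      start = time.time()
      out = llm("Explain RAID 1 in two sentences.", max_tokens=100)
      tokens = out["usage"]["completion_tokens"]
      print(f"{tokens / (time.time() - start):.1f} tokens/sec on CPU")
      ```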