I find myself really appreciating what LLMs can do when it comes to helping with software and tech support. I'm a fairly adept PC power user who is not a programmer and (until recently) had only a modest amount of experience with GNU/Linux. However, I have started to get into self-hosting my own FOSS apps and servers (started with OpenWebUI, now Jellyfin/Sonarr via Docker Compose, etc.). I'm also reading a book about the Linux command line and trying to decipher the world of black magic that is networking on my own.
I have found that LLMs can really help with comprehension and troubleshooting. That said, lately I am struggling to get good troubleshooting advice out of them, specifically for Docker container setups and networking issues.
I had been using Qwen3 Coder 480B, but I tried Claude Sonnet 4 recently and both have let me down a bit. They don't seem to think systematically when offering troubleshooting tips (Qwen at least). I was hoping Claude would be better since it's an order of magnitude more expensive on OpenRouter, but so far that hasn't been the case.
So, what LLM do you use for this type of work? Any other tips for using models as a troubleshooting resource? I have been providing full logs etc. and being as detailed as possible, and I'm still struggling to get good advice lately. I'm not talking about full vibe coding here, just trying to figure out why my Docker container is throwing errors and the like. Thanks!
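For reference, the kind of output I've been pasting in comes from commands roughly like these (container and service names are just examples from my setup):

```
# Recent logs from the failing container, with timestamps
docker logs --tail 200 --timestamps jellyfin

# Same idea via Compose, following live output for one service
docker compose logs -f --tail 200 sonarr

# Full container config (networks, mounts, env vars) for context
docker inspect jellyfin

# Quick look at networks and what's attached to them
docker network ls
docker network inspect bridge
```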
Note: I did search and found a somewhat similar post from 6 months ago or so but it wasn’t quite as specific and because 6 months is half a lifetime in LLM development, I figured I’d post as well. Here’s the post in question in case anyone is curious to see that one.
Prompt formatting (and the system prompt) is a huge thing, especially with models trained for 'tool use' in a specific way, so be sure to keep that in mind. For example, if you want a long chain of steps, be sure to ask for it explicitly (though Qwen uses its thinking block quite gratuitously as it is).
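As a rough sketch of what I mean, you can pin the troubleshooting style in the system prompt rather than hoping the model does it on its own; something like this against OpenRouter's OpenAI-compatible endpoint (the model slug and the prompt wording are just examples):

```
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder",
    "messages": [
      {"role": "system", "content": "You are a sysadmin debugging a Docker/networking issue. Work step by step: state hypotheses, ask for the single most useful command output next, and do not propose fixes before the cause is confirmed."},
      {"role": "user", "content": "My jellyfin container exits immediately after docker compose up. Logs attached below."}
    ]
  }'
```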
I find GLM 4.5's default formatting to be really good, though: be sure to give that a shot. It's also awesome because the full 350B model is runnable locally (with some degradation) on a 128GB RAM + 24GB VRAM gaming rig, and the 'Air' version is quite fast and accurate on lesser hardware.
Local hosting, if you can swing it, is particularly nice because the calls are literally free, and prompt ingestion is cached, so you can batch requests and spam the heck out of them for testing and such.
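Purely a sketch of what I mean, assuming you're serving something OpenAI-compatible locally (llama.cpp's llama-server on port 8080, say) and that the log file name is yours to swap in. The big log dump is the shared prefix, so after the first call most of the prompt ingestion should be cached and repeat runs are cheap:

```
for i in 1 2 3; do
  # Build the request with the raw log dump as part of the user message
  jq -n --rawfile logs compose-and-logs.txt \
    '{model: "local", messages: [{role: "user", content: ("Why does this container keep restarting?\n\n" + $logs)}]}' |
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" -d @- |
  jq -r '.choices[0].message.content' > "answer-$i.txt"
done
```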
Yes, I do locally host several models, mostly the Qwen3 family stuff like 30B A3B etc. I have been trying GLM 4.5 a bit through OpenRouter and I've been liking its style pretty well. Interesting to know I could potentially just pop in some larger RAM DIMMs and run even larger models locally. The thing is, OpenRouter is so cheap for many of these models, and with zero-data-retention policies, I feel a bit stupid for even buying a 24GB VRAM GPU to begin with.
Yeah, the APIs are super cheap. It doesn’t make a ton of sense unless you already have the GPU lying around.
With the right settings, GLM will actually work fine in 16GB, 12GB, or even 11GB of VRAM + 128GB RAM. I can even make a custom quant if you want, since I already have that set up. 24GB just gives it a bit of 'breathing room' for longer context and more relaxed quantization for the dense parts of the model.
GLM Air will work on basically any modernish Nvidia GPU + like 26GB of free RAM. Its dense part is really small.
But to be clear, you have to get into the weeds to run them efficiently this way. There's no simple `ollama run` here.
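To give a rough idea of what "the weeds" looks like, a llama.cpp launch for this kind of setup (big MoE model, experts pushed to system RAM, dense layers on the GPU) has roughly this shape. The GGUF name, context size, and the override-tensor pattern are examples only; check the current llama.cpp docs for the exact flags and tune for your quant and hardware:

```
./llama-server \
  --model GLM-4.5-Q4_K_M.gguf \
  --ctx-size 32768 \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --port 8080
```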