I find myself really appreciating what LLMs can do when it comes to helping with software and tech support. I'm a fairly adept PC power user who is not a programmer and (until recently) had only a modest amount of experience with GNU/Linux. However, I have started to get into self-hosting my own FOSS apps and servers (started with OpenWebUI, now Jellyfin/Sonarr via Docker Compose, etc.). I'm also reading a book about the Linux command line and trying to decipher the world of black magic that is networking on my own.
I have found that LLMs can really help with comprehension and troubleshooting. That said, lately I am struggling to get good troubleshooting advice out of them, specifically for Docker container setups and networking issues.
I had been using Qwen3 Coder 480B, but I tried Claude Sonnet 4 recently and both have let me down a bit. They don't seem to think systematically when offering troubleshooting tips (Qwen at least). I was hoping Claude would be better since it's an order of magnitude more expensive on OpenRouter, but so far that hasn't been the case.
So, what LLM do you use for this type of work? Any other tips for using models as a troubleshooting resource? I have been providing full logs etc. and being as detailed as possible, and I'm still struggling to get good advice lately. I'm not talking about full vibe coding here, just trying to figure out why my Docker container is throwing errors and the like. Thanks!
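For reference, the kind of output I've been pasting in comes from commands roughly like these (container and service names are just examples from my setup):

```
# Recent logs from the failing container, with timestamps
docker logs --tail 200 --timestamps jellyfin

# Same idea via Compose, following live output for one service
docker compose logs -f --tail 200 sonarr

# Full container config (networks, mounts, env vars) for context
docker inspect jellyfin

# Quick look at networks and what's attached to them
docker network ls
docker network inspect bridge
```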
Note: I did search and found a somewhat similar post from 6 months ago or so but it wasn’t quite as specific and because 6 months is half a lifetime in LLM development, I figured I’d post as well. Here’s the post in question in case anyone is curious to see that one.
Prompt formatting (and the system prompt) is a huge thing, especially with models trained for 'tool use' in a specific way, so be sure to keep that in mind. For example, if you want a long chain of steps, be sure to ask for it explicitly (though Qwen uses its thinking block quite gratuitously as it is).
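As a rough sketch of what I mean, you can pin the troubleshooting style in the system prompt rather than hoping the model does it on its own; something like this against OpenRouter's OpenAI-compatible endpoint (the model slug and the prompt wording are just examples):

```
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder",
    "messages": [
      {"role": "system", "content": "You are a sysadmin debugging a Docker/networking issue. Work step by step: state hypotheses, ask for the single most useful command output next, and do not propose fixes before the cause is confirmed."},
      {"role": "user", "content": "My jellyfin container exits immediately after docker compose up. Logs attached below."}
    ]
  }'
```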
I find GLM 4.5's default formatting to be really good, though: be sure to give that a shot. It's also awesome because the full 350B model is runnable locally (with some degradation) on a 128GB RAM + 24GB VRAM gaming rig, and the 'Air' version is quite fast and accurate on lesser hardware.
Local hosting, if you can swing it, is particularly nice because the calls are literally free, and prompt ingestion is cached, so you can batch requests and spam the heck out of them for testing and such.
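Purely a sketch of what I mean, assuming you're serving something OpenAI-compatible locally (llama.cpp's llama-server on port 8080, say) and that the log file name is yours to swap in. The big log dump is the shared prefix, so after the first call most of the prompt ingestion should be cached and repeat runs are cheap:

```
for i in 1 2 3; do
  # Build the request with the raw log dump as part of the user message
  jq -n --rawfile logs compose-and-logs.txt \
    '{model: "local", messages: [{role: "user", content: ("Why does this container keep restarting?\n\n" + $logs)}]}' |
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" -d @- |
  jq -r '.choices[0].message.content' > "answer-$i.txt"
done
```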
Yes, I do locally host several models, mostly the Qwen3 family stuff like 30B A3B etc. I have been trying GLM 4.5 a bit through OpenRouter and I've been liking its style pretty well. Interesting to know I could potentially just pop in some larger RAM DIMMs and run even larger models locally. The thing is, OpenRouter is so cheap for many of these models, and with zero-data-retention policies, I feel a bit stupid for even buying a 24GB VRAM GPU to begin with.
Yeah, the APIs are super cheap. It doesn’t make a ton of sense unless you already have the GPU lying around.
With the right settings, GLM will actually work fine in 16GB, 12GB, or even 11GB of VRAM + 128GB RAM. I can even make a custom quant if you want, since I already have that set up. 24GB just gives it a bit of 'breathing room' for longer context and more relaxed quantization for the dense parts of the model.
GLM Air will work on basically any modernish Nvidia GPU + like 26GB of free RAM. Its dense part is really small.
But to be clear, you have to get into the weeds to run them efficiently this way. There's no simple `ollama run` here.
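To give a rough idea of what "the weeds" looks like, a llama.cpp launch for this kind of setup (big MoE model, experts pushed to system RAM, dense layers on the GPU) has roughly this shape. The GGUF name, context size, and the override-tensor pattern are examples only; check the current llama.cpp docs for the exact flags and tune for your quant and hardware:

```
./llama-server \
  --model GLM-4.5-Q4_K_M.gguf \
  --ctx-size 32768 \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --port 8080
```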