I find myself really appreciating what LLMs can do when it comes to helping with software and tech support. I am a pretty adept PC power user who is not a programmer and (until recently) had only a modest amount of experience with GNU/Linux. However, I have started to get into self-hosting my own FOSS apps and servers (started with OpenWebUI, now Jellyfin/Sonarr via Docker Compose, etc.). I'm also reading a book about the Linux command line and trying to decipher the world of black magic that is networking myself.
I have found that LLMs can really help with comprehension and troubleshooting. That said, lately I've been struggling to get good troubleshooting advice out of them, specifically for Docker container setups and networking issues.
I had been using Qwen3 Coder 480b, but tried out Claude Sonnet 4 recently and both have let me down a bit. They don’t seem to think systematically when offering troubleshooting tips (Qwen at least). I was hoping Claude would be better since it is an order of magnitude more expensive on OpenRouter, but so far it has not seemed so.
So, what LLM do you use for this type of work? Any other tips for using models as a resource for troubleshooting? I have been providing access to full logs etc and being as detailed as possible and still struggling to get good advice lately. I’m not talking full vibe coding here but just trying to figure out why my docker container is throwing errors etc. Thanks!
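For what it's worth, this is roughly the first-pass diagnostic dump I've been pasting in before asking anything (the service and network names are just from my own stack, swap in yours):

```sh
# first-pass diagnostics I paste into the chat (service/network names are from my stack)
docker compose ps                        # which containers are up, restarting, or exited
docker compose logs --tail=200 sonarr    # recent logs for the failing service
docker inspect sonarr --format '{{json .NetworkSettings.Networks}}'   # attached networks + IPs
docker network ls                        # what networks exist
docker network inspect bridge            # who's on a network, subnets, gateways
```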
Note: I did search and found a somewhat similar post from 6 months ago or so but it wasn’t quite as specific and because 6 months is half a lifetime in LLM development, I figured I’d post as well. Here’s the post in question in case anyone is curious to see that one.
Claude is fine but you REALLY have to hold its hand in order to get any sort of decent solution. If you don't, 9 times out of 10 it'll just make something up based purely on a forum or repo post. You have to tell it to provide sources so as to prevent the usual BS it'll spew out.
I found once you hold its hand and scold it a few times it will provide decent solutions, but by that point you've essentially turned it into a fancy search engine.
Appreciate you sharing your experience. With this being the case, and it being an order of magnitude more $$$ than Qwen3 Coder, I think I'll mostly steer clear for now. Not sure why this model seems to have such mindshare and dominance with programmers these days, honestly, other than that many in the West seem somewhat biased against Chinese models.
Mainly because of Claude Code.
CC is better than the web-based Claude, especially when it comes to actual coding, since it's embedded with whatever project you're working on.
Claude really excels when it's right in the thick of it with you. Thus, again, you REALLY have to hold its hand. I personally don't think it's as great as others make it out to be.
What you really want is a locally hostable 'researching' front end that gets the LLM to go out and search the web for documentation. Without good context, they're always 'guessing'.
I’m a little bit behind on these actually. But I do know Open Web UI’s research plugin has a bad reputation.
I definitely have been looking out for this for a while. Wanting to replicate GPT deep research but not seeing a great way to do this. I did see that there was an OWUI tool for this but it didn't seem particularly battle-tested, so I hadn't checked it out yet. I've been curious about how the new Tongyi Deep Research might be…
That said, specifically for troubleshooting somewhat esoteric (or at least quite bespoke in terms of configuration) software problems, I was hoping the larger coder focused models would have enough built-in knowledge to suss out the issues. Maybe I should be having them consistently augment their responses with web searches if this isn’t the case? I have not been clicking that button typically.
I do generally try to paste in or link as much of the documentation for whatever software I’m troubleshooting though.
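If augmenting with web searches is the way to go, I guess I could flip on Open WebUI's built-in search rather than a plugin. Something like this, though I gather the env var names have shifted between releases, so treat it as a sketch and check the docs for your version:

```sh
# sketch: Open WebUI with built-in web search pointed at a local SearXNG instance
# (env var names have changed across Open WebUI releases; verify for your version)
docker run -d -p 3000:8080 \
  -e ENABLE_WEB_SEARCH=true \
  -e WEB_SEARCH_ENGINE=searxng \
  -e SEARXNG_QUERY_URL="http://searxng:8080/search?q=<query>" \
  ghcr.io/open-webui/open-webui:main
```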
Prompt formatting (and the system prompt) is a huge thing, especially with models trained for 'tool use' a specific way, so be sure to keep that in mind. For example, if you want a long chain of steps, be sure to explicitly ask (though Qwen uses its thinking block quite gratuitously).
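As a rough example, a system prompt along these lines tends to nudge models toward stepwise troubleshooting (the wording is just a sketch to adapt, not anything canonical):

```sh
# sketch of a troubleshooting-oriented system prompt (wording illustrative; tune per model)
cat > system-prompt.txt <<'EOF'
You are a Linux/Docker troubleshooting assistant. Work systematically:
1. Restate the symptom and list the most plausible causes, ranked.
2. Propose ONE diagnostic command at a time; wait for its output before concluding.
3. Cite the docs or source you are relying on, and say "not sure" instead of guessing.
EOF
```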
I find GLM 4.5's default formatting to be really good though: be sure to give that a shot. It's also awesome because the full 350B model (with some degradation) is locally runnable on a 128GB RAM + 24GB VRAM gaming rig, and the 'Air' version is quite fast and accurate on lesser hardware.
Local hosting, if you can swing it, is particularly nice because the calls are literally free, and prompt ingestion is cached, so you can batch them and spam the heck out of them for testing and such.
Yes, I do host several models locally. Mostly the Qwen3 family stuff like 30B A3B, etc. Have been trying GLM 4.5 a bit through OpenRouter and I've been liking the style pretty well. Interesting to know I could potentially just pop in some larger RAM DIMMs and run even larger models locally. The thing is, OR is so cheap for many of these models, and with zero data retention policies, I feel a bit stupid for even buying a 24 GB VRAM GPU to begin with.
Yeah, the APIs are super cheap. It doesn’t make a ton of sense unless you already have the GPU lying around.
With the right settings, GLM will actually work fine in 16GB, 12GB, or even 11GB VRAM + 128GB RAM. I can even make a custom quant if you want, since I already got that set up. 24 GB just gives it a bit of ‘breathing room’ for longer context and relaxed quantization for the dense parts of the model.
GLM Air will work on basically any modernish Nvidia GPU + like 26GB of free RAM. Its dense part is really small.
But to be clear, you have to get into the weeds to run them efficiently this way. There's no simple `ollama run` here.
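To give a flavor of the weeds: the usual trick is to offload everything to the GPU and then pin the MoE expert tensors back to system RAM, roughly like this (the file name, context size, and tensor regex are illustrative, so verify the flags against your llama.cpp build):

```sh
# sketch of running GLM Air via llama.cpp with MoE offload
# --n-gpu-layers 99 offloads all layers to the GPU by default, then
# -ot/--override-tensor pins the MoE expert tensors back to CPU/system RAM
llama-server -m GLM-4.5-Air-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -c 16384
```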
I'm surprised you're getting disappointing results with Qwen3 Coder 480B. I run Qwen 2.5 Coder 14B locally (Open WebUI + Ollama) on my 3060 12GB and I've been pretty pleased with its answers so far relating to Python code, Django documentation/settings, and quirks with my reverse proxy.
I assume you aren't hosting the 480B locally, right? Are you using Open WebUI with an OpenRouter API key?
Honestly it has been good enough until recently, when I've been struggling specifically with Docker networking stuff, and it's been on the struggle bus with that. Yes, I'm using OpenRouter via Open WebUI. I used to run a lot of stuff locally (mostly 4-bit quants of 32B and smaller models, since I only have a single 3090), but lately I've been trying more larger models out on OpenRouter since many of the non-proprietary ones are super cheap. Like fractions of a penny for a response… Many are totally free up to a point as well.
I lately tried ChatGPT for some networking stuff, and occasionally I'll use AI Studio (Google) for similar things. And let's say they're all not great. They can do the relatively common (and somewhat easy) Linux stuff; I think they should be able to tell you how to manage your Docker containers and volumes at the command line. But I had ChatGPT massively struggle with networking, and the systemd service files it wrote had problematic stuff in them… My local LLMs are way too tiny to try. But there might just not be any properly good AI out there as of today. And their "reasoning" modes aren't like human reasoning or systematic approaches either. They just make up a lot of stuff, and that makes them a bit better; it's not logic, though.

What I end up doing is either fall back on my own brain, learn the stuff, and do it myself, or something like "vibe-coding": ask it 10-20 times, scold it, paste in the error messages, and eventually I'll get something that runs.
Btw, there’s still a human Linux community around. So maybe find your favorite Linux forum and ask there once it gets too complicated for AI.
Qwen3 or Qwen3 Coder? Qwen3 comes in 235B, 30B, and smaller sizes; Qwen3 Coder comes in 30B and 480B sizes.
OpenRouter has multiple quant options and, for coding, I'd try to only use 8-bit int or higher.
Claude also has a ton of sizes and deployment options with different capabilities.
As far as reasoning, the newest Deepseek V3.1 Terminus should be pretty good.
Honestly, all of these models should be able to help you up to a certain level with Docker. I would double-check how you connect to OpenRouter, make sure your hyperparameters are good, and make sure thinking/reasoning is enabled. Maybe try duck.ai and see if the models there match up to whatever you're doing in OpenRouter.
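For reference, a request with reasoning switched on and a quantization floor looks roughly like this (the model slug and exact field names are from memory, so double-check them against OpenRouter's current docs):

```sh
# rough OpenRouter request with reasoning enabled and a quantization floor
# (model slug and parameter names illustrative; verify against OpenRouter's docs)
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder",
    "reasoning": { "effort": "high" },
    "provider": { "quantizations": ["fp8", "bf16", "fp16"] },
    "temperature": 0.2,
    "messages": [{ "role": "user", "content": "Why does my container exit with code 137?" }]
  }'
```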
Finally, not being a hater, but LLMs are not intelligent. They cannot actually reason or think. They can probabilistically align with answers you want to see. Sometimes your issue might be too weird or new for them to be able to give you a good answer. Even today models will give you docker compose files with a version number at the top, a feature which has been deprecated for over a year.
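To illustrate (placeholder service; the commented-out line is what models keep generating):

```sh
# minimal sketch: the top-level `version:` key LLMs love to emit is obsolete in the Compose spec
cat > docker-compose.yml <<'EOF'
# version: "3.8"   <- what models keep generating; no longer needed
services:
  jellyfin:
    image: jellyfin/jellyfin
    ports:
      - "8096:8096"
EOF
docker compose config   # validates the file; warns if an obsolete `version:` is present
```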
Edit: gpt-oss 120 should be cheap and capable enough. Available on duck.ai
The Coder model (480B). I initially mistakenly said the 235B one but edited that. I didn't know you could customize quant on OpenRouter (and I thought the differences between most modern 4-bit quants and 8-bit were minimal as well…). I have tried gpt-oss 120 a bunch of times, and though it seems 'intelligent' enough, it is just too talkative and verbose for me (plus I can't remember the last time it responded without somehow working an elaborate comparison table into the response), and that makes it too hard to parse through things.
Totally. I think OSS is outright annoying with its verbosity. A system prompt will get around that.
I tried that! I literally told it to be concise and to limit its response to a certain number of words unless strictly necessary and it seemed to completely ignore both.
I think it’s more important how you run it.
I have Copilot in VS Code, and since I use it to SSH into things, the bot has access to all the files and my terminal output. It's also easy to switch from one model to another.
Even if this isn’t going to solve the issue of the quality of the LLM’s advice and help, it would massively simplify my current workflow which is copy/pasting logs and command responses and everything into the OWUI window. I’ll check it out. Can you use OpenRouter with VSCode to have access to more models or?
Yup, OpenRouter is one of the options, as well as Ollama and all the major APIs.
I pay $10 a month, so I get unlimited GPT-4.1, GPT-5 mini, and Grok. I also have OpenAI and Gemini through the API.
Surprisingly, Grok feels the best because it tends to make small changes at a time and will verify by running your scripts if you let it. It picks up on its own mistakes way more often, and it's also fast. Not the smartest but definitely the funnest.
You can probably get similar behavior by modifying the prompts for the other ones.
Is this Grok Code Fast 1? I've noticed it's been topping the programming charts on OR recently. I was going to try it out, but it won't respect my zero-data-retention preference, unsurprisingly.
Yup, precisely. It's easily the best free model on the $10 a month plan.