Best LLMs for PC/Tech Troubleshooting?

FrankLaskey@lemmy.ml · 2 days ago

Best LLMs for PC/Tech Troubleshooting?

FrankLaskey@lemmy.ml · 2 days ago

The coder model (480B). I initially mistakenly said the 235b one but edited that. I didn’t know you could customize quant on OpenRouter (and I thought the differences between most modern 4 bit quants and 8-bit was minimal as well…) I have tried GPT OSS 120 a bunch of time and though it seems quote unquote ‘intelligent’ enough it is just too talkative and verbose for me (plus I can’t remember the last time it responded without somehow working an elaborate comparison table into the response) and it makes it too hard to parse through things.

afk_strats@lemmy.world · 1 day ago

Totally. I think OSS is outright annoying with its verbosity. A system prompt will get around that

FrankLaskey@lemmy.ml · 1 day ago

I tried that! I literally told it to be concise and to limit its response to a certain number of words unless strictly necessary and it seemed to completely ignore both.

afk_strats@lemmy.world · 6 hours ago

I don’t know if this is still useful for you, but I tried this out, mostly because I wanted to make sure I wasn’t crazy. Here’s my gpt-oss setup running on cheap AMD Instinct VRAM:

./llama-server \
  --model {model}.gguf
  --alias "gpt-oss-120b-mxfp4" \
  --threads 16 \
  -fa on\
  --main-gpu 0 \
  --ctx-size 64000 \
  --n-cpu-moe 0 \
  --n-gpu-layers 999 \
  --temp 1.0 \
  -ub 1536 \
  -b 1536 \
  --min-p 0.0 \
  --top-p 1.0 \
  --top-k 0.0 \
  --jinja \
  --host 0.0.0.0 \
  --port 11343 \
  --chat-template-kwargs '{"reasoning_effort": "medium"}'

I trimmed the content because it wasn’t relevant but left roughly the shape of the replies to give a sense of the verbosity.

Test 1: With default system message

user prompt: how do i calculate softmax in python

What is softmax
1 python + numpy
...
quick demo
...
2 SciPy
...
...
...
8 full script
...
running the script
...
results
...
TL;DR
...

followup prompt: how can i GPU-accelerate the function with torch

1 why pytorch is fast
...
...
**[Headers 2,3,4,5,6,7,8,9]**
...
...
TL;DR
...
Recap
...
Table Recap
...
Common pitfalls
...
Going beyond float32
...
10 Summary
...

Overall 6393 Tokens including reasoning

TEST 2 with this system prompt: You are a helpful coding assistant. Provide concise answers, to-the point answers. No fluff. Provide straightforward explanations when necessary. Do not add emoji and only provide tl;drs or summaries when asked.

user prompt: how do i calculate softmax in python

Softmax calculation in Python
...
Key points
...

followup prompt: how can i GPU-accelerate the function with torch

GPU‑accelerated Softmax with PyTorch
...
What the code does
...
Tips for larger workloads
...

Overall 1103 Tokens including reasoning

FrankLaskey@lemmy.ml · 6 hours ago

Thanks. I may give an updated system prompt like this a shot. Not sure where mine went wrong other than maybe it wasn’t being honored or seen by OpenRouter (I’m not running 120b locally, it’s too large for my set up). I’m actually a bit confused on how to set parameters with OpenRouter.