Managers

inari@piefed.zip · 2 months ago

Managers

AtHeartEngineer@lemmy.world · 2 months ago

deleted by creator

theunknownmuncher@lemmy.world · 2 months ago

Qwen3.6 27b beats Claude Opus 4.5 in most benchmarks. Qwen3.6 35b beats Opus 4.5 in only a few specific benchmarks and loses to it in most. Either way, the gap between Opus 4.5 and Qwen3.6 27b or 35b is not big.

AtHeartEngineer@lemmy.world · 2 months ago

deleted by creator

theunknownmuncher@lemmy.world · 2 months ago

https://github.com/QwenLM/Qwen3.6#benchmarks

AtHeartEngineer@lemmy.world · edited 2 months ago

deleted by creator

theunknownmuncher@lemmy.world · 2 months ago

“I don’t think any of that is true. Show me data.” (is shown data) “I won’t accept that data!” Lol. Lmao even.

Yeah, I’m not going to play this game of trying to anticipate which numbers you’re willing to accept and which you aren’t. You have the same access to a search engine as I do. All of the results I have seen align with the numbers that Qwen released and are well within margins of error.

This model’s release caused such a stir because it reproducibly meets or beats Claude Opus 4.5 while being locally runnable. If you won’t believe it, okay, I don’t care. 🤷

AtHeartEngineer@lemmy.world · 2 months ago

deleted by creator

theunknownmuncher@lemmy.world · 2 months ago

I run 27b at q8 with unquantized KV cache and 256k context on two Instinct MI60 GPUs. Definitely the best model that I have been able to run locally at a reasonable speed. 35b generates tokens as fast as you’d expect from any cloud provider. 27b is slower than 35b, of course, but token generation is still faster than my reading speed and suitable for use with coding agents.
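For anyone wanting to sanity-check whether a setup like this fits in VRAM, here is a rough back-of-envelope sketch. Everything in it is an assumption for illustration: the 8.5 bits per weight figure is what llama.cpp's Q8_0 format works out to (the post does not say which runtime is used), "27b" is taken as exactly 27 billion parameters, and the layer count, KV head count, and head dimension are hypothetical placeholders, not the model's real architecture. The KV formula is the one for standard attention with an fp16 cache; models with hybrid or linear attention layers will need less.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size for standard attention, in GiB.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    corresponds to an unquantized fp16/bf16 cache.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / GIB


if __name__ == "__main__":
    # Assumption: 27e9 parameters at ~8.5 bits/weight (llama.cpp Q8_0).
    weights = weight_memory_gib(27e9, 8.5)

    # Hypothetical architecture numbers, for illustration only.
    kv = kv_cache_gib(n_layers=16, n_kv_heads=4, head_dim=256,
                      context_len=256 * 1024)

    print(f"weights:  {weights:.1f} GiB")
    print(f"kv cache: {kv:.1f} GiB")
    print(f"total:    {weights + kv:.1f} GiB (before activations and overhead)")
```

Compare the total against the combined memory of the cards (two MI60s have 32 GB each) and leave headroom for activations and runtime overhead. Swap in the real values from the model's config to get a meaningful number.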

AtHeartEngineer@lemmy.world · 2 months ago

deleted by creator

AtHeartEngineer@lemmy.world · 2 months ago

deleted by creator

theunknownmuncher@lemmy.world · 2 months ago

It’s not like the Qwen team hasn’t already built a lot of trust with the community. They’ve never been misleading with previous releases. The “marketing material” (🙄) is for a free product, so they have no incentive to lie, and it would be extra stupid because anyone can run the benchmarks and verify their numbers independently anyway. What would be the point?

lime!@feddit.nu · 2 months ago

we were talking about 3.6.

deepseek distilled is an alternative that works on more modest hardware.

and i’m not really interested in what claude, chatgpt, mistral and the others are doing. i would never touch those models with a ten foot pole. if i can’t run it, it does not get run.