Seems there’s not a lot of talk about relatively unknown finetunes these days, so I’ll start posting more!
OpenBuddy’s been on my radar, but this one is particularly interesting: QwQ 32B, post-trained on OpenBuddy’s dataset, apparently with QAT applied (though that part is a bit unclear) and context-extended. Observations:
- Quantized with exllamav2, it seems to show lower distortion than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.
- Seems good at long context. I haven’t tested 200K, but it’s excellent in the 64K range (rough loading sketch after the list).
- Works fine in English.
- The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don’t they just use ChatML?), and needs some wrangling with your own template; see the tag-normalization sketch after the list.
- Seems smart. I can’t say yet whether it’s better or worse than QwQ, other than that it doesn’t seem to “suffer” below 3.75bpw the way QwQ does.
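For the long-context point, here’s roughly how you’d load an exl2 quant of it at 64K with exllamav2’s Python API. This is a minimal sketch, not my exact setup: the model path is a placeholder, and the API details are from memory (check the exllamav2 README if it doesn’t match your version). The quant itself comes out of exllamav2’s convert.py, roughly `python convert.py -i <fp16_dir> -o <work_dir> -cf <out_dir> -b 4.0`.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to a 4.0bpw exl2 quant
model_dir = "/models/openbuddy-qwq-32b-exl2-4.0bpw"

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 65536                 # run it in the 64K range

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=65536, lazy=True)
model.load_autosplit(cache)                # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Summarize the following document:\n...", max_new_tokens=512))
```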
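And for the template weirdness, this is the kind of wrangling I mean: a small helper (hypothetical names, plain stdlib) that collapses whichever reasoning-tag variant the model emits so the rest of your pipeline only has to deal with <think>...</think>.

```python
import re

def normalize_think_tags(text: str) -> str:
    # Collapse the <|think|> / <|/think|> variants down to plain <think> / </think>
    text = re.sub(r"<\|\s*think\s*\|>", "<think>", text)
    text = re.sub(r"<\|\s*/think\s*\|>", "</think>", text)
    return text

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block is found."""
    text = normalize_think_tags(text)
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```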
Also, I reposted this from /r/locallama, as I feel the community generally should do going forward; given its spirit, it seems like we should be on Lemmy instead?