Seems there’s not a lot of talk about relatively unknown finetunes these days, so I’ll start posting more!
OpenBuddy’s been on my radar, but this one is particularly interesting: QwQ 32B, post-trained on OpenBuddy’s dataset, apparently with QAT applied (though that part is a bit unclear) and context-extended. Observations:
- Quantized with exllamav2, it seems to show lower distortion than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.
- Seems good at long context. I haven’t tested 200K, but it’s excellent in the 64K range (rough loading sketch after the list).
- Works fine in English.
- The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don’t they just use ChatML?), and needs some wrangling with your own template; see the tag-normalization sketch after the list.
- Seems smart. I can’t say yet whether it’s better or worse than QwQ, other than that it doesn’t seem to “suffer” below 3.75bpw the way QwQ does.
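For the long-context point, here’s roughly how you’d load an exl2 quant of it at 64K with exllamav2’s Python API. This is a minimal sketch, not my exact setup: the model path is a placeholder, and the API details are from memory (check the exllamav2 README if it doesn’t match your version). The quant itself comes out of exllamav2’s convert.py, roughly `python convert.py -i <fp16_dir> -o <work_dir> -cf <out_dir> -b 4.0`.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to a 4.0bpw exl2 quant
model_dir = "/models/openbuddy-qwq-32b-exl2-4.0bpw"

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 65536                 # run it in the 64K range

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=65536, lazy=True)
model.load_autosplit(cache)                # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Summarize the following document:\n...", max_new_tokens=512))
```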
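And for the template weirdness, this is the kind of wrangling I mean: a small helper (hypothetical names, plain stdlib) that collapses whichever reasoning-tag variant the model emits so the rest of your pipeline only has to deal with <think>...</think>.

```python
import re

def normalize_think_tags(text: str) -> str:
    # Collapse the <|think|> / <|/think|> variants down to plain <think> / </think>
    text = re.sub(r"<\|\s*think\s*\|>", "<think>", text)
    text = re.sub(r"<\|\s*/think\s*\|>", "</think>", text)
    return text

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block is found."""
    text = normalize_think_tags(text)
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```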
Also, I reposted this from /r/locallama, as I feel the community generally should do going forward; given its spirit, it seems like we should be on Lemmy instead?