Replies: 2 comments 1 reply
-
I am quite interested. from my own testing, the speed is great, they are generally better at the GLSL programing I do than other models. It has a set of internal trained guidelines it sees as the system layer, then the system prompt is the Developer layer. It will comply with just about any valid reasonable request I have thrown at it for testing. it's just very strict out of the box, and needs a prompt that doesn't raise alarmbells. Do that and it is not that bad. I will be using it to explore certain hallucination patterns in LLMs. I finally have a decent western Open Weight MoE at a reasonable size I can compare to the chinese offerings. it is also specifially trained to use a decent search tool, and a python tool that openai gave us example implementation code for. could actually end up being a good option for a lot of queries when it has those tools acessible. it is running at 15-20 tps on my rig, which is a good step up from the 10-15 I get with GLM 4.5 Air. |
Beta Was this translation helpful? Give feedback.
-
I think GPT-OSS is worth implementing. On aider's leaderboard, the 120B version performs as good as Qwen3-32b. With 5b active parameters it sounds good for CPU inference. The only problem would be RAM usage. On upstream it seems that some of the tensors don't quantize well. Q2 and Q8 are both around 64gb. Hopefully that problem gets fixed so that 120B is runnable in 64gb of RAM. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
llama.cpp
had it implemented by the time the models were announced.But I see mixed reactions around the Internet, ranging from the exuberant excitement (OMG, we will never need another model again) to outright dismissal (Meh, there are other much better open-weight models).
I looked at the mainline PR, and it seems there aren't any real showstoppers. But the change is quite large, so I'm wondering if
ik_llama.cpp
users are interested in using these models. Also, perhaps someone is already working on a port?Beta Was this translation helpful? Give feedback.
All reactions