Skip to content

Strange cache behavior with 0.31.0 in server mode #975

@putnam

Description

@putnam

I'm using mlx-lm 0.31.0 on a Mac Studio M3 Ultra 512GB, with mlx-community/GLM-5-MXFP4-Q8.

I'm running the server like so:
mlx_lm.server --model mlx-community/GLM-5-MXFP4-Q8 --host 10.11.11.15 --port 11434 --temp 1.0 --max-tokens 8192 --log-level DEBUG

I have a static system prompt that's a few paragraphs long. Thinking is enabled. While reproing this I am making sure no concurrent requests are happening to avoid confusion with #965 which I can also reproduce reliably (n.b. this is not new in 0.31.0 though).

When I carry out the very first conversation, things behave as expected. After clearing my client's chat history and starting a new conversation, odd things begin to occur.

Although I may just say "hi" with my first message, I can see in the think block that the model thinks I just wrote "junkjunk hi" (or similar) where "junkjunk" is usually something that was said in a previous conversation or even a token from the system prompt. It is pretty easy to repro by just talking to it, clearing the conversation, then talking some more, and eventually the model will respond to things/strings you didn't say that ended up tacked onto the system prompt or prepended to your first message, however you want to look at it. So far in my testing it seems to be a small amount of data, usually a single token, but I have not tried various lengths of conversations yet.

I have confirmed I'm not sending these strings/tokens from the client when the conversation is reset. When I clear the history, the messages object is just the system prompt, then "hi" for the first message.

This problem did not occur on the previous release; my first thought is this has something to do with the new cache behaviors introduced in 0.31.0.

Thanks for your work on this very useful project!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions