
Misc. bug: [llama.android] Model keeps replying and cannot be stopped normally until it exceeds the context #11264

@codezjx

Description


Name and Version

The latest version (b4491) with the llama.android example

Operating systems

Other? (Please let us know in description)

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

No response

Problem description & steps to reproduce

  1. Run the llama.android example
  2. Load the SmolLM2 model
  3. Send a message with the chat template applied (user message: "Tell a joke"); the template is generated by the common_chat_apply_template() method

Code location: llama.cpp/examples/llama.android/app/src/main/java/com/example/llama/MainViewModel.kt

// ChatML-style prompt for SmolLM2, built to match the output of
// common_chat_apply_template()
val smollm2msg = "<|im_start|>system\n" +
    "You are a helpful AI assistant<|im_end|>\n" +
    "<|im_start|>user\n" +
    "$text<|im_end|>\n" +
    "<|im_start|>assistant\n"
viewModelScope.launch {
    llamaAndroid.send(smollm2msg)
        .catch {
            Log.e(tag, "send() failed", it)
            messages += it.message!!
        }
        // Append each streamed chunk to the last message in the list
        .collect { messages = messages.dropLast(1) + (messages.last() + it) }
}
  4. The issue reproduces: the model keeps replying and cannot be stopped normally until it exceeds the context, and the output contains special tokens such as <|im_start|> and <|im_end|>
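Until the example stops generation itself, a possible client-side workaround is to watch the streamed text for the chat template's end-of-turn marker and truncate there. The sketch below is a minimal illustration under the assumption that the marker arrives verbatim in the accumulated text; `truncateAtStopToken` is a hypothetical helper, not part of the llama.android API.

```kotlin
// Hypothetical helper: cut the accumulated reply at the first
// end-of-turn marker so leaked special tokens never reach the UI.
// Returns the (possibly truncated) text and whether a marker was found.
fun truncateAtStopToken(
    accumulated: String,
    stopTokens: List<String> = listOf("<|im_end|>", "<|im_start|>")
): Pair<String, Boolean> {
    for (stop in stopTokens) {
        val idx = accumulated.indexOf(stop)
        if (idx >= 0) {
            // Found a marker: keep only the text before it and signal stop.
            return accumulated.substring(0, idx) to true
        }
    }
    return accumulated to false
}
```

In the `collect` block above, one could run the accumulated message through this helper and cancel the collecting coroutine once the second element of the pair is `true`.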

[Screenshot: generated output containing <|im_start|> and <|im_end|> tokens]

Comparison test:

This problem cannot be reproduced with the command-line program llama-cli using the same model:

./llama-cli -m models/smollm2-360m-instruct-q8_0.gguf -p "You are a helpful assistant" -cnv
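A plausible explanation for the difference is that llama-cli stops sampling when it sees an end-of-generation token, while the example's completion loop runs until the context is exhausted. The sketch below shows the loop shape that would fix this on the Kotlin side; `sampleNextToken`, `isEndOfGeneration`, and `detokenize` are hypothetical stand-ins for the real JNI bindings, not actual llama.android functions.

```kotlin
// Sketch of a generation loop that stops on an end-of-generation
// token instead of running until the context is full. The three
// function parameters stand in for the native bindings.
fun generate(
    maxTokens: Int,
    sampleNextToken: () -> Int,
    isEndOfGeneration: (Int) -> Boolean,
    detokenize: (Int) -> String
): String {
    val sb = StringBuilder()
    for (i in 0 until maxTokens) {
        val token = sampleNextToken()
        // The missing check: break before emitting special/EOG tokens.
        if (isEndOfGeneration(token)) break
        sb.append(detokenize(token))
    }
    return sb.toString()
}
```

With a stubbed token stream where token 0 marks end-of-generation, the loop stops early instead of draining the whole stream.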

First Bad Commit

Present since the introduction of the llama.android example

Relevant log output

No response
