bug: tool call sometimes not parsed correctly? #471

@Ricki-BumbleDev

Issue description

Sometimes the raw <tool_call>...</tool_call> block is returned verbatim in the response text instead of the tool call actually being executed (using Qwen3-14B).

Expected Behavior

Tool call is executed.

Actual Behavior

Tool call is just returned in the response like so:

<tool_call>
{"name": "createTicket", "arguments": {"subject": "...", "description": "...", "assignedTo": "...."}}
</tool_call>

Steps to reproduce

import { defineChatSessionFunction, getLlama, LlamaChatSession, resolveModelFile } from 'node-llama-cpp';
import fs from 'node:fs/promises';

const modelName = 'Qwen3-14B';
const modelPath = await resolveModelFile(`hf:unsloth/${modelName}-GGUF:Q4_K_M`);
const llama = await getLlama();
const model = await llama.loadModel({ modelPath });
const context = await model.createContext();
const systemPrompt = await fs.readFile('./systemPrompt.txt', 'utf8');
const session = new LlamaChatSession({ contextSequence: context.getSequence(), systemPrompt });

const createTicket = defineChatSessionFunction({
  description: 'Creates a new ticket in the project management system',
  params: {
    type: 'object',
    properties: {
      subject: {
        type: 'string',
        description: 'Clear and descriptive subject for the ticket'
      },
      description: {
        type: 'string',
        description: 'Detailed description of what needs to be done'
      },
      assignedTo: {
        enum: ['tech', 'design'],
        description: 'Team to assign the ticket to (tech or design)'
      }
    },
    required: ['subject', 'description', 'assignedTo']
  },
  handler: async params => {
    // Creates the ticket (implementation omitted)
  }
});

const userInstructions = await fs.readFile('./userInstructions.txt', 'utf8');
const response = await session.prompt(userInstructions, { functions: { createTicket } });
console.log('AI Response:', response);

My Environment

| Dependency | Version |
| --- | --- |
| Operating System | macOS 24.4.0 |
| CPU | Apple M2 Max |
| Node.js version | 22.13.1 |
| TypeScript version | TS not specifically installed, using Node's `--experimental-strip-types` |
| node-llama-cpp version | 3.9.0 |

npx --yes node-llama-cpp inspect gpu output:

OS: macOS 24.4.0 (arm64)
Node: 22.13.1 (arm64)
node-llama-cpp: 3.9.0

Metal: available

Metal device: Apple M2 Max
Metal used VRAM: 0% (64KB/21.33GB)
Metal free VRAM: 99.99% (21.33GB/21.33GB)
Metal unified memory: 21.33GB (100%)

CPU model: Apple M2 Max
Math cores: 8
Used RAM: 64.54% (20.65GB/32GB)
Free RAM: 35.45% (11.35GB/32GB)
Used swap: 0% (0B/0B)
Max swap size: dynamic
mmap: supported

Additional Context

I believe this has to do with the <tool_call> instruction appearing right at the start of the response. It looks like subsequent tool calls work.
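Until the parsing issue is fixed, one possible workaround is to detect a leftover <tool_call> block in the returned text and parse it manually. This is only a hedged sketch, not part of node-llama-cpp's API; the `extractToolCall` helper below is hypothetical and assumes the JSON payload shape shown in the response above:

```typescript
// Hypothetical helper: extracts a leftover <tool_call> JSON payload from a
// raw model response that was not parsed into a function call.
function extractToolCall(
  response: string
): { name: string; arguments: Record<string, unknown> } | null {
  // Match the first <tool_call>...</tool_call> span, trimming whitespace
  // around the JSON body.
  const match = response.match(/<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/);
  if (match == null) return null;

  try {
    const parsed = JSON.parse(match[1]);
    if (
      typeof parsed.name === "string" &&
      typeof parsed.arguments === "object" &&
      parsed.arguments !== null
    ) {
      return parsed;
    }
  } catch {
    // Not valid JSON between the tags; treat the response as plain text.
  }
  return null;
}

// Example with a response shaped like the one in "Actual Behavior":
const raw =
  '<tool_call>\n{"name": "createTicket", "arguments": {"subject": "Fix login", "description": "Login fails", "assignedTo": "tech"}}\n</tool_call>';
const call = extractToolCall(raw);
console.log(call?.name); // "createTicket"
```

The caller could then dispatch to the matching function (e.g. `createTicket`) when `extractToolCall` returns non-null, and fall back to treating the response as plain text otherwise.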

Relevant Features Used

  • Metal support
  • CUDA support
  • Vulkan support
  • Grammar
  • Function calling

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.
