model : add reasoning/tool parsing to Llama 3.x Nemotron #15083

aldehir · 2025-08-05T07:04:17Z

This PR adds reasoning and tool parsing to the Llama 3.x Nemotron models.

Context:

The generic parser excludes the <think></think> tags, and I believe the sampling is also inhibiting the model from reasoning.
The think tags are not dedicated tokens, so when streaming, fragments such as <think are emitted as content until the reasoning parser has enough input to match the full tag.

Implementation:

Added COMMON_CHAT_FORMAT_LLAMA_3_X_NEMOTRON and associated init/parse functions.
Added try_consume_partial_literal() to defer parsing when there is only a prefix match so far, though I'm not sure that's the best approach. The name could be better too.
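
To illustrate the idea of deferring on a prefix match, here is a minimal standalone sketch. It is not the code in this PR: match_literal_at() and safe_emit_length() are hypothetical names, and the real try_consume_partial_literal() may work differently. The sketch assumes the parser holds the streamed text in a string buffer and wants to know how much of it can be emitted as content without leaking part of a tag such as <think>.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Result of comparing a literal against the input at a given position.
enum class literal_match { none, partial, full };

// Hypothetical helper (not the PR's function): checks whether `literal`
// occurs in `input` at `pos`. Returns `partial` when the input ends before
// the literal is complete but everything seen so far matches.
static literal_match match_literal_at(const std::string & input, size_t pos, const std::string & literal) {
    assert(!literal.empty());
    assert(pos <= input.size());

    const size_t avail = input.size() - pos;
    const size_t n     = std::min(avail, literal.size());

    // Compare only the characters that are actually available.
    if (input.compare(pos, n, literal, 0, n) != 0) {
        return literal_match::none;
    }
    if (n == literal.size()) {
        return literal_match::full;
    }
    return n > 0 ? literal_match::partial : literal_match::none;
}

// Hypothetical helper: number of leading characters of `buffer` that can be
// emitted as plain content. Emission stops at the first position where the
// literal matches fully or where the buffer ends in a prefix of it.
static size_t safe_emit_length(const std::string & buffer, const std::string & literal) {
    for (size_t pos = 0; pos < buffer.size(); ++pos) {
        if (match_literal_at(buffer, pos, literal) != literal_match::none) {
            return pos;
        }
    }
    return buffer.size();
}
```

With a buffer of "Hello <thi" and the literal <think>, only the first 6 characters ("Hello ") would be emitted; the trailing "<thi" is held back until more input arrives and the match either completes or fails.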

github-actions bot added the testing Everything test related label Aug 5, 2025

model : add llama 3.x nemotron reasoning/tool parsing

969368a

aldehir force-pushed the model/llama-nemotron-reasoning branch from 95f4c09 to 969368a Compare August 7, 2025 00:54

aldehir marked this pull request as draft August 9, 2025 00:54
