
@aldehir (Collaborator) commented Aug 5, 2025

This PR adds reasoning and tool parsing to the Llama 3.x Nemotron models.

Context:

  • The generic parser excludes the <think></think> tags, and I believe the sampling also inhibits the model from reasoning.
  • The think tags are not unique tokens, so when streaming, partial tag text such as <think is output until the reasoning parser has enough input to match.

Implementation:

  • Added COMMON_CHAT_FORMAT_LLAMA_3_X_NEMOTRON and associated init/parse functions.
  • Added try_consume_partial_literal() to defer parsing when the input ends in a prefix of a literal. I'm not sure this is the best approach, and the name could be better.
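
To illustrate the deferral idea described above, here is a minimal, self-contained sketch. It is not the PR's actual try_consume_partial_literal(); the helper names partial_literal_suffix() and safe_to_emit() are hypothetical, and the sketch assumes the streamed text is available as a plain string buffer. It checks whether the buffer ends in a proper prefix of a literal such as <think> and, if so, holds those trailing characters back instead of emitting them as content.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (assumption, not the PR's code).
// Returns the length of the longest suffix of `buffer` that is a proper
// prefix of `literal`, or 0 if the buffer does not end in a partial match.
static size_t partial_literal_suffix(std::string_view buffer, std::string_view literal) {
    assert(!literal.empty());
    // A proper prefix is at most literal.size() - 1 characters long.
    const size_t max_len = std::min(buffer.size(), literal.size() - 1);
    for (size_t len = max_len; len > 0; --len) {
        if (buffer.substr(buffer.size() - len) == literal.substr(0, len)) {
            return len;
        }
    }
    return 0;
}

// Hypothetical helper (assumption, not the PR's code).
// Returns the part of `buffer` that can be emitted now; any trailing
// partial match of `literal` is withheld until more input arrives.
static std::string safe_to_emit(std::string_view buffer, std::string_view literal) {
    const size_t held = partial_literal_suffix(buffer, literal);
    return std::string(buffer.substr(0, buffer.size() - held));
}
```

With this sketch, safe_to_emit("Hello <thi", "<think>") yields "Hello " and withholds "<thi", while a buffer with no trailing partial match is returned unchanged. Detecting a complete <think> tag is assumed to be handled separately by the regular parser.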

@github-actions bot added the testing (Everything test related) label Aug 5, 2025
@aldehir force-pushed the model/llama-nemotron-reasoning branch from 95f4c09 to 969368a August 7, 2025 00:54
@aldehir marked this pull request as draft August 9, 2025 00:54