
Conversation

Collaborator

@i-vainn i-vainn commented Jan 26, 2026

Changes

  • get_code_execution_model now passes require_tokenizer=True to ensure tokenizer availability
  • CodeExecutionWrapper now uses tokenizer to count tokens for:
    • Code execution output added to prompts
    • Streaming segment token counts
  • Added RuntimeError with clear message if tokenizer is unavailable
  • Fixed _initialize_tokenizer to catch exceptions and allow fallback to server endpoint
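
Below is a minimal sketch of the counting flow these bullets describe. It is hypothetical and heavily simplified: model.generate, run_code, and format_code_output are placeholder names, not the actual helpers in CodeExecutionWrapper; only the tokenizer.encode-based counting mirrors the change.

# Hypothetical sketch only: illustrates the budget accounting, not the real NeMo-Skills code.
def generate_with_code_execution(model, run_code, format_code_output, request, code_begin, code_end):
    remaining = request["tokens_to_generate"]
    total_generated = 0
    while remaining > 0:
        output = model.generate(prompt=request["prompt"], tokens_to_generate=remaining)["generation"]
        # Close an unterminated code block and charge for the marker we injected ourselves.
        if output.count(code_end) + 1 == output.count(code_begin):
            output += code_end
        segment_tokens = len(model.tokenizer.encode(output))
        total_generated += segment_tokens
        remaining -= segment_tokens
        request["prompt"] += output
        if code_end not in output or remaining <= 0:
            break
        # Execute the generated code and charge the injected (formatted) execution output as well.
        formatted_output = format_code_output(run_code(output, code_begin, code_end))
        output_tokens = len(model.tokenizer.encode(formatted_output))
        total_generated += output_tokens
        remaining -= output_tokens
        request["prompt"] += formatted_output
    return request["prompt"], total_generated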

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced tokenizer initialization with improved error handling and fallback mechanisms
    • Improved token counting accuracy during code execution and generation to ensure proper token budget tracking
    • Added runtime validation to ensure tokenizer availability during generation processes

✏️ Tip: You can customize this high-level summary in your review settings.

@i-vainn i-vainn requested a review from Kipok January 26, 2026 16:59
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough


The changes add a require_tokenizer parameter to enforce tokenizer initialization at model construction time and implement comprehensive token counting for manually injected tokens during code execution, with runtime validation to ensure tokenizer availability.

Changes

Cohort / File(s) | Summary
Model initialization configuration
nemo_skills/inference/model/__init__.py, nemo_skills/inference/model/base.py
Added require_tokenizer: bool = False parameter to BaseModel.__init__ to conditionally enforce tokenizer initialization. Modified tokenizer initialization logic to trigger when enable_soft_fail or require_tokenizer is true. Enhanced _initialize_tokenizer with try/except error handling and logging. Updated get_code_execution_model to pass require_tokenizer=True when constructing models.
Token accounting in code execution
nemo_skills/inference/model/code_execution.py
Introduced token counting for manually injected tokens (code_end, code_output, formatted_code_output) in both streaming and non-streaming generation paths. Added runtime validation requiring tokenizer availability before token counting operations, raising RuntimeError if tokenizer is missing. Reordered token budget updates and code block execution logic to perform token accounting before break conditions.
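
A rough sketch of the initialization behavior summarized in the model-initialization cohort above (illustrative only; the real BaseModel.__init__ takes many more arguments, and the WrapperAutoTokenizer import path is inferred from the analysis chain later in this thread):

import logging

# Import path inferred from the PR's analysis chain; adjust if the repo layout differs.
from nemo_skills.inference.model.utils import WrapperAutoTokenizer

LOG = logging.getLogger(__name__)


class BaseModelSketch:
    """Illustrative only: mirrors the described enable_soft_fail / require_tokenizer behavior."""

    def __init__(self, tokenizer=None, enable_soft_fail=False, require_tokenizer=False):
        self.tokenizer = None
        if enable_soft_fail or require_tokenizer:
            self.tokenizer = self._initialize_tokenizer(tokenizer)

    def _initialize_tokenizer(self, tokenizer):
        if tokenizer is None:
            return None
        try:
            return WrapperAutoTokenizer(tokenizer)
        except OSError:
            LOG.warning(f"Tokenizer not found at '{tokenizer}', trying fallback to server /tokenize endpoint")
            return None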

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 42.86%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title directly reflects the main change, adding proper token counting to the code execution model, which is the primary objective across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

This PR adds proper token counting to the code execution model to ensure accurate token budget tracking during code generation and execution cycles.

Key changes:

  • The get_code_execution_model function now requests tokenizer availability
  • CodeExecutionWrapper uses the tokenizer to count tokens for both generated code segments and execution outputs
  • Added runtime validation to fail fast if tokenizer is unavailable
  • Improved error handling in tokenizer setup to catch exceptions and allow fallback to server endpoint
  • Token budget now accurately accounts for manually added code end markers and code execution outputs

The implementation addresses the previous limitation where code execution outputs were not counted toward the token budget (as noted in the removed TODO comment).
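
As a sketch of the fail-fast validation mentioned above (a simplified illustration, not the actual CodeExecutionWrapper code, which performs this check inside its generation paths):

class CodeExecutionWrapperSketch:
    """Illustrative only: shows the shape of the runtime tokenizer validation."""

    def __init__(self, model):
        self.model = model  # the underlying model, constructed with require_tokenizer=True

    def _require_tokenizer(self):
        if self.model.tokenizer is None:
            raise RuntimeError(
                "A tokenizer is required to count tokens for code execution output, "
                "but none could be initialized from the provided path or server endpoint."
            )
        return self.model.tokenizer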

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations around tokenization accuracy
  • The changes are well-structured and address a real issue with token counting. The score is 4/5 rather than 5/5 because: (1) the tokenizer.encode() approach may have context-dependent differences from actual LLM token usage (as noted in previous review), and (2) the exception handling catches generic Exception which is quite broad, though this is acceptable for fallback logic
  • No files require special attention - all changes are straightforward improvements to token counting

Important Files Changed

Filename | Overview
nemo_skills/inference/model/base.py | Added require_tokenizer parameter and improved error handling in _initialize_tokenizer with try/except for OSError and generic exceptions
nemo_skills/inference/model/code_execution.py | Implemented proper token counting for code execution outputs and streaming segments using tokenizer.encode(), added runtime validation, and fixed token budget tracking
nemo_skills/inference/model/__init__.py | Added require_tokenizer parameter to get_code_execution_model to ensure tokenizer initialization for token counting

Sequence Diagram

sequenceDiagram
    participant Client
    participant CodeExecModel
    participant BaseModel
    participant TokenCounter
    participant LLMServer

    Client->>CodeExecModel: get_code_execution_model()
    CodeExecModel->>BaseModel: construct with require_tokenizer
    BaseModel->>BaseModel: check require_tokenizer setting
    BaseModel->>TokenCounter: setup tokenizer
    TokenCounter-->>BaseModel: tokenizer ready
    BaseModel-->>CodeExecModel: model ready
    CodeExecModel->>CodeExecModel: verify tokenizer available
    CodeExecModel-->>Client: wrapper ready
    
    Client->>CodeExecModel: generate_async()
    loop Code execution rounds
        CodeExecModel->>LLMServer: request generation
        LLMServer-->>CodeExecModel: text segment
        CodeExecModel->>TokenCounter: count segment tokens
        TokenCounter-->>CodeExecModel: token count
        CodeExecModel->>CodeExecModel: update budget
        CodeExecModel->>CodeExecModel: run code
        CodeExecModel->>TokenCounter: count output tokens
        TokenCounter-->>CodeExecModel: token count
        CodeExecModel->>CodeExecModel: update budget
    end
    CodeExecModel-->>Client: final result

Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

yield {"generation": code_end}

# Calculate token count for this segment (after adding code_end if needed)
num_generated_tokens = len(self.model.tokenizer.encode(current_output_segment))
Contributor


Token count uses encode() on the segment string, which may differ from actual token count if tokenization is context-dependent. Consider verifying this matches the LLM's actual token usage.
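
For readers wondering why this matters: BPE tokenizers can merge or split differently at segment boundaries, and some add special tokens on every encode() call, so encoding a segment in isolation only approximates what the server actually charged. A quick, self-contained way to observe the effect (gpt2 is just a convenient public tokenizer):

# Small experiment: token counts of a whole string vs. the sum of its parts can differ.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
prefix = "hel"
segment = "lo world"
joint = len(tok.encode(prefix + segment))
separate = len(tok.encode(prefix)) + len(tok.encode(segment))
print(joint, separate)  # the counts may differ when pieces merge across the boundary

When the two counts disagree for typical outputs, segment-level encode() counts can drift from the server-reported usage by a few tokens per round.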

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@nemo_skills/inference/model/base.py`:
- Around line 207-211: The try/except in the tokenizer initialization around
WrapperAutoTokenizer(tokenizer) is catching a broad Exception; narrow it to the
documented failure modes by catching OSError, ValueError and ImportError
specifically (instead of Exception) so failures from
AutoTokenizer.from_pretrained()/WrapperAutoTokenizer are handled but other
unexpected exceptions still surface; update the except clause accordingly in the
code that constructs WrapperAutoTokenizer(tokenizer).

In `@nemo_skills/inference/model/code_execution.py`:
- Around line 198-201: After you decrement request["tokens_to_generate"] by
code_output_tokens (computed via self.model.tokenizer.encode(code_output)),
immediately guard against a non‑positive budget: if
request["tokens_to_generate"] <= 0 then stop further generation (break/return
from the generation loop) and avoid calling the model with a negative
tokens_to_generate; apply the same guard in the other identical path around
lines 431-434 so both code_output_tokens paths early‑exit when budget is
exhausted and still update total_num_generated_tokens.
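
A minimal sketch of the guard described in that prompt, with variable names following the comment; the surrounding generation loop is omitted, so this is not the actual code_execution.py code:

def charge_code_output(request, tokenizer, code_output, total_num_generated_tokens):
    """Deduct injected code-output tokens from the budget and report whether generation may continue."""
    code_output_tokens = len(tokenizer.encode(code_output))
    total_num_generated_tokens += code_output_tokens
    request["tokens_to_generate"] -= code_output_tokens
    # Early-exit signal: never call the model with a zero or negative token budget.
    return total_num_generated_tokens, request["tokens_to_generate"] > 0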

Comment on lines 207 to 211
try:
    return WrapperAutoTokenizer(tokenizer)
except (OSError, Exception) as e:
    LOG.debug(f"Failed to initialize tokenizer from '{tokenizer}': {e}")
    return None
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code at lines 207-211
head -n 215 nemo_skills/inference/model/base.py | tail -n 20

Repository: NVIDIA-NeMo/Skills

Length of output: 801


🏁 Script executed:

# Find where AutoTokenizer comes from
rg "AutoTokenizer" nemo_skills/inference/model/base.py -B 3 -A 3

Repository: NVIDIA-NeMo/Skills

Length of output: 1280


🏁 Script executed:

# Check the entire tokenizer initialization context
sed -n '190,220p' nemo_skills/inference/model/base.py

Repository: NVIDIA-NeMo/Skills

Length of output: 1128


🏁 Script executed:

# Look for any documentation or requirements that specify Transformers version
fd -e "txt" -e "in" -e "yaml" | xargs rg -l "transformers" | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 162


🏁 Script executed:

# Find and examine WrapperAutoTokenizer implementation
find . -name "utils.py" -path "*/inference/model/*" -exec cat {} \;

Repository: NVIDIA-NeMo/Skills

Length of output: 4062


🏁 Script executed:

# Check the main requirements to understand Transformers version
cat requirements/main.txt | grep -i transformers

Repository: NVIDIA-NeMo/Skills

Length of output: 117


🏁 Script executed:

# Also check stem requirements
cat requirements/stem.txt | grep -i transformers

Repository: NVIDIA-NeMo/Skills

Length of output: 74


🏁 Script executed:

# Search for documentation or examples of AutoTokenizer exceptions in the codebase
rg "AutoTokenizer" -A 5 -B 5 | grep -i "except\|raise\|error" | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 1792


🏁 Script executed:

# Check if there are other try-except blocks handling tokenizer initialization elsewhere
rg "from_pretrained" -A 3 -B 1 | grep -E "(except|OSError|ValueError|ImportError)"

Repository: NVIDIA-NeMo/Skills

Length of output: 358


🌐 Web query:

AutoTokenizer.from_pretrained exceptions transformers library

💡 Result:

Common AutoTokenizer.from_pretrained(...) exceptions in 🤗 Transformers, what they mean, and the usual fix:

  1. OSError: We couldn't connect to 'https://huggingface.co' ... couldn't find it in the cached files ... not the path to a directory containing ... config.json
    Cause: no internet / blocked Hub access, or you’re pointing to a local folder that doesn’t actually contain the needed tokenizer files.
    Fix: ensure network access (or pre-download), or pass a correct local directory that contains tokenizer artifacts (e.g., tokenizer.json, tokenizer_config.json, vocab files). (discuss.huggingface.co)

  2. huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name' ...
    Cause: you passed something that is not a Hub repo ID (often a file path like .../tf_model.h5, an S3 URI, or a malformed string).
    Fix: pass either a valid Hub ID ("org/model") or a local folder path (directory), not a single weights file / invalid URI. (stackoverflow.com)

  3. ValueError: Unrecognized model ... Should have a 'model_type' key in its config.json ...
    Cause: the local folder/repo you’re loading is missing a proper config.json (or it lacks model_type), or you’re using an incompatible/old export.
    Fix: verify the folder has a valid config.json with model_type (and that you saved/pushed the model/tokenizer correctly). (discuss.huggingface.co)

  4. ValueError: Loading <repo> requires you to execute the configuration file in that repo ... set trust_remote_code=True
    Cause: the model/tokenizer relies on custom Python code from the repo.
    Fix: only if you trust the repo, pass trust_remote_code=True (and ensure you pass it everywhere needed in your loading flow). (discuss.huggingface.co)

  5. Auth/token-related breakages (e.g., private/gated repos), including reports around token= behavior changing
    Cause: missing/incorrect auth token, or a version-specific regression/change in how token is propagated during tokenizer loading.
    Fix: confirm you’re logged in / passing a valid token; if it “suddenly” started failing after an upgrade, try pinning/adjusting transformers/huggingface_hub versions and retest. (github.com)



🏁 Script executed:

# Check how other parts of the codebase handle similar initialization errors
rg "from_pretrained|WrapperAutoTokenizer" -B 3 -A 5 | head -50

Repository: NVIDIA-NeMo/Skills

Length of output: 3855


🏁 Script executed:

# Check if there are any HFValidationError imports or usages
rg "HFValidationError|huggingface_hub" nemo_skills/ -B 2 -A 2

Repository: NVIDIA-NeMo/Skills

Length of output: 1379


🌐 Web query:

transformers AutoTokenizer.from_pretrained exception types ValueError OSError ImportError

💡 Result:

transformers.AutoTokenizer.from_pretrained(...) (and the underlying tokenizer .from_pretrained(...)) commonly fails with these exception types:

  • OSError (historically also raised as EnvironmentError, which is an alias of OSError on Python 3): raised when the tokenizer files can’t be resolved/loaded (bad model id, wrong directory, missing required tokenizer files, etc.). The HF code explicitly does raise EnvironmentError("Can't load tokenizer for ..."). [1]

  • ValueError: raised for “logical” issues such as:

    • passing a single file/URL to tokenizers that require a directory/model id (explicit raise ValueError(...)). [1]
    • AutoTokenizer can’t map the model/config to a tokenizer (unrecognized configuration), or a required tokenizer implementation is unavailable (e.g., SentencePiece-backed slow tokenizer class is None so AutoTokenizer raises a ValueError instructing to install sentencepiece). [2]
    • fast tokenizer backend can’t be instantiated (no tokenizer.json, no slow tokenizer available to convert, etc.), raising a ValueError with the “Couldn't instantiate the backend tokenizer…” message. [3]
  • ImportError: can be raised indirectly when the load path requires optional deps that aren’t installed (most often during slow→fast conversion), e.g. requires_protobuf(...) raising ImportError: ... requires the protobuf library .... [4]

Sources: [1] [2] [3] [4]


Narrow exception handling for tokenizer initialization.
Catching a broad Exception can mask unexpected bugs and silently degrade behavior; replace it with the specific exception types that AutoTokenizer.from_pretrained() raises.

🛠️ Proposed refinement
-            except (OSError, Exception) as e:
+            except (OSError, ValueError, ImportError) as e:
                 LOG.debug(f"Failed to initialize tokenizer from '{tokenizer}': {e}")
                 return None

AutoTokenizer.from_pretrained() raises OSError (missing files, network issues), ValueError (unrecognized model config, missing dependencies), and ImportError (optional dep not installed). These three types cover the documented failure modes for tokenizer loading.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     try:
         return WrapperAutoTokenizer(tokenizer)
-    except (OSError, Exception) as e:
+    except (OSError, ValueError, ImportError) as e:
         LOG.debug(f"Failed to initialize tokenizer from '{tokenizer}': {e}")
         return None
🧰 Tools
🪛 Ruff (0.14.13)

209-209: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@nemo_skills/inference/model/base.py` around lines 207 - 211, The try/except
in the tokenizer initialization around WrapperAutoTokenizer(tokenizer) is
catching a broad Exception; narrow it to the documented failure modes by
catching OSError, ValueError and ImportError specifically (instead of Exception)
so failures from AutoTokenizer.from_pretrained()/WrapperAutoTokenizer are
handled but other unexpected exceptions still surface; update the except clause
accordingly in the code that constructs WrapperAutoTokenizer(tokenizer).

Contributor

@greptile-apps greptile-apps bot left a comment


No files reviewed, no comments



Collaborator

@Kipok Kipok left a comment


small question, otherwise looks good. Please run gpt-oss slurm test to double check nothing is broken and code isn't much slower

-    return WrapperAutoTokenizer(tokenizer)
+    try:
+        return WrapperAutoTokenizer(tokenizer)
+    except (OSError, Exception) as e:
Collaborator


Why do we need this? If we were asked to get a tokenizer but it failed, we should just fail completely, no?

Collaborator Author


This is intentional - see https://github.com/NVIDIA-NeMo/Skills/blob/main/nemo_skills/inference/model/base.py#L177-L195, which first tries to initialize the tokenizer from the string and otherwise initializes the tokenizer endpoint.

Without this catch, the server-model path crashes before reaching the server endpoint fallback.
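
For context, the fallback being defended here has roughly the following shape. This is a paraphrase under assumptions: the helper name and the callable-based endpoint fallback are placeholders, not the identifiers used in base.py.

import logging

LOG = logging.getLogger(__name__)


def setup_token_counter(tokenizer_spec, server_tokenize_fn):
    """Prefer a local HF tokenizer; otherwise count tokens through the server's /tokenize endpoint."""
    try:
        from transformers import AutoTokenizer

        tok = AutoTokenizer.from_pretrained(tokenizer_spec)
        return lambda text: len(tok.encode(text))
    except OSError as exc:
        # Without this catch the server-backed path would crash here and never
        # reach the endpoint fallback, which is exactly the point made above.
        LOG.warning(f"Local tokenizer '{tokenizer_spec}' unavailable ({exc}); falling back to server tokenization")
        return server_tokenize_fn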

Contributor

@greptile-apps greptile-apps bot left a comment


4 files reviewed, 4 comments


if output.count(code_end) + 1 == output.count(code_begin):
    output += code_end
    # Count tokens for the manually added code_end
    num_generated_tokens += len(self.model.tokenizer.encode(code_end))
Contributor


In non-streaming mode, code_end tokens are added to num_generated_tokens which came from output_dict.get("num_generated_tokens", 0). If the server already counted tokens for code_end when it was manually added, this would double-count those tokens.

Comment on lines +207 to +211
try:
    return WrapperAutoTokenizer(tokenizer)
except OSError:
    LOG.warning(f"Tokenizer not found at '{tokenizer}', trying fallback to server /tokenize endpoint")
    return None
Contributor


Only catching OSError may miss other exceptions during tokenizer initialization (e.g., ImportError, ValueError). If the goal is to gracefully fall back to server endpoint, consider catching broader exceptions.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Signed-off-by: i-vainn <[email protected]>
Contributor

@greptile-apps greptile-apps bot left a comment


No files reviewed, no comments


Collaborator Author

i-vainn commented Jan 29, 2026

Please run gpt-oss slurm test to double check nothing is broken and code isn't much slower

@Kipok, main and this branch complete within the same time.

Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


Comment on lines +207 to +211
try:
    return WrapperAutoTokenizer(tokenizer)
except OSError:
    LOG.warning(f"Tokenizer not found at '{tokenizer}', trying fallback to server /tokenize endpoint")
    return None
Contributor


Catching only OSError doesn't follow CONTRIBUTING.md guidelines about not being overly defensive. If require_tokenizer is True, the code should fail loudly when tokenizer initialization fails, not silently fall back. The runtime check on line 279 of code_execution.py will catch this later, but it happens during generation (after model setup), which could cause issues in production.

Consider checking the flag here and only catching when fallback is acceptable (when the flag is False).
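
A sketch of what that suggestion might look like (hypothetical; note the earlier reply explains why the unconditional fallback exists for the server-model path):

import logging

from nemo_skills.inference.model.utils import WrapperAutoTokenizer  # path inferred from the analysis chain earlier

LOG = logging.getLogger(__name__)


def initialize_tokenizer(tokenizer, require_tokenizer=False):
    """Illustrative variant: only swallow the failure when falling back is acceptable."""
    try:
        return WrapperAutoTokenizer(tokenizer)
    except OSError:
        if require_tokenizer:
            raise  # fail loudly at setup time when a tokenizer was explicitly required
        LOG.warning(f"Tokenizer not found at '{tokenizer}', trying fallback to server /tokenize endpoint")
        return None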

@Kipok Kipok merged commit 7ded756 into main Jan 30, 2026
6 checks passed
@Kipok Kipok deleted the imoshkov/code-execution-token-count branch January 30, 2026 04:57
