Skip to content

Comments

fix: CompletionsDataset mask_prompt passes wrong type to apply_chat_template#2

Open
iamadalek wants to merge 1 commit intomainfrom
fix/completions-mask-prompt-type-error
Open

fix: CompletionsDataset mask_prompt passes wrong type to apply_chat_template#2
iamadalek wants to merge 1 commit intomainfrom
fix/completions-mask-prompt-type-error

Conversation

@iamadalek
Copy link
Owner

Summary

  • Fix CompletionsDataset.process() passing a bare dict (messages[0]) instead of a list (messages[:1]) to tokenizer.apply_chat_template() when mask_prompt=True
  • Add test coverage for mask_prompt=True and mask_prompt=False paths on both CompletionsDataset and ChatDataset

Closes #1

Test plan

  • test_completions_mask_prompt -- verifies mask_prompt=True produces offset > 0 and mask_prompt=False produces offset == 0
  • test_chat_mask_prompt -- regression guard for ChatDataset (already correct, now explicitly tested)
  • Full test suite: 159 tests passing, 0 failures

Verification Evidence

  • Tests: 159 passing (0 failures, 1 skipped)
  • Acceptance criteria: single-line fix on datasets.py:119, tests cover both dataset types with both mask_prompt values

🤖 Generated with Claude Code

CompletionsDataset.process() passed messages[0] (a bare dict) to
tokenizer.apply_chat_template() when mask_prompt=True, but the method
expects a List[Dict]. Changed to messages[:1] (a list slice), matching
the pattern already used by ChatDataset.process().

Added test coverage for mask_prompt=True and mask_prompt=False on both
CompletionsDataset and ChatDataset.

Closes #1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: CompletionsDataset.process crashes when mask_prompt is enabled

1 participant