Fix angle-bracket tags hidden by GitHub notebook renderer #204
danielhanchen wants to merge 2 commits into main
Tags like <start_working_out>, <SOLUTION>, <think> are interpreted as HTML by GitHub and silently hidden, making notebook outputs appear broken.
- Add text/html with HTML-escaped content to execute_result/display_data outputs containing raw angle-bracket tags (GitHub prefers text/html)
- Clear stream outputs that contain raw angle-bracket tags
- Replace angle-bracket tags in code comments with safe text
- String literals left unchanged (functional code not affected)
- Add scripts/fix_html_tags.py for reproducibility
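The text/html step described above could look roughly like the following. This is a minimal sketch, not the code in scripts/fix_html_tags.py: the function name `add_escaped_html`, the `TAG_RE` pattern, and the `<pre>` wrapper are all assumptions made for illustration.

```python
import html
import re

# Hypothetical pattern for "looks like a tag", e.g. <think> or </SOLUTION>.
TAG_RE = re.compile(r"</?[A-Za-z_][A-Za-z0-9_\-]*>")


def add_escaped_html(output: dict) -> bool:
    """Add an escaped text/html entry to one output; return True if changed."""
    if output.get("output_type") not in ("execute_result", "display_data"):
        return False
    data = output.get("data", {})
    plain = data.get("text/plain")
    # Skip outputs with no plain text, or that already carry text/html.
    if plain is None or "text/html" in data:
        return False
    # text/plain may be stored as one string or as a list of strings.
    text = "".join(plain) if isinstance(plain, list) else plain
    if not TAG_RE.search(text):
        return False
    # Assumption: wrap in <pre> so whitespace survives HTML rendering.
    data["text/html"] = "<pre>" + html.escape(text) + "</pre>"
    return True
```

Because the function skips outputs that already have a text/html entry, running it twice over the same notebook leaves the second pass a no-op.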
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f97b0948b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
        else:
            new_outputs.append(output)

    cell["outputs"] = new_outputs
Restrict outputs field updates to code cells
fix_outputs writes cell["outputs"] = new_outputs for every cell, including markdown/raw cells that do not have an outputs field in notebook format. In any notebook where at least one real output is fixed, this injects "outputs": [] into all non-code cells (visible throughout this commit), which produces invalid notebook structure for tools that validate/round-trip .ipynb files and can cause downstream rewrites or failures in notebook tooling.
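One way to apply the restriction suggested here is to guard the assignment on the cell type and on the key already being present. The helper below is a hypothetical sketch (the name `replace_outputs` is not from the PR), assuming cells are plain dicts as loaded from the notebook JSON.

```python
def replace_outputs(cell: dict, new_outputs: list) -> bool:
    """Assign new outputs only to code cells that already have an outputs key.

    Returns True if the cell was updated, False if it was left untouched.
    """
    # Markdown and raw cells have no outputs field in the notebook format,
    # so writing one would make the notebook structurally invalid.
    if cell.get("cell_type") != "code" or "outputs" not in cell:
        return False
    cell["outputs"] = new_outputs
    return True
```

With this guard, a markdown cell passed through the fixer keeps exactly the keys it started with.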
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9711f5c58f
    source = cell.get("source", [])
    new_source = []
    cell_changed = False
    for line in source:
Handle string-form cell sources when fixing comments
fix_comments iterates source as if it were a list of lines, but notebook source is also valid as a single string; in that case for line in source iterates characters, so none of the multi-character replacements in COMMENT_REPLACEMENTS can ever match. This means comment tags are silently left unfixed for string-serialized cells, which is a real case in this repo’s notebooks and makes reruns of the script incomplete.
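A possible way to handle both serializations is to split a string source into lines first and write it back in the same form. The sketch below is illustrative only: `fix_comments_sketch` and its `replacements` argument are hypothetical stand-ins for the script's fix_comments and COMMENT_REPLACEMENTS, and treating a line as a comment when it starts with `#` is a simplifying assumption.

```python
def fix_comments_sketch(cell: dict, replacements: dict) -> bool:
    """Apply tag replacements to comment lines; return True if the cell changed."""
    source = cell.get("source", [])
    was_string = isinstance(source, str)
    # A string source must be split into lines; iterating it directly
    # would yield single characters and no replacement could match.
    lines = source.splitlines(keepends=True) if was_string else list(source)

    changed = False
    new_lines = []
    for line in lines:
        new_line = line
        # Assumption: only whole-line comments are rewritten, so string
        # literals in code lines are left alone.
        if line.lstrip().startswith("#"):
            for old, new in replacements.items():
                new_line = new_line.replace(old, new)
        changed = changed or new_line != line
        new_lines.append(new_line)

    if changed:
        # Preserve the original serialization (string or list of lines).
        cell["source"] = "".join(new_lines) if was_string else new_lines
    return changed
```

Writing the source back in its original form keeps the diff limited to the replaced text rather than re-serializing every cell.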
HTML-like tags (<think>, <SOLUTION>, <start_working_out>, etc.) in notebook outputs are silently hidden by GitHub's renderer. This fix:
1. Adds text/html with HTML-escaped content for execute_result and display_data outputs containing angle-bracket tags.
2. Clears stream outputs that contain angle-bracket tags (these are model inference traces that cannot be fixed without escaping).
3. Replaces angle-bracket tags in code comments with safe text.

Also includes update_all_notebooks.py fixes from fix/qat-deterministic-cell-ids:
- QAT notebooks: dynamic torchao version detection instead of hard-pinned 0.14.0
- No cell ID injection into .ipynb files (eliminates spurious diffs)
- Conditional widget state rewrite
- Script file permission normalization

fix_html_tags.py: only assign cell["outputs"] for cells that already have the key, preventing "outputs": [] from being added to markdown cells.

Regenerated all notebooks and python scripts.
9711f5c to 4bcc554
Summary
HTML-like tags (<think>, <SOLUTION>, <start_working_out>, etc.) in notebook cell outputs are silently hidden by GitHub's notebook renderer. This makes GRPO/reasoning notebook outputs unreadable on GitHub.

Output fixes (680 fixes across 203 files):
- execute_result/display_data outputs: adds text/html with HTML-escaped content so GitHub renders it properly
- stream outputs containing angle-bracket tags: clears the stream text (these are model inference traces that cannot be preserved without escaping)

Comment fixes (46 fixes):
- Replaces angle-bracket tags in code comments (e.g. # Acts as <think>) with safe text

Also includes update_all_notebooks.py improvements:
- QAT notebooks: dynamic torchao version detection instead of hard-pinned torchao==0.14.0
- No cell ID injection into .ipynb files (eliminates spurious diffs)
- Conditional widget state rewrite
- Script file permission normalization

Bug fix in fix_html_tags.py:
- Only assigns cell["outputs"] for cells that already have the key, preventing "outputs": [] from being added to markdown cells

The large deletions (~160k lines) are legitimate: stream outputs containing <think> tags in GRPO/reasoning notebooks had to be cleared.

Test plan
- Ran fix_html_tags.py with the fix -- 680 output fixes, 46 comment fixes across 203 files
- Ran update_all_notebooks.py -- regenerated all notebooks and python scripts
- Ran update_all_notebooks.py a second time -- byte-for-byte identical output (idempotent)
- No "outputs": [] on markdown cells
- text/html addition