Replace stdlib json with orjson across all tritonparse modules (#362) #362

Closed
FindHao wants to merge 1 commit into main from export-D97564498

FindHao (Member) commented Mar 20, 2026

Summary:

Replace all stdlib json usage with orjson across tritonparse for faster JSON serialization/deserialization.

Resolves: #266

Changes:

  • Add orjson>=3.9 as a dependency in pyproject.toml
  • Add fbsource//third-party/pypi/orjson:orjson to BUCK deps
  • Replace import json with import orjson in 20 Python files
  • Handle orjson.dumps() returning bytes (added .decode() where string output is needed)
  • Map API differences: indent=2 → OPT_INDENT_2, sort_keys=True → OPT_SORT_KEYS
  • Replace json.load(f) → orjson.loads(f.read()) and json.dump(obj, f) → f.write(orjson.dumps(obj))
  • Preserve "import json" in reproducer-generated user code (function_extractor.py)
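
The mappings in the list above can be sketched roughly as follows. This is an illustration, not code from the PR: `dumps_text`, `dump_to_file`, and `load_from_file` are hypothetical helper names, and the stdlib fallback exists only so the sketch runs where orjson is not installed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson  # third-party dependency added by this PR
except ImportError:  # assumption: stdlib fallback so the sketch still runs
    orjson = None


def dumps_text(obj, *, indent2: bool = False, sort_keys: bool = False) -> str:
    """Hypothetical helper: str output like json.dumps(obj, indent=2, sort_keys=True)."""
    if orjson is None:
        return json.dumps(obj, indent=2 if indent2 else None, sort_keys=sort_keys)
    option = 0
    if indent2:
        option |= orjson.OPT_INDENT_2  # replaces indent=2
    if sort_keys:
        option |= orjson.OPT_SORT_KEYS  # replaces sort_keys=True
    # orjson.dumps() returns bytes, so decode when a str is needed.
    return orjson.dumps(obj, option=option or None).decode()


def dump_to_file(obj, path: Path) -> None:
    """json.dump(obj, f) equivalent: orjson has no dump(), so write the result."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(dumps_text(obj))


def load_from_file(path: Path):
    """json.load(f) equivalent: read the whole file, then parse it."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return json.loads(text) if orjson is None else orjson.loads(text)


# Round-trip a small, made-up state object through a temporary file.
state = {"step": 3, "good": "abc123", "bad": "def456"}
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    dump_to_file(state, state_path)
    restored = load_from_file(state_path)

pretty = dumps_text(state, indent2=True, sort_keys=True)
```

One thing to keep in mind: orjson's default output is compact (no spaces after `,` and `:`), whereas `json.dumps()` defaults to `", "` and `": "`, so byte-for-byte comparisons against previously written files may differ even when the parsed data is identical.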

Reviewed By: xuzhao9

Differential Revision: D97564498

Copilot AI review requested due to automatic review settings March 20, 2026 23:41
meta-cla bot added the CLA Signed label Mar 20, 2026
meta-codesync bot commented Mar 20, 2026

@FindHao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97564498.

Copilot AI left a comment


Pull request overview

This PR replaces Python stdlib json usage with orjson across tritonparse to improve JSON parse/serialization performance, updating call sites to account for orjson’s API differences and bytes return type from dumps().

Changes:

  • Introduce orjson>=3.9 as a project dependency and migrate imports/usages across modules.
  • Update JSON read/write patterns (e.g., load/dump equivalents, indentation/sort options, .decode() where string output is required).
  • Keep NDJSON output behavior broadly consistent while moving formatting and tracing code paths to orjson.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tritonparse/validation/schema_loader.py | Load packaged JSON schemas using orjson. |
| tritonparse/validation/json_validator.py | Validate NDJSON traces using orjson parsing and error types. |
| tritonparse/tools/prettify_ndjson.py | Parse NDJSON and write prettified JSON via orjson options. |
| tritonparse/tools/extract_irs.py | Replace line-level JSON parsing with orjson and update exception handling. |
| tritonparse/structured_logging.py | Switch structured trace formatting/parsing to orjson. |
| tritonparse/reproducer/utils.py | Read JSON args files using orjson. |
| tritonparse/reproducer/placeholder_replacer.py | Embed JSON blocks using orjson indentation output. |
| tritonparse/reproducer/multi_file_analyzer.py | Write analysis JSON output using orjson indentation. |
| tritonparse/parse/trace_processor.py | Use orjson for parsing/writing NDJSON throughout trace processing. |
| tritonparse/parse/torch_trace_parser.py | Parse embedded JSON metadata using orjson. |
| tritonparse/parse/sourcemap_utils.py | Use orjson for stable serialization in launch hash computation. |
| tritonparse/parse/event_diff.py | Use orjson for value-keying and NDJSON emission in diff generation. |
| tritonparse/parse/common.py | Write parse output mapping JSON using orjson. |
| tritonparse/info/cli.py | Parse CLI list-like inputs using orjson. |
| tritonparse/diff/output/event_writer.py | Emit diff NDJSON using orjson. |
| tritonparse/diff/core/event_matcher.py | Convert event↔string roundtrip using orjson around sourcemap generation. |
| tritonparse/bisect/state.py | Persist/load bisect state via orjson. |
| tritonparse/ai/parsers.py | Parse extracted JSON from text using orjson. |
| tritonparse/ai/client.py | Parse JSON responses/stream events using orjson. |
| pyproject.toml | Add orjson>=3.9 to project dependencies. |

Comment on lines 52 to 54
# Sort keys for stable serialization
- stable_json = json.dumps(stable_event, sort_keys=True, separators=(",", ":"))
+ stable_json = orjson.dumps(stable_event, option=orjson.OPT_SORT_KEYS).decode()
return hashlib.sha256(stable_json.encode()).hexdigest()[:16]
Copilot AI Mar 20, 2026


compute_launch_event_hash() is documented as producing a stable hash, but switching from json.dumps(..., sort_keys=True, separators=(",", ":")) to orjson.dumps(..., OPT_SORT_KEYS) changes the canonical bytes for the same logical event (notably Unicode escaping and float rendering), so launch_group_hash values will differ from those produced by prior versions. If these hashes are persisted/consumed across runs or versions, consider preserving the previous canonicalization (e.g., matching stdlib JSON escaping behavior) or explicitly documenting this as a breaking change.
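
The hash drift described in this comment can be reproduced without orjson. The sketch below is only an illustration: `launch_hash` is a hypothetical stand-in for compute_launch_event_hash(), the event fields are made up, and stdlib `ensure_ascii=False` is used as an approximation of orjson emitting non-ASCII characters as raw UTF-8 instead of `\uXXXX` escapes. Float rendering differences are not covered here.

```python
import hashlib
import json


def launch_hash(event: dict, *, ensure_ascii: bool) -> str:
    """Hash a canonical JSON rendering of `event` (hypothetical helper)."""
    canonical = json.dumps(
        event,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=ensure_ascii,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


ascii_event = {"name": "add_kernel", "grid": [1, 1, 1]}
unicode_event = {"name": "add_kernel_é", "grid": [1, 1, 1]}

# ASCII-only payloads canonicalize identically under both settings.
same_for_ascii = (
    launch_hash(ascii_event, ensure_ascii=True)
    == launch_hash(ascii_event, ensure_ascii=False)
)

# A single non-ASCII character changes the bytes, and therefore the hash.
differs_for_unicode = (
    launch_hash(unicode_event, ensure_ascii=True)
    != launch_hash(unicode_event, ensure_ascii=False)
)
```

If hashes from older runs need to stay comparable, one option is to keep the stdlib canonicalization for this single call site; whether that matters depends on whether launch_group_hash values are ever persisted.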

Copilot uses AI. Check for mistakes.
"""Load a JSON schema file from the schemas package."""
ref = pkg_files(_SCHEMAS_PACKAGE).joinpath(filename)
- return json.loads(ref.read_text(encoding="utf-8"))
+ return orjson.loads(ref.read_text(encoding="utf-8"))
Copilot AI Mar 20, 2026


For the stated performance goal, consider avoiding read_text() here. orjson.loads() can consume bytes directly, so using ref.read_bytes() (or opening the resource in binary) avoids creating an intermediate Python str and an extra UTF-8 encode step inside orjson.

Suggested change
- return orjson.loads(ref.read_text(encoding="utf-8"))
+ return orjson.loads(ref.read_bytes())
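
As a rough check that the two variants parse to the same result, the sketch below loads one file both ways. It is an assumption-laden illustration: a temporary `pathlib.Path` stands in for the packaged schema resource, the schema content is invented, and the stdlib fallback is there only so the sketch runs without orjson (both `orjson.loads()` and `json.loads()` accept `bytes`).

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    loads = orjson.loads
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    loads = json.loads


def load_schema_from_text(path: Path) -> dict:
    # File bytes are decoded to a Python str first, then handed to the parser.
    return loads(path.read_text(encoding="utf-8"))


def load_schema_from_bytes(path: Path) -> dict:
    # File bytes go straight to the parser; no intermediate str is created.
    return loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "schema.json"
    schema_path.write_text(
        '{"type": "object", "title": "événement"}', encoding="utf-8"
    )
    from_text = load_schema_from_text(schema_path)
    from_bytes = load_schema_from_bytes(schema_path)
```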

Comment on lines 158 to 160
with open(file_path, "a") as f:
-     f.write(json.dumps(diff_event) + "\n")
+     f.write(orjson.dumps(diff_event).decode() + "\n")

Copilot AI Mar 20, 2026


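orjson.dumps() handles non-finite floats (NaN/Inf) differently from stdlib json.dumps(): instead of emitting NaN/Infinity tokens, it serializes them as null, and orjson.loads() rejects those tokens on input. Diff events include tensor_value_diff metrics (e.g., cosine_similarity/rmse) that can legitimately become NaN/Inf, so after this change those values are silently written as null, and files written earlier that contain NaN/Infinity may fail to parse. Consider sanitizing the diff_event explicitly (recursively converting non-finite floats to strings/null) before dumping, or ensure the analyzer never produces NaN/Inf values in the serialized result.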
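
A minimal sketch of the sanitization pass suggested in this comment, using only the stdlib so that the result is well-defined regardless of how the serializer treats non-finite floats. `sanitize_non_finite` is a hypothetical name, not an existing tritonparse function, and the sample event is invented.

```python
import json
import math


def sanitize_non_finite(value):
    """Recursively replace NaN/Inf floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_non_finite(item) for item in value]
    return value


diff_event = {
    "tensor_value_diff": {
        "cosine_similarity": float("nan"),
        "rmse": float("inf"),
        "max_abs": 0.5,
    }
}

clean = sanitize_non_finite(diff_event)

# allow_nan=False makes the stdlib raise if any NaN/Inf slipped through.
line = json.dumps(clean, allow_nan=False)
```

Mapping to null loses the distinction between NaN and ±Inf; if that distinction matters downstream, converting to strings such as "NaN" is the other option the comment mentions.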

meta-codesync bot changed the title from "Replace stdlib json with orjson across all tritonparse modules" to "Replace stdlib json with orjson across all tritonparse modules (#362)" Mar 21, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 21, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 21, 2026
FindHao added a commit that referenced this pull request Mar 21, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 22, 2026
FindHao added a commit that referenced this pull request Mar 22, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 24, 2026
FindHao commented Mar 24, 2026

@copilot review again

Copilot AI commented Mar 24, 2026

@FindHao I've opened a new pull request, #364, to work on those changes. Once the pull request is ready, I'll request review from you.

meta-codesync bot pushed a commit that referenced this pull request Mar 24, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 24, 2026
meta-codesync bot commented Mar 24, 2026

This pull request has been merged in 1c5c71e.

Labels

CLA Signed, fb-exported, Merged, meta-exported

Development

Successfully merging this pull request may close these issues.

Use orjson to replace json

3 participants