Replace stdlib json with orjson across all tritonparse modules (#362)
Pull request overview
This PR replaces Python stdlib json usage with orjson across tritonparse to improve JSON parse/serialization performance, updating call sites to account for orjson’s API differences and bytes return type from dumps().
Changes:
- Introduce `orjson>=3.9` as a project dependency and migrate imports/usages across modules.
- Update JSON read/write patterns (e.g., `load`/`dump` equivalents, indentation/sort options, `.decode()` where string output is required).
- Keep NDJSON output behavior broadly consistent while moving formatting and tracing code paths to `orjson`.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tritonparse/validation/schema_loader.py | Load packaged JSON schemas using orjson. |
| tritonparse/validation/json_validator.py | Validate NDJSON traces using orjson parsing and error types. |
| tritonparse/tools/prettify_ndjson.py | Parse NDJSON and write prettified JSON via orjson options. |
| tritonparse/tools/extract_irs.py | Replace line-level JSON parsing with orjson and update exception handling. |
| tritonparse/structured_logging.py | Switch structured trace formatting/parsing to orjson. |
| tritonparse/reproducer/utils.py | Read JSON args files using orjson. |
| tritonparse/reproducer/placeholder_replacer.py | Embed JSON blocks using orjson indentation output. |
| tritonparse/reproducer/multi_file_analyzer.py | Write analysis JSON output using orjson indentation. |
| tritonparse/parse/trace_processor.py | Use orjson for parsing/writing NDJSON throughout trace processing. |
| tritonparse/parse/torch_trace_parser.py | Parse embedded JSON metadata using orjson. |
| tritonparse/parse/sourcemap_utils.py | Use orjson for stable serialization in launch hash computation. |
| tritonparse/parse/event_diff.py | Use orjson for value-keying and NDJSON emission in diff generation. |
| tritonparse/parse/common.py | Write parse output mapping JSON using orjson. |
| tritonparse/info/cli.py | Parse CLI list-like inputs using orjson. |
| tritonparse/diff/output/event_writer.py | Emit diff NDJSON using orjson. |
| tritonparse/diff/core/event_matcher.py | Convert event↔string roundtrip using orjson around sourcemap generation. |
| tritonparse/bisect/state.py | Persist/load bisect state via orjson. |
| tritonparse/ai/parsers.py | Parse extracted JSON from text using orjson. |
| tritonparse/ai/client.py | Parse JSON responses/stream events using orjson. |
| pyproject.toml | Add orjson>=3.9 to project dependencies. |
      # Sort keys for stable serialization
    - stable_json = json.dumps(stable_event, sort_keys=True, separators=(",", ":"))
    + stable_json = orjson.dumps(stable_event, option=orjson.OPT_SORT_KEYS).decode()
      return hashlib.sha256(stable_json.encode()).hexdigest()[:16]
compute_launch_event_hash() is documented as producing a stable hash, but switching from json.dumps(..., sort_keys=True, separators=(",", ":")) to orjson.dumps(..., OPT_SORT_KEYS) changes the canonical bytes for the same logical event (notably Unicode escaping and float rendering), so launch_group_hash values will differ from those produced by prior versions. If these hashes are persisted/consumed across runs or versions, consider preserving the previous canonicalization (e.g., matching stdlib JSON escaping behavior) or explicitly documenting this as a breaking change.
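If hash compatibility with earlier versions matters, one option is to keep stdlib `json` for this single canonicalization step. The following is a minimal sketch under that assumption; `compute_stable_hash` and the sample event fields are hypothetical names used for illustration, not the project's actual code.

```python
import hashlib
import json


def compute_stable_hash(event: dict) -> str:
    """Hash an event using the pre-migration canonical form (stdlib json)."""
    # sort_keys + compact separators + the default ensure_ascii=True give the
    # same text that the old json.dumps(...) call passed to sha256.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


# Key order does not affect the result because keys are sorted first.
a = compute_stable_hash({"name": "kernel", "grid": [1, 2], "scale": 1e16})
b = compute_stable_hash({"scale": 1e16, "grid": [1, 2], "name": "kernel"})
```

Hashing is usually a small part of the total work, so keeping the slower serializer here may cost little while avoiding a change in persisted hash values.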
      """Load a JSON schema file from the schemas package."""
      ref = pkg_files(_SCHEMAS_PACKAGE).joinpath(filename)
    - return json.loads(ref.read_text(encoding="utf-8"))
    + return orjson.loads(ref.read_text(encoding="utf-8"))
For the stated performance goal, consider avoiding read_text() here. orjson.loads() can consume bytes directly, so using ref.read_bytes() (or opening the resource in binary) avoids creating an intermediate Python str and an extra UTF-8 encode step inside orjson.
    - return orjson.loads(ref.read_text(encoding="utf-8"))
    + return orjson.loads(ref.read_bytes())
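The suggestion can be tried in isolation with a small sketch. It assumes a plain `pathlib.Path` as a stand-in for the packaged resource, and `load_schema` is a hypothetical helper name; the fallback to stdlib `json` exists only so the snippet runs where `orjson` is not installed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson  # third-party; the PR adds orjson>=3.9

    loads = orjson.loads
except ImportError:  # fallback so the sketch still runs without orjson
    loads = json.loads


def load_schema(path: Path) -> dict:
    """Parse a JSON file from raw bytes, skipping the intermediate str."""
    return loads(path.read_bytes())


# Write a tiny example schema to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "example.schema.json"
    schema_path.write_bytes(b'{"type": "object", "required": ["event_type"]}')
    schema = load_schema(schema_path)
```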
      with open(file_path, "a") as f:
    -     f.write(json.dumps(diff_event) + "\n")
    +     f.write(orjson.dumps(diff_event).decode() + "\n")
orjson.dumps() handles non-finite floats (NaN/Inf) differently from stdlib json: it serializes them as null, whereas json.dumps() emits NaN/Infinity tokens. Diff events include tensor_value_diff metrics (e.g., cosine_similarity/rmse) that can legitimately become NaN/Inf, so these writes can silently replace those values with null. Consider sanitizing the diff_event (recursively converting non-finite floats to strings/null) before dumping, or ensure the analyzer never produces NaN/Inf values in the serialized result.
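Whatever the serializer does with non-finite floats, a recursive pass before dumping makes the outcome explicit. This is a minimal sketch, not the project's implementation: `sanitize_non_finite` is a hypothetical helper, and the `max_abs` field in the sample event is invented for illustration.

```python
import math
from typing import Any


def sanitize_non_finite(value: Any) -> Any:
    """Return a copy of value with NaN/Inf floats replaced by strings."""
    if isinstance(value, float) and not math.isfinite(value):
        return str(value)  # "nan", "inf" or "-inf"
    if isinstance(value, dict):
        return {k: sanitize_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_non_finite(v) for v in value]
    return value


# Sample event shaped loosely like the diff metrics mentioned above.
diff_event = {
    "tensor_value_diff": {
        "cosine_similarity": float("nan"),
        "rmse": float("inf"),
        "max_abs": 0.5,  # hypothetical field
    }
}
clean = sanitize_non_finite(diff_event)
```

Converting to strings keeps the distinction between NaN and infinity visible in the output; returning `None` instead would be the choice if consumers expect `null`.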
Summary: Replace all stdlib `json` usage with `orjson` across tritonparse for faster JSON serialization/deserialization.
Resolves: #266
Changes:
- Add `orjson>=3.9` as a dependency in `pyproject.toml`
- Add `fbsource//third-party/pypi/orjson:orjson` to BUCK deps
- Replace `import json` with `import orjson` in 20 Python files
- Handle `orjson.dumps()` returning `bytes` (added `.decode()` where string output is needed)
- Map API differences: `indent=2` → `OPT_INDENT_2`, `sort_keys=True` → `OPT_SORT_KEYS`
- Replace `json.load(f)` → `orjson.loads(f.read())` and `json.dump(obj, f)` → `f.write(orjson.dumps(obj))`
- Preserve `"import json"` in reproducer-generated user code (`function_extractor.py`)
Differential Revision: D97564498
399c3ad to 271ace5
271ace5 to 395a11f
395a11f to 50da3f2
50da3f2 to 0a3d171
0a3d171 to f9d1131
f9d1131 to 393d67c
@copilot review again
393d67c to 863704d
863704d to e8c4ea9
e8c4ea9 to fb41266
This pull request has been merged in 1c5c71e.
Summary:
Replace all stdlib `json` usage with `orjson` across tritonparse for faster JSON serialization/deserialization.
Resolves: #266
Changes:
- Add `orjson>=3.9` as a dependency in `pyproject.toml`
- Add `fbsource//third-party/pypi/orjson:orjson` to BUCK deps
- Replace `import json` with `import orjson` in 20 Python files
- Handle `orjson.dumps()` returning `bytes` (added `.decode()` where string output is needed)
- Map API differences: `indent=2` → `OPT_INDENT_2`, `sort_keys=True` → `OPT_SORT_KEYS`
- Replace `json.load(f)` → `orjson.loads(f.read())` and `json.dump(obj, f)` → `f.write(orjson.dumps(obj))`
- Preserve `"import json"` in reproducer-generated user code (`function_extractor.py`)
Reviewed By: xuzhao9
Differential Revision: D97564498
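The API mapping listed in the changes can be illustrated with a small wrapper. This is a sketch under stated assumptions rather than code from the PR: `dumps_text` is a hypothetical helper, and the stdlib fallback is included only so the snippet runs where `orjson` is not installed.

```python
import json

try:
    import orjson  # third-party; the PR adds orjson>=3.9
except ImportError:
    orjson = None


def dumps_text(obj, *, pretty: bool = False, sort_keys: bool = False) -> str:
    """Serialize obj to a str, mapping json.dumps keywords to orjson options."""
    if orjson is None:
        return json.dumps(obj, indent=2 if pretty else None, sort_keys=sort_keys)
    option = 0
    if pretty:
        option |= orjson.OPT_INDENT_2  # replaces indent=2
    if sort_keys:
        option |= orjson.OPT_SORT_KEYS  # replaces sort_keys=True
    # orjson.dumps() returns bytes, so decode when a str is needed.
    return orjson.dumps(obj, option=option or None).decode()


event = {"kernel": "add", "args": [1, 2], "attrs": {"b": 1, "a": 2}}
text = dumps_text(event, pretty=True, sort_keys=True)
```

When the destination is a file opened in binary mode, the `.decode()` step can be skipped and the bytes written directly.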