feat: add Conflux proxy capture dataset loader for verbatim session r…#772

ajcasagrande wants to merge 4 commits into main from
Conversation
…eplay

Introduces ConfluxLoader for replaying Claude Code and Codex sessions captured by Conflux proxy. Supports agent_id grouping, timestamp-based delays for fixed-schedule replay, per-turn extra_params/hyperparameters, and optional utility call inclusion. Refactors ChatEndpoint payload construction to apply extra_params before endpoint-level overrides.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Try out this PR

Quick install:

```
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fb02230d3af266abd8252f5257192a9e8bd294c2
```

Recommended with a virtual environment (using uv):

```
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@fb02230d3af266abd8252f5257192a9e8bd294c2
```
Walkthrough

Adds full Conflux proxy-capture support (loader, models, plugin), introduces fixed-schedule speedup and Conflux CLI flags, propagates per-turn extra params and token metadata into payloads, updates fixed-schedule timing scaling, and adds extensive unit and integration tests.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/aiperf/dataset/loader/conflux.py`:
- Around line 183-195: The Turn.max_tokens is being derived only from observed
output and _extract_extra_params() strips original hyperparameter caps; update
the loader to preserve captured limits by first checking the record
hyperparameters for keys ("max_completion_tokens", "max_output_tokens",
"max_tokens") and using that value if present, otherwise fall back to the
existing tokens-based total_output logic; implement a helper like
_extract_max_tokens(record: ConfluxRecord) and call it when constructing Turn
(instead of only using record.tokens), and ensure _build_conversation() uses the
preserved Turn.max_tokens rather than replacing it with observed output.
- Around line 55-57: The current probe (used by can_load and _probe_file) only
reads up to _PROBE_BYTES and rejects files if the first array element is
truncated; update cls._probe_file to read progressively (starting with
_PROBE_BYTES) and continue reading additional chunks until the first JSON
object/array element is fully closed (track nesting and string/escape state to
detect a proper close) or until a larger hard cap (e.g., 1MB) is reached, then
pass that complete prefix to the existing parsing check; ensure references to
_PROBE_BYTES remain as the initial chunk size and that can_load uses the revised
_probe_file logic so valid Conflux captures with large first records are
accepted.
In `@tests/unit/endpoints/test_openai_chat_completions.py`:
- Around line 478-493: ChatEndpoint.format_payload is mutating the nested
stream_options dict taken from Turn.extra_params / EndpointInfo.extra, leaking
state; update format_payload to perform a shallow copy of the stream_options
dict (e.g., new_stream_opts = dict(orig_stream_opts) or use copy()) before
adding include_usage so the original Turn.extra_params/EndpointInfo.extra is not
modified, and update the test to assert that turn.extra_params (or its
stream_options entry) remains unchanged after calling format_payload. Ensure
references: ChatEndpoint.format_payload, Turn.extra_params, EndpointInfo.extra,
and payload["stream_options"] are the targets to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8b9b2299-287f-44d3-8980-aafa9c16da3e
📒 Files selected for processing (11)
- docs/cli-options.md
- src/aiperf/common/config/input_config.py
- src/aiperf/common/models/dataset_models.py
- src/aiperf/dataset/loader/__init__.py
- src/aiperf/dataset/loader/conflux.py
- src/aiperf/dataset/loader/models.py
- src/aiperf/endpoints/openai_chat.py
- src/aiperf/plugin/enums.py
- src/aiperf/plugin/plugins.yaml
- tests/unit/dataset/loader/test_conflux.py
- tests/unit/endpoints/test_openai_chat_completions.py
Allow scaling fixed-schedule timestamps to replay traces faster or slower (e.g. 2.0 = twice as fast). Includes validation against the conflicting --synthesis-speedup-ratio and bumps the Conflux probe buffer to 1MB for reliable format detection.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
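The speedup semantics described in this commit can be sketched as a simple rescaling of capture timestamps; this is illustrative only, the real logic lives in aiperf's fixed-schedule timing strategy:

```python
def scale_schedule(timestamps_ms: list[float], speedup: float) -> list[float]:
    """Rebase capture timestamps to t=0 and compress/stretch the delays.

    speedup=2.0 replays the trace twice as fast (inter-request delays
    halved); speedup=0.5 replays at half speed. Sketch of the flag's
    described behavior, not aiperf's actual implementation.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    start = timestamps_ms[0]
    return [(t - start) / speedup for t in timestamps_ms]
```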
lvogel04
left a comment
lgtm, great work. One thing we need to address: conflux unified exports are pretty-printed JSON, at least the ones I have exported from the CLI, so without the `--custom-dataset-type conflux` flag you will hit an error because the first element parsed is `[`.
This error came from testing my export without the flag:

```
Error: JSONDecodeError Reason: JSONDecodeError('unexpected end of data: line 1 column 2 (char 1)')
```
With the flag it works great. Can we add support for this and test coverage?
I would also like to test this more once we have the conflux export supported which I believe Ben is cooking up.
… stream_options

The Conflux probe now loads and validates the full JSON array instead of parsing a truncated prefix, supporting both pretty-printed and compact formats. The composer falls through to filename-only detection when the first line is not valid JSON (e.g. "[" in indented exports) or parses as a non-dict (e.g. compact arrays). ChatEndpoint.format_payload no longer mutates nested stream_options from Turn.extra_params across retries.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
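The stream_options fix mentioned in this commit can be sketched as a shallow copy before mutation; this is a simplified stand-in for ChatEndpoint.format_payload, not the real method:

```python
def with_stream_options(extra_params: dict, streaming: bool) -> dict:
    """Build a payload fragment without mutating the caller's dicts.

    Both the top-level dict and the nested stream_options dict are
    shallow-copied before include_usage is added, so the caller's
    Turn.extra_params / EndpointInfo.extra survives retries unchanged.
    """
    payload = dict(extra_params)
    if streaming:
        stream_opts = dict(payload.get("stream_options") or {})
        stream_opts.setdefault("include_usage", True)
        payload["stream_options"] = stream_opts
    return payload
```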
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unit/dataset/loader/test_conflux.py (2)
78-78: Consider using normal construction instead of model_construct.

Using model_construct() bypasses Pydantic validation, which may hide issues if InputConfig field requirements change. For test fixtures, consider either providing all required fields or using model_validate() with a minimal but complete dict.

Alternative approach:

```diff
- input=InputConfig.model_construct(
-     conflux_include_utility_calls=include_utility,
- ),
+ input=InputConfig(
+     conflux_include_utility_calls=include_utility,
+ ),
```

This ensures validation runs and catches any missing required fields.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/dataset/loader/test_conflux.py` at line 78, Replace the bypassed Pydantic construction call InputConfig.model_construct(...) in the test with validated construction: either call the InputConfig(...) constructor with all required fields populated, or call InputConfig.model_validate({...}) with a minimal but complete dict so Pydantic validation runs; update the test fixture that references InputConfig.model_construct to use InputConfig(...) or InputConfig.model_validate and ensure all required fields for InputConfig are provided.
692-799: Consider parametrizing repetitive test cases.

The TestExtractExtraParams class contains many similar test methods that differ primarily in input data and expected output. Parametrization could reduce duplication while maintaining clarity.

Example parametrization approach:

```python
@pytest.mark.parametrize(
    "hyperparameters,expected",
    [
        (None, None),
        ({}, None),
        ({"temperature": 0.7, "top_p": 0.9}, {"temperature": 0.7, "top_p": 0.9}),
        ({"temperature": 0.5, "max_tokens": 1000}, {"temperature": 0.5}),
        ({"temperature": 0.5, "max_output_tokens": 2000}, {"temperature": 0.5}),
        ({"temperature": 0.7, "top_k": None, "stop": None}, {"temperature": 0.7}),
        ({"max_tokens": 100, "max_output_tokens": 200}, None),
        ({"temperature": None, "top_p": None}, None),
        ({"temperature": 0, "frequency_penalty": 0.0}, {"temperature": 0, "frequency_penalty": 0.0}),
        ({"logprobs": False}, {"logprobs": False}),
        ({"stop": ""}, {"stop": ""}),
        ({"response_format": {"type": "json_object"}}, {"response_format": {"type": "json_object"}}),
    ],
    ids=[
        "no_hyperparameters",
        "empty_hyperparameters",
        "basic_params",
        "max_tokens_filtered",
        "max_output_tokens_filtered",
        "none_values_filtered",
        "all_filtered",
        "all_none",
        "zero_preserved",
        "false_preserved",
        "empty_string_preserved",
        "nested_dict_preserved",
    ],
)
def test_extract_extra_params_scenarios(self, hyperparameters, expected):
    record = ConfluxRecord(
        session_id="s1",
        timestamp=1000.0,
        hyperparameters=hyperparameters,
    )
    assert ConfluxLoader._extract_extra_params(record) == expected
```

However, the current explicit test structure is also clear and may be preferred for readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/dataset/loader/test_conflux.py` around lines 692 - 799, The tests in TestExtractExtraParams are highly repetitive; refactor by parametrizing the cases for ConfluxLoader._extract_extra_params: replace the many individual test_* methods with a single pytest.mark.parametrize test that iterates over tuples of (hyperparameters, expected) covering the same scenarios (None, {}, basic params, max_tokens filtered, max_output_tokens filtered, none-values filtered, all filtered, all none, zero preserved, false preserved, empty string preserved, nested dict preserved) and assert ConfluxLoader._extract_extra_params(record) == expected for a ConfluxRecord initialized with session_id="s1" and timestamp=1000.0; keep the same case ids for readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/dataset/loader/test_conflux.py`:
- Line 481: The test assertion uses pytest.raises(..., match="No .json files
found") where the "." is a regex metacharacter; update the match argument to
precisely escape the period (e.g., use a raw string or escape the dot) so the
pattern becomes "No \.json files found" (e.g., r"No \.json files found") in the
pytest.raises call to avoid unintended regex matching and static-analysis
warnings.
---
Nitpick comments:
In `@tests/unit/dataset/loader/test_conflux.py`:
- Line 78: Replace the bypassed Pydantic construction call
InputConfig.model_construct(...) in the test with validated construction: either
call the InputConfig(...) constructor with all required fields populated, or
call InputConfig.model_validate({...}) with a minimal but complete dict so
Pydantic validation runs; update the test fixture that references
InputConfig.model_construct to use InputConfig(...) or
InputConfig.model_validate and ensure all required fields for InputConfig are
provided.
- Around line 692-799: The tests in TestExtractExtraParams are highly
repetitive; refactor by parametrizing the cases for
ConfluxLoader._extract_extra_params: replace the many individual test_* methods
with a single pytest.mark.parametrize test that iterates over tuples of
(hyperparameters, expected) covering the same scenarios (None, {}, basic params,
max_tokens filtered, max_output_tokens filtered, none-values filtered, all
filtered, all none, zero preserved, false preserved, empty string preserved,
nested dict preserved) and assert ConfluxLoader._extract_extra_params(record) ==
expected for a ConfluxRecord initialized with session_id="s1" and
timestamp=1000.0; keep the same case ids for readability.
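The regex-escaping point from the inline comment above is easy to demonstrate with plain `re`: pytest.raises(..., match=pattern) applies re.search, so an unescaped "." matches any character and the assertion is looser than intended.

```python
import re

# An unescaped "." in the match pattern is a wildcard, so the loose
# pattern also accepts messages it should reject.
loose = "No .json files found"
strict = r"No \.json files found"

assert re.search(loose, "No xjson files found")        # unintended match
assert re.search(strict, "No xjson files found") is None  # escaped dot is literal
assert re.search(strict, "No .json files found")       # intended message still matches
```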
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9dab99ba-0c20-47b3-9b3d-62f14bf7f742
📒 Files selected for processing (11)
- docs/cli-options.md
- src/aiperf/common/config/input_config.py
- src/aiperf/dataset/composer/custom.py
- src/aiperf/dataset/loader/conflux.py
- src/aiperf/endpoints/openai_chat.py
- src/aiperf/timing/config.py
- src/aiperf/timing/strategies/fixed_schedule.py
- tests/integration/test_conflux_loader.py
- tests/unit/dataset/composer/test_custom_composer.py
- tests/unit/dataset/loader/test_conflux.py
- tests/unit/endpoints/test_openai_chat_completions.py
✅ Files skipped from review due to trivial changes (1)
- tests/unit/endpoints/test_openai_chat_completions.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/aiperf/endpoints/openai_chat.py
Summary by CodeRabbit: New Features, Documentation, Tests