test: add failing tests for cost extraction wrong index (#555)#556
Merged
gltanaka merged 2 commits intopromptdriven:mainfrom Feb 24, 2026
Merged
test: add failing tests for cost extraction wrong index (#555)#556gltanaka merged 2 commits intopromptdriven:mainfrom
gltanaka merged 2 commits intopromptdriven:mainfrom
Conversation
5931538 to
e7200ab
Compare
…tdriven#555) The sync loop read result[1] as cost for all operations, but each returns a different tuple format. generate's result[1] is a bool (was_incremental) which passed isinstance(bool, (int, float)) and was counted as $1.00; crash/fix/verify have cost at index 4, not 1, so their costs were always $0.00. Two helpers now dispatch on the operation name to read the correct index. Fixes promptdriven#555 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sts (promptdriven#555) - operation_log.py: use _extract_cost_from_result/_extract_model_from_result helpers instead of hardcoded result[1]/result[2] (same bug as sync_orchestration.py) - Update test files to import and test the actual fixed helpers (29/29 pass) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e7200ab to
b722e41
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #555 around incorrect cost/model extraction from operation result tuples during sync_orchestration() runs and within the log_operation decorator, and adds regression tests to cover the corrected behavior.
Changes:
- Added shared helpers in
pdd/sync_orchestration.pyto extractcost/modelfrom operation results using operation-specific tuple indices (including excludingboolfrom numeric cost). - Updated
sync_orchestration()and thelog_operationdecorator to use the new helpers rather than hardcoded tuple indices. - Added unit-style and E2E-style tests covering generate/crash/fix/verify (plus regression guards for example/test/test_extend).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pdd/sync_orchestration.py |
Introduces _extract_cost_from_result / _extract_model_from_result and wires them into cost accumulation + log updates. |
pdd/operation_log.py |
Updates the log_operation decorator to use the shared extraction helpers for cost/model logging. |
tests/test_e2e_issue_555_cost_extraction_wrong_index.py |
Adds focused tests validating the helper extraction logic across operation return formats and the bool⊂int edge case. |
tests/test_e2e_issue_555_sync_cost_extraction.py |
Adds end-to-end tests exercising real sync_orchestration() with mocked ops to ensure totals/logging/budget behavior are correct. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds failing tests that detect the bug reported in #555 — sync_orchestration.py reads
result[1]as cost for all operations, but each operation returns a different tuple format.Test Files
tests/test_e2e_issue_555_cost_extraction_wrong_index.py(16 tests — 13 fail, 3 pass)tests/test_e2e_issue_555_sync_cost_extraction.py(13 tests — 10 fail, 3 pass)What This PR Contains
sync_orchestration()function with mocked operationsexample/test/test_extend(which already work) aren't brokenRoot Cause
The sync loop extracts cost via
result[1]and model viaresult[2]for all operations. This is only correct forexampleandtest/test_extend(3-tuple and 4-tuple with cost at index 1). For other operations:[2], model at[3]—result[1]iswas_incremental(bool), which passesisinstance(False, (int, float))sincebool ⊂ int[4], model at[5]—result[1]is a string, so cost defaults to0.0Additionally,
operation_log.py:339-340has the identical hardcoded index bug.Next Steps
_extract_cost_from_result()and_extract_model_from_result()helpersFixes #555
Generated by PDD agentic bug workflow