
test: add failing tests for cost extraction wrong index (#555) #556

Merged
gltanaka merged 2 commits into promptdriven:main from Serhan-Asad:fix/issue-555 on Feb 24, 2026

@Serhan-Asad
Collaborator

Summary

Adds failing tests that detect the bug reported in #555 — sync_orchestration.py reads result[1] as cost for all operations, but each operation returns a different tuple format.

Test Files

  • Unit tests: tests/test_e2e_issue_555_cost_extraction_wrong_index.py (16 tests — 13 fail, 3 pass)
  • E2E tests: tests/test_e2e_issue_555_sync_cost_extraction.py (13 tests — 10 fail, 3 pass)

What This PR Contains

  • Failing unit tests that reproduce the reported bug for each affected operation
  • Failing E2E tests that call the real sync_orchestration() function with mocked operations
  • Regression guards ensuring example/test/test_extend (which already work) aren't broken
  • Tests are verified to fail on current code and will pass once the bug is fixed

Root Cause

The sync loop extracts cost via result[1] and model via result[2] for all operations. This is only correct for example and test/test_extend (3-tuple and 4-tuple with cost at index 1). For other operations:

  • generate (4-tuple): cost at [2], model at [3]; result[1] is was_incremental (bool), which passes isinstance(False, (int, float)) since bool ⊂ int
  • crash/fix/verify (6-tuple): cost at [4], model at [5]; result[1] is a string, so cost defaults to 0.0

Additionally, operation_log.py:339-340 has the identical hardcoded index bug.
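
To make the failure mode concrete, here is a minimal sketch of the pre-fix behavior. `naive_cost` is a hypothetical stand-in for the hardcoded `result[1]` read, and the tuples are made-up values; only the positions of cost and model follow the description above.

```python
def naive_cost(result):
    """Mimic the pre-fix logic: always read index 1, default to 0.0."""
    value = result[1] if isinstance(result, tuple) and len(result) > 1 else None
    # bool is a subclass of int, so True/False slip through this check.
    return float(value) if isinstance(value, (int, float)) else 0.0


# Hypothetical return values (contents invented; index layout from the PR description).
example_result = ("example code", 0.10, "model-a")                    # cost at [1]
generate_result = ("generated code", True, 0.25, "model-a")           # cost at [2]
fix_result = (True, "fixed code", "fixed test", 2, 0.40, "model-a")   # cost at [4]

print(naive_cost(example_result))   # 0.1  (correct)
print(naive_cost(generate_result))  # 1.0  (was_incremental=True read as cost)
print(naive_cost(fix_result))       # 0.0  (result[1] is a string, real cost 0.40 is lost)
```

With `was_incremental=False` the generate cost would instead be read as 0.0, so the error is either an overcount or an undercount depending on the flag.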

Next Steps

  1. Implement the fix — create _extract_cost_from_result() and _extract_model_from_result() helpers
  2. Verify all 29 tests pass
  3. Run full test suite for regressions
  4. Mark PR as ready for review

Fixes #555


Generated by PDD agentic bug workflow

Serhan-Asad and others added 2 commits February 23, 2026 14:22
…tdriven#555)

The sync loop read result[1] as cost for all operations, but each
returns a different tuple format. generate's result[1] is a bool
(was_incremental) which passed isinstance(bool, (int, float)) and
was counted as $1.00; crash/fix/verify have cost at index 4, not 1,
so their costs were always $0.00. Two helpers now dispatch on the
operation name to read the correct index.

Fixes promptdriven#555

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sts (promptdriven#555)

- operation_log.py: use _extract_cost_from_result/_extract_model_from_result helpers
  instead of hardcoded result[1]/result[2] (same bug as sync_orchestration.py)
- Update test files to import and test the actual fixed helpers (29/29 pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses issue #555 around incorrect cost/model extraction from operation result tuples during sync_orchestration() runs and within the log_operation decorator, and adds regression tests to cover the corrected behavior.

Changes:

  • Added shared helpers in pdd/sync_orchestration.py to extract cost/model from operation results using operation-specific tuple indices (including excluding bool from numeric cost).
  • Updated sync_orchestration() and the log_operation decorator to use the new helpers rather than hardcoded tuple indices.
  • Added unit-style and E2E-style tests covering generate/crash/fix/verify (plus regression guards for example/test/test_extend).
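
As a rough illustration of the second change, the sketch below shows a logging decorator that reads cost and model by operation-specific index. `log_operation_sketch` and its `(operation, log)` parameters are hypothetical; the real `log_operation` decorator's signature and log format are not shown in this PR.

```python
import functools

# Cost index per operation (from the PR description); model sits at the next index.
_COST_INDEX = {"generate": 2, "crash": 4, "fix": 4, "verify": 4}


def log_operation_sketch(operation, log):
    """Hypothetical decorator: append one cost/model entry to `log` per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            idx = _COST_INDEX.get(operation, 1)
            cost, model = 0.0, ""
            if isinstance(result, tuple) and len(result) > idx + 1:
                raw_cost, raw_model = result[idx], result[idx + 1]
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(raw_cost, (int, float)) and not isinstance(raw_cost, bool):
                    cost = float(raw_cost)
                if isinstance(raw_model, str):
                    model = raw_model
            log.append({"operation": operation, "cost": cost, "model": model})
            return result
        return wrapper
    return decorator


entries = []


@log_operation_sketch("verify", entries)
def fake_verify():
    # Made-up 6-tuple; only cost at [4] and model at [5] follow the PR description.
    return (True, "program", "code", 1, 0.30, "model-a")


fake_verify()
print(entries)  # [{'operation': 'verify', 'cost': 0.3, 'model': 'model-a'}]
```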

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pdd/sync_orchestration.py | Introduces _extract_cost_from_result / _extract_model_from_result and wires them into cost accumulation + log updates. |
| pdd/operation_log.py | Updates the log_operation decorator to use the shared extraction helpers for cost/model logging. |
| tests/test_e2e_issue_555_cost_extraction_wrong_index.py | Adds focused tests validating the helper extraction logic across operation return formats and the bool⊂int edge case. |
| tests/test_e2e_issue_555_sync_cost_extraction.py | Adds end-to-end tests exercising real sync_orchestration() with mocked ops to ensure totals/logging/budget behavior are correct. |

gltanaka merged commit 3e85268 into promptdriven:main on Feb 24, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

sync_orchestration.py: cost extraction reads wrong tuple index for generate/crash/fix/verify operations
