Skip to content

Commit 3bc0222

Browse files
authored
Merge pull request #120 from jiaminc-cmu/create-new-test-instead-of-overwrite
Resolves #114: Enhance test file handling in `pdd bug`, `pdd test`, and `pdd fix`
2 parents 7ea2d7c + 4cef082 commit 3bc0222

File tree

10 files changed

+418
-76
lines changed

10 files changed

+418
-76
lines changed

README.md

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1314,10 +1314,10 @@ Arguments:
13141314
- `CODE_FILE`: The filename of the code file to be tested.
13151315
13161316
Options:
1317-
- `--output LOCATION`: Specify where to save the generated test file. The default file name is `test_<basename>.<language_file_extension>`.
1317+
- `--output LOCATION`: Specify where to save the generated test file. The default file name is `test_<basename>.<language_file_extension>`. If an output file with the specified name already exists, a new file with a numbered suffix (e.g., `test_calculator_1.py`) will be created instead of overwriting.
13181318
- `--language`: Specify the programming language. Defaults to the language specified by the prompt file name.
13191319
- `--coverage-report PATH`: Path to the coverage report file for existing tests. When provided, generates additional tests to improve coverage.
1320-
- `--existing-tests PATH`: Path to the existing unit test file. Required when using --coverage-report.
1320+
- `--existing-tests PATH [PATH...]`: Path(s) to the existing unit test file(s). Required when using --coverage-report. Multiple paths can be provided.
13211321
- `--target-coverage FLOAT`: Desired code coverage percentage to achieve (default is 90.0).
13221322
- `--merge`: When used with --existing-tests, merges new tests with existing test file instead of creating a separate file.
13231323
@@ -1346,9 +1346,9 @@ could influence the output of the `pdd test` command when run in the same direct
13461346
pdd [GLOBAL OPTIONS] test --output tests/test_factorial_calculator.py factorial_calculator_python.prompt src/factorial_calculator.py
13471347
```
13481348
1349-
2. Generate additional tests to improve coverage:
1349+
2. Generate additional tests to improve coverage (with multiple existing test files):
13501350
```
1351-
pdd [GLOBAL OPTIONS] test --coverage-report coverage.xml --existing-tests tests/test_calculator.py --output tests/test_calculator_enhanced.py calculator_python.prompt src/calculator.py
1351+
pdd [GLOBAL OPTIONS] test --coverage-report coverage.xml --existing-tests tests/test_calculator.py --existing-tests tests/test_calculator_edge_cases.py --output tests/test_calculator_enhanced.py calculator_python.prompt src/calculator.py
13521352
```
13531353
13541354
3. Improve coverage and merge with existing tests:
@@ -1465,11 +1465,11 @@ pdd [GLOBAL OPTIONS] fix [OPTIONS] PROMPT_FILE CODE_FILE UNIT_TEST_FILE ERROR_FI
14651465
Arguments:
14661466
- `PROMPT_FILE`: The filename of the prompt file that generated the code under test.
14671467
- `CODE_FILE`: The filename of the code file to be fixed.
1468-
- `UNIT_TEST_FILE`: The filename of the unit test file.
1468+
- `UNIT_TEST_FILES`: The filename(s) of the unit test file(s). Multiple files can be provided, and each will be processed individually.
14691469
- `ERROR_FILE`: The filename containing the unit test runtime error messages. Optional and does not need to exist when used with the `--loop` command.
14701470
14711471
Options:
1472-
- `--output-test LOCATION`: Specify where to save the fixed unit test file. The default file name is `test_<basename>_fixed.<language_file_extension>`. If an environment variable `PDD_FIX_TEST_OUTPUT_PATH` is set, the file will be saved in that path unless overridden by this option.
1472+
- `--output-test LOCATION`: Specify where to save the fixed unit test file. The default file name is `test_<basename>_fixed.<language_file_extension>`. **Warning: If multiple `UNIT_TEST_FILES` are provided along with this option, only the fixed content of the last processed test file will be saved to this location, overwriting previous results. For individual fixed files, omit this option.**
14731473
- `--output-code LOCATION`: Specify where to save the fixed code file. The default file name is `<basename>_fixed.<language_file_extension>`. If an environment variable `PDD_FIX_CODE_OUTPUT_PATH` is set, the file will be saved in that path unless overridden by this option.
14741474
- `--output-results LOCATION`: Specify where to save the results of the error fixing process. The default file name is `<basename>_fix_results.log`. If an environment variable `PDD_FIX_RESULTS_OUTPUT_PATH` is set, the file will be saved in that path unless overridden by this option.
14751475
- `--loop`: Enable iterative fixing process.
@@ -1481,8 +1481,8 @@ Options:
14811481
When the `--loop` option is used, the fix command will attempt to fix errors through multiple iterations. It will use the specified verification program to check if the code runs correctly after each fix attempt. The process will continue until either the errors are fixed, the maximum number of attempts is reached, or the budget is exhausted.
14821482
14831483
Outputs:
1484-
- Fixed unit test file
1485-
- Fixed code file
1484+
- Fixed unit test file(s).
1485+
- Fixed code file.
14861486
- Results file containing the LLM model's output with unit test results.
14871487
- Print out of results when using '--loop' containing:
14881488
- Success status (boolean)
@@ -1492,9 +1492,9 @@ Outputs:
14921492
14931493
Example:
14941494
```
1495-
pdd [GLOBAL OPTIONS] fix --output-test tests/test_factorial_calculator_fixed.py --output-code src/factorial_calculator_fixed.py --output-results results/factorial_fix_results.log factorial_calculator_python.prompt src/factorial_calculator.py tests/test_factorial_calculator.py errors.log
1495+
pdd [GLOBAL OPTIONS] fix --output-code src/factorial_calculator_fixed.py --output-results results/factorial_fix_results.log factorial_calculator_python.prompt src/factorial_calculator.py tests/test_factorial_calculator.py tests/test_factorial_calculator_edge_cases.py errors.log
14961496
```
1497-
In this example, `factorial_calculator_python.prompt` is the prompt file that originally generated the code under test.
1497+
In this example, `pdd fix` will be run for each test file, and the fixed test files will be saved as `tests/test_factorial_calculator_fixed.py` and `tests/test_factorial_calculator_edge_cases_fixed.py`.
14981498
14991499
15001500
#### Agentic Fallback Mode
@@ -1784,7 +1784,7 @@ Arguments:
17841784
- `DESIRED_OUTPUT_FILE`: File containing the desired (correct) output of the program.
17851785
17861786
Options:
1787-
- `--output LOCATION`: Specify where to save the generated unit test. The default file name is `test_<basename>_bug.<language_extension>`.
1787+
- `--output LOCATION`: Specify where to save the generated unit test. The default file name is `test_<basename>_bug.<language_extension>`. If an output file with the specified name already exists, a new file with a numbered suffix (e.g., `test_calculator_bug_1.py`) will be created instead of overwriting.
17881788
- `--language`: Specify the programming language for the unit test (default is "Python").
17891789
17901790
Example:

pdd/cmd_test_main.py

Lines changed: 26 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ def cmd_test_main(
2121
output: str | None,
2222
language: str | None,
2323
coverage_report: str | None,
24-
existing_tests: str | None,
24+
existing_tests: list[str] | None,
2525
target_coverage: float | None,
2626
merge: bool | None,
2727
strength: float | None = None,
@@ -40,7 +40,7 @@ def cmd_test_main(
4040
output (str | None): Path to save the generated test file.
4141
language (str | None): Programming language.
4242
coverage_report (str | None): Path to the coverage report file.
43-
existing_tests (str | None): Path to the existing unit test file.
43+
existing_tests (list[str] | None): Paths to the existing unit test files.
4444
target_coverage (float | None): Desired code coverage percentage.
4545
merge (bool | None): Whether to merge new tests with existing tests.
4646
@@ -76,7 +76,7 @@ def cmd_test_main(
7676
if coverage_report:
7777
input_file_paths["coverage_report"] = coverage_report
7878
if existing_tests:
79-
input_file_paths["existing_tests"] = existing_tests
79+
input_file_paths["existing_tests"] = existing_tests[0]
8080

8181
command_options = {
8282
"output": output,
@@ -94,6 +94,15 @@ def cmd_test_main(
9494
context_override=ctx.obj.get('context'),
9595
confirm_callback=ctx.obj.get('confirm_callback')
9696
)
97+
98+
# Read multiple existing test files and concatenate their content
99+
if existing_tests:
100+
existing_tests_content = ""
101+
for test_file in existing_tests:
102+
with open(test_file, 'r') as f:
103+
existing_tests_content += f.read() + "\n"
104+
input_strings["existing_tests"] = existing_tests_content
105+
97106
# Use centralized config resolution with proper priority:
98107
# CLI > pddrc > defaults
99108
effective_config = resolve_effective_config(
@@ -119,20 +128,11 @@ def cmd_test_main(
119128
print(f"[bold blue]Language detected:[/bold blue] {language}")
120129

121130
# Determine where the generated tests will be written so we can share it with the LLM
131+
# Always use resolved_output since construct_paths handles numbering for test/bug commands
122132
resolved_output = output_file_paths["output"]
123-
if output is None:
124-
output_file = resolved_output
125-
else:
126-
try:
127-
is_dir_hint = output.endswith('/')
128-
except Exception:
129-
is_dir_hint = False
130-
if is_dir_hint or (Path(output).exists() and Path(output).is_dir()):
131-
output_file = resolved_output
132-
else:
133-
output_file = output
133+
output_file = resolved_output
134134
if merge and existing_tests:
135-
output_file = existing_tests
135+
output_file = existing_tests[0]
136136

137137
if not output_file:
138138
print("[bold red]Error: Output file path could not be determined.[/bold red]")
@@ -193,6 +193,17 @@ def cmd_test_main(
193193
# Return error result instead of ctx.exit(1) to allow orchestrator to handle gracefully
194194
return "", 0.0, f"Error: {exception}"
195195

196+
# Handle output - always use resolved file path since construct_paths handles numbering
197+
resolved_output = output_file_paths["output"]
198+
output_file = resolved_output
199+
if merge and existing_tests:
200+
output_file = existing_tests[0] if existing_tests else None
201+
202+
if not output_file:
203+
print("[bold red]Error: Output file path could not be determined.[/bold red]")
204+
ctx.exit(1)
205+
return "", 0.0, ""
206+
196207
# Check if unit_test content is empty
197208
if not unit_test or not unit_test.strip():
198209
print(f"[bold red]Error: Generated unit test content is empty or whitespace-only.[/bold red]")

pdd/commands/fix.py

Lines changed: 49 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
Fix command.
33
"""
44
import click
5-
from typing import Dict, Optional, Tuple, Any
5+
from typing import Dict, List, Optional, Tuple, Any
66

77
from ..fix_main import fix_main
88
from ..track_cost import track_cost
@@ -11,7 +11,7 @@
1111
@click.command("fix")
1212
@click.argument("prompt_file", type=click.Path(exists=True, dir_okay=False))
1313
@click.argument("code_file", type=click.Path(exists=True, dir_okay=False))
14-
@click.argument("unit_test_file", type=click.Path(exists=True, dir_okay=False))
14+
@click.argument("unit_test_files", nargs=-1, type=click.Path(exists=True, dir_okay=False))
1515
@click.argument("error_file", type=click.Path(dir_okay=False)) # Allow non-existent for loop mode
1616
@click.option(
1717
"--output-test",
@@ -32,9 +32,9 @@
3232
help="Specify where to save the results log (file or directory).",
3333
)
3434
@click.option(
35-
"--loop",
36-
is_flag=True,
37-
default=False,
35+
"--loop",
36+
is_flag=True,
37+
default=False,
3838
help="Enable iterative fixing process."
3939
)
4040
@click.option(
@@ -63,7 +63,7 @@
6363
default=False,
6464
help="Automatically submit the example if all unit tests pass.",
6565
)
66-
@click.option(
66+
@click.option(
6767
"--agentic-fallback/--no-agentic-fallback",
6868
is_flag=True,
6969
default=True,
@@ -75,7 +75,7 @@ def fix(
7575
ctx: click.Context,
7676
prompt_file: str,
7777
code_file: str,
78-
unit_test_file: str,
78+
unit_test_files: Tuple[str, ...],
7979
error_file: str,
8080
output_test: Optional[str],
8181
output_code: Optional[str],
@@ -87,30 +87,50 @@ def fix(
8787
auto_submit: bool,
8888
agentic_fallback: bool,
8989
) -> Optional[Tuple[Dict[str, Any], float, str]]:
90-
"""Fix code based on a prompt and unit test errors."""
90+
"""Fix code based on a prompt and unit test errors.
91+
92+
Accepts one or more UNIT_TEST_FILES. Each test file is processed separately,
93+
allowing the AI to run and fix tests individually rather than as a concatenated blob.
94+
"""
9195
try:
92-
# The actual logic is in fix_main
93-
success, fixed_unit_test, fixed_code, attempts, total_cost, model_name = fix_main(
94-
ctx=ctx,
95-
prompt_file=prompt_file,
96-
code_file=code_file,
97-
unit_test_file=unit_test_file,
98-
error_file=error_file,
99-
output_test=output_test,
100-
output_code=output_code,
101-
output_results=output_results,
102-
loop=loop,
103-
verification_program=verification_program,
104-
max_attempts=max_attempts,
105-
budget=budget,
106-
auto_submit=auto_submit,
107-
agentic_fallback=agentic_fallback,
108-
)
96+
all_results: List[Dict[str, Any]] = []
97+
total_cost = 0.0
98+
model_name = ""
99+
100+
# Process each unit test file separately
101+
for unit_test_file in unit_test_files:
102+
success, fixed_unit_test, fixed_code, attempts, cost, model = fix_main(
103+
ctx=ctx,
104+
prompt_file=prompt_file,
105+
code_file=code_file,
106+
unit_test_file=unit_test_file,
107+
error_file=error_file,
108+
output_test=output_test,
109+
output_code=output_code,
110+
output_results=output_results,
111+
loop=loop,
112+
verification_program=verification_program,
113+
max_attempts=max_attempts,
114+
budget=budget,
115+
auto_submit=auto_submit,
116+
agentic_fallback=agentic_fallback,
117+
)
118+
all_results.append({
119+
"success": success,
120+
"fixed_unit_test": fixed_unit_test,
121+
"fixed_code": fixed_code,
122+
"attempts": attempts,
123+
"unit_test_file": unit_test_file,
124+
})
125+
total_cost += cost
126+
model_name = model
127+
128+
# Aggregate results
129+
overall_success = all(r["success"] for r in all_results)
109130
result = {
110-
"success": success,
111-
"fixed_unit_test": fixed_unit_test,
112-
"fixed_code": fixed_code,
113-
"attempts": attempts,
131+
"success": overall_success,
132+
"results": all_results,
133+
"total_attempts": sum(r["attempts"] for r in all_results),
114134
}
115135
return result, total_cost, model_name
116136
except click.Abort:

pdd/commands/generate.py

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -206,8 +206,8 @@ def example(
206206
@click.option(
207207
"--existing-tests",
208208
type=click.Path(exists=True, dir_okay=False),
209-
default=None,
210-
help="Path to the existing unit test file.",
209+
multiple=True,
210+
help="Path to existing unit test file(s). Can be specified multiple times.",
211211
)
212212
@click.option(
213213
"--target-coverage",
@@ -230,20 +230,22 @@ def test(
230230
output: Optional[str],
231231
language: Optional[str],
232232
coverage_report: Optional[str],
233-
existing_tests: Optional[str],
233+
existing_tests: Tuple[str, ...],
234234
target_coverage: Optional[float],
235235
merge: bool,
236236
) -> Optional[Tuple[str, float, str]]:
237237
"""Generate unit tests for a given prompt and implementation."""
238238
try:
239+
# Convert empty tuple to None for cmd_test_main compatibility
240+
existing_tests_list = list(existing_tests) if existing_tests else None
239241
test_code, total_cost, model_name = cmd_test_main(
240242
ctx=ctx,
241243
prompt_file=prompt_file,
242244
code_file=code_file,
243245
output=output,
244246
language=language,
245247
coverage_report=coverage_report,
246-
existing_tests=existing_tests,
248+
existing_tests=existing_tests_list,
247249
target_coverage=target_coverage,
248250
merge=merge,
249251
)

pdd/construct_paths.py

Lines changed: 36 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -271,6 +271,24 @@ def _extract_basename(
271271
"""
272272
Deduce the project basename according to the rules explained in *Step A*.
273273
"""
274+
# Handle 'fix' command specifically to create a unique basename per test file
275+
if command == "fix":
276+
prompt_path = _candidate_prompt_path(input_file_paths)
277+
if not prompt_path:
278+
raise ValueError("Could not determine prompt file for 'fix' command.")
279+
280+
prompt_basename = _strip_language_suffix(prompt_path)
281+
282+
unit_test_path = input_file_paths.get("unit_test_file")
283+
if not unit_test_path:
284+
# Fallback to just the prompt basename if no unit test file is provided
285+
# This might happen in some edge cases, but 'fix' command structure requires it
286+
return prompt_basename
287+
288+
# Use the stem of the unit test file to make the basename unique
289+
test_basename = Path(unit_test_path).stem
290+
return f"{prompt_basename}_{test_basename}"
291+
274292
# Handle conflicts first due to its unique structure
275293
if command == "conflicts":
276294
key1 = "prompt1"
@@ -778,12 +796,25 @@ def construct_paths(
778796
raise # Re-raise the ValueError
779797

780798
# ------------- Step 4: overwrite confirmation ------------
781-
# Check if any output *file* exists (operate on Path objects)
799+
# Initialize existing_files before the conditional to avoid UnboundLocalError
782800
existing_files: Dict[str, Path] = {}
783-
for k, p_obj in output_paths_resolved.items():
784-
# p_obj = Path(p_val) # Conversion now happens earlier
785-
if p_obj.is_file():
786-
existing_files[k] = p_obj # Store the Path object
801+
802+
if command in ["test", "bug"] and not force:
803+
# For test/bug commands without --force, create numbered files instead of overwriting
804+
for key, path in output_paths_resolved.items():
805+
if path.is_file():
806+
base, ext = os.path.splitext(path)
807+
i = 1
808+
new_path = Path(f"{base}_{i}{ext}")
809+
while new_path.exists():
810+
i += 1
811+
new_path = Path(f"{base}_{i}{ext}")
812+
output_paths_resolved[key] = new_path
813+
else:
814+
# Check if any output *file* exists (operate on Path objects)
815+
for k, p_obj in output_paths_resolved.items():
816+
if p_obj.is_file():
817+
existing_files[k] = p_obj # Store the Path object
787818

788819
if existing_files and not force:
789820
paths_list = "\n".join(f" • {p.resolve()}" for p in existing_files.values())

pdd/fix_errors_from_unit_tests.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -114,7 +114,8 @@ def fix_errors_from_unit_tests(
114114
Fix errors in unit tests using LLM models and log the process.
115115
116116
Args:
117-
unit_test (str): The unit test code
117+
unit_test (str): The unit test code, potentially multiple files concatenated
118+
with <file name="filename.py">...</file> tags.
118119
code (str): The code under test
119120
prompt (str): The prompt that generated the code
120121
error (str): The error message

0 commit comments

Comments
 (0)