Commit 1f4de91

Bump version
1 parent c7d113c commit 1f4de91

21 files changed: 834 additions and 109 deletions

CHANGELOG.md

Lines changed: 54 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,67 @@
1+
## v0.0.87 (2025-12-18)
2+
13
## v0.0.86 (2025-12-17)
24

35
### Feat
46

5-
- add metadata files for Python test command execution results and configuration
6-
- add encode_message prompt for encoding functionality in Python
7-
- add encode_message prompt for regression tests and enhance test auto-discovery in integration tests
8-
- enhance LLM prompts with detailed mock vs production code guidance and improve integration test script for clarity
9-
- add integration and static tests for mock vs production code guidance in LLM prompts
10-
- enhance unit test inclusion in code generation, implement example error detection, and improve directory summarization; update README and tests accordingly
7+
- **`--dry-run` Flag for Sync Command:** Renamed the `--log` flag to `--dry-run` for clearer semantics. The `--dry-run` flag analyzes sync state without executing operations, showing what sync would do. The old `--log` flag is deprecated with a warning directing users to use `--dry-run` instead.
8+
9+
- **Mock vs Production Code Guidance in LLM Prompts:** Added comprehensive guidance to `fix_verification_errors_LLM.prompt` and `find_verification_errors_LLM.prompt` for distinguishing mock configuration errors from production code errors. Prompts now instruct the LLM to:
10+
- Identify test files using mocks (MagicMock, unittest.mock, patch)
11+
- Check mock setup FIRST when errors occur (wrong `return_value` structure, missing `__getitem__` configuration)
12+
- Preserve production code API usage patterns unless documentation proves otherwise
13+
- Follow a diagnosis priority: mock configuration → mock chaining → production code
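
The "check mock setup first" guidance above can be illustrated with a small, self-contained case. This is a sketch only: `first_item_name`, `client`, and `row` are hypothetical names invented for the example and are not taken from the pdd codebase or its prompts. It shows a failure caused by a wrong `return_value` structure, the fix applied to the mock rather than to the production code, and an explicit `__getitem__` configuration.

```python
from unittest.mock import MagicMock


def first_item_name(client):
    """Production-style code: expects fetch() to return {"items": [{"name": ...}]}."""
    return client.fetch()["items"][0]["name"]


# Misconfigured mock: return_value is a bare list, not the dict the code expects.
bad_client = MagicMock()
bad_client.fetch.return_value = [{"name": "widget"}]
try:
    first_item_name(bad_client)
    mock_error = None
except TypeError as exc:
    # Indexing a list with a string key fails inside the production code,
    # but the root cause is the mock's structure.
    mock_error = exc

# Fix the mock, not the production code: give return_value the right structure.
good_client = MagicMock()
good_client.fetch.return_value = {"items": [{"name": "widget"}]}
fixed_name = first_item_name(good_client)

# When the code subscripts the mock object itself, configure __getitem__.
row = MagicMock()
row.__getitem__.side_effect = {"id": 7, "name": "widget"}.__getitem__
row_id = row["id"]
```

The traceback in the failing case points into `first_item_name`, which is why a fixer that looks only at the traceback might rewrite correct production code; inspecting the mock configuration first avoids that.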
14+
15+
- **Unit Test Auto-Discovery Regression Test:** Added regression test #20 to `tests/regression.sh` that validates the `generate` command's unit test auto-discovery feature. Tests both `--exclude-tests` mode (no context, expects failure) and default auto-discovery mode (expects success).
16+
17+
- **Encode Message Prompt:** Added `prompts/encode_message_python.prompt` as a simple prompt for testing unit test auto-discovery and regression test scenarios.
1118

1219
### Fix
1320

14-
- improve verification success tracking in error fixing loop and update related tests
15-
- add run_attempt to branch name for re-run support
21+
- **Verification Success Tracking Bug:** Fixed a critical bug in `fix_verification_errors_loop` where the function incorrectly reported "No improvement found" when secondary verification passed but the issue count didn't decrease. Added `any_verification_passed` flag that tracks when code was actually changed AND secondary verification passed. The function now correctly returns `success=True` when verification passes, even if the LLM's issue count assessment is unchanged. This ensures code that compiles and runs correctly is recognized as successful. Key changes:
22+
- Track `any_verification_passed` separately from best iteration tracking
23+
- Only set flag when `code_updated=True` AND verification passes
24+
- Return `success=True` with `final_issues=0` when verification passed
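
A minimal sketch of the success rule described in the entry above, assuming a simplified loop summary. The `Attempt` record and `summarize_attempts` function are hypothetical and do not mirror the real `fix_verification_errors_loop` signature; the fallback branch (success only when the best issue count reaches zero) is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Attempt:
    code_updated: bool          # did this iteration actually change the code?
    verification_passed: bool   # did secondary verification pass?
    issues: int                 # the LLM's issue count for this iteration


def summarize_attempts(
    initial_issues: int, attempts: List[Attempt]
) -> Dict[str, Union[bool, int]]:
    any_verification_passed = False
    best_issues = initial_issues
    for attempt in attempts:
        # The flag is set only when the code changed AND verification passed.
        if attempt.code_updated and attempt.verification_passed:
            any_verification_passed = True
        # Best-iteration tracking is kept separate from the flag.
        best_issues = min(best_issues, attempt.issues)
    if any_verification_passed:
        # Verification outcome wins over an unchanged issue count.
        return {"success": True, "final_issues": 0}
    # Assumed fallback: succeed only if the issue count dropped to zero.
    return {"success": best_issues == 0, "final_issues": best_issues}


# Regression scenario: the issue count never decreases, but verification passes.
result = summarize_attempts(3, [Attempt(True, True, 3)])
# An unchanged-code pass does not set the flag.
unchanged = summarize_attempts(3, [Attempt(False, True, 3)])
```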
1625

1726
### Refactor
1827

19-
- remove unused warnings import from maintenance commands
28+
- **Remove Unused Warnings Import:** Cleaned up unused `warnings` import from `pdd/commands/maintenance.py`.
29+
30+
- **Error Fixing Loop Prompt Simplification:** Streamlined `prompts/fix_verification_errors_loop_python.prompt` from 123 lines to 63 lines by:
31+
- Condensing implementation details into "behavior defined by test suite" directive
32+
- Listing key behaviors to implement without step-by-step instructions
33+
- Focusing on inputs/outputs and test compliance
34+
35+
### Docs
36+
37+
- **Prompting Guide Major Update:** Significantly expanded `docs/prompting_guide.md` with ~200 lines of new content:
38+
- **Automated Grounding (PDD Cloud):** Explains how vector embedding and similarity search automatically provides few-shot examples during generation
39+
- **Grounding Overrides:** Documents `<pin>module_name</pin>` and `<exclude>module_name</exclude>` tags for controlling automatic example retrieval
40+
- **Three Pillars of PDD Generation:** New section explaining how Prompt (WHAT), Grounding (HOW), and Tests (CORRECTNESS) work together
41+
- **Prompt Abstraction Guidance:** Added 10-30% prompt-to-code ratio target with clear guidelines on what NOT to include in prompts
42+
- **Non-Deterministic Tag Warnings:** Added explicit warnings about `<shell>` and `<web>` tags introducing environment-dependent behavior
43+
- **Requirements Writing Guide:** Expanded with before/after examples and testability criteria
44+
45+
### Tests
46+
47+
- Added 320+ lines of verification loop tests in `tests/test_fix_verification_errors_loop.py` covering:
48+
- Verification passes but issue count unchanged (regression test for the bug)
49+
- Best iteration restored with verification passed
50+
- Proper `any_verification_passed` flag behavior
51+
- Success determination based on verification outcome vs issue count
52+
53+
- Added 130+ lines of maintenance command tests in `tests/test_commands_maintenance.py` covering:
54+
- `@track_cost` decorator verification for sync and auto-deps commands
55+
- Deprecated `--log` flag warning emission and `dry_run=True` propagation
56+
- `click.Abort` re-raising (not caught by generic error handlers)
57+
- Error handling with correct arguments to `handle_error`
58+
- `ctx.obj=None` graceful handling in setup command
59+
60+
- Added 68 lines of static prompt tests in `tests/test_mock_vs_production_fix.py` verifying:
61+
- `fix_verification_errors_LLM.prompt` contains mock guidance section, mentions MagicMock, `__getitem__` pattern, and prioritizes mock fixes
62+
- `find_verification_errors_LLM.prompt` has mock identification step
63+
64+
- Added 154-line integration test script `tests/test_mock_fix_integration.sh` for validating LLM behavior with mock vs production code scenarios
2065

2166
## v0.0.85 (2025-12-16)
2267

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
# PDD (Prompt-Driven Development) Command Line Interface
22

3-
![PDD-CLI Version](https://img.shields.io/badge/pdd--cli-v0.0.86-blue) [![Discord](https://img.shields.io/badge/Discord-join%20chat-7289DA.svg?logo=discord&logoColor=white)](https://discord.gg/Yp4RTh8bG7)
3+
![PDD-CLI Version](https://img.shields.io/badge/pdd--cli-v0.0.87-blue) [![Discord](https://img.shields.io/badge/Discord-join%20chat-7289DA.svg?logo=discord&logoColor=white)](https://discord.gg/Yp4RTh8bG7)
44

55
## Introduction
66

@@ -285,7 +285,7 @@ export PDD_TEST_OUTPUT_PATH=/path/to/tests/
285285

286286
## Version
287287

288-
Current version: 0.0.86
288+
Current version: 0.0.87
289289

290290
To check your installed version, run:
291291
```

SETUP_WITH_GEMINI.md

Lines changed: 39 additions & 22 deletions
@@ -2,7 +2,7 @@
22

33
This example shows you how to set up **Prompt-Driven Development (PDD)** with a free **Gemini API key** and run the built-in **Hello** example.
44

5-
> **Goal:** By the end, youll have PDD installed, Gemini configured, and `pdd generate` running on the Hello example.
5+
> **Goal:** By the end, you'll have PDD installed, Gemini configured, and `pdd sync` running on the Hello example.
66
77
---
88

@@ -82,11 +82,13 @@ cd pdd/examples/hello
8282

8383
If you already pasted the key into `pdd setup`, you can skip this section. Otherwise:
8484

85-
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey).
86-
2. Log in with your Google account.
87-
3. Click **Create API key**.
85+
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey).
86+
2. Log in with your Google account.
87+
3. Click **Create API key**.
8888
4. Copy the key.
8989

90+
> **Students:** The Gemini API is free for everyone, but university students get higher rate limits (60 requests/min, 300K tokens/day) extended through June 2026. You can also claim [1 year of Google AI Pro free](https://gemini.google/students/) (sign up by Jan 31, 2026) for additional perks like NotebookLM and 2TB storage.
91+
9092
**macOS/Linux (bash/zsh)**
9193
```bash
9294
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
@@ -122,10 +124,9 @@ head -2 ~/.pdd/llm_model.csv
122124

123125
---
124126

125-
## 6. Output locations (tests & examples)
127+
## 6. Output locations (optional, skip for this quickstart)
126128

127-
By default, PDD writes generated files next to your source code.
128-
To keep repos tidy, set these environment variables once (e.g., in `~/.zshrc` or `~/.bashrc`):
129+
By default, PDD writes generated files next to your source code. For real projects, you can set these environment variables to organize outputs:
129130

130131
```bash
131132
export PDD_TEST_OUTPUT_PATH=tests
@@ -136,36 +137,52 @@ With these set, PDD will place outputs like so:
136137
- Examples → `examples/<module>/...`
137138
- Tests → `tests/<module>/...`
138139

140+
> **Note:** For the Hello example below, leave these unset so files generate in the current directory.
141+
139142
---
140143

141-
## 7. Run the Hello Example
144+
## 7. Validate Your Setup
145+
146+
Before using the main workflow, verify your configuration works by running a quick generate:
142147

143148
From `pdd/examples/hello`:
144149

145150
```bash
146-
# generate code from the prompt
147151
pdd generate hello_python.prompt
152+
```
153+
154+
If this succeeds, your API key and model configuration are working correctly.
155+
156+
---
157+
158+
## 8. Use Sync (Primary Workflow)
159+
160+
The `pdd sync` command is the primary way to work with PDD. It generates code, tests, and examples for a module, keeping everything in sync:
148161

149-
# run the generated example if it has a main block
150-
python examples/hello/hello.py
162+
```bash
163+
pdd sync hello
151164
```
152165

153-
If the generated `hello.py` is minimal (no `__main__` block), run it interactively:
166+
Use `--force` to regenerate even if files already exist:
154167

155168
```bash
156-
python -i examples/hello/hello.py
157-
>>> hello()
158-
hello
169+
pdd --force sync hello
159170
```
160171

161-
---
162-
## 8. (Optional) Sync
172+
After syncing, run the generated example:
163173

164-
After you’ve confirmed `generate` works:
174+
```bash
175+
python hello.py
176+
```
177+
178+
If the generated `hello.py` is minimal (no `__main__` block), run it interactively:
165179

166180
```bash
167-
pdd --force sync hello
181+
python -i hello.py
182+
>>> hello()
183+
hello
168184
```
185+
169186
---
170187

171188
## 9. What if nothing prints?
@@ -181,7 +198,7 @@ In that case you have two options:
181198

182199
### Option A — Run interactively
183200
```bash
184-
python -i examples/hello/hello.py
201+
python -i hello.py
185202
>>> hello()
186203
hello
187204
```
@@ -194,10 +211,10 @@ if __name__ == "__main__":
194211
```
195212
Then re-run:
196213
```bash
197-
python examples/hello/hello.py
214+
python hello.py
198215
# output:
199216
hello
200217
```
201218

202219

203-
✅ Thats it! Youve installed PDD, configured Gemini, set up the model CSV, and generated your first working example.
220+
✅ That's it! You've installed PDD, configured Gemini, and used `pdd sync` to generate your first module.

examples/qrcode_sandwich/README.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ Trim your `llm_model.csv` accordingly to the models you have. If you only have *
9292
```csv
9393
provider,model,input,output,coding_arena_elo,base_url,api_key,max_reasoning_tokens,structured_output,reasoning_type
9494
Google,gpt-4.1-nano,0.1,0.4,1249,,OPENAI_API_KEY,0,True,none
95-
Google,gemini/gemini-2.5-flash,0.15,0.6,1330,,GEMINI_API_KEY,0,True,effort
95+
Google,gemini/gemini-3-flash-preview,0.15,0.6,1330,,GEMINI_API_KEY,0,True,effort
9696
```
9797

9898
Note: I have **GPT 4.1 Nano** included because it is my default model. However, you can set an env variable to have a different model as default.

pdd/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
"""PDD - Prompt Driven Development"""
22

3-
__version__ = "0.0.86"
3+
__version__ = "0.0.87"
44

55
# Strength parameter used for LLM extraction across the codebase
66
# Used in postprocessing, XML tagging, code generation, and other extraction

pdd/change_main.py

Lines changed: 10 additions & 4 deletions
@@ -17,11 +17,11 @@
1717
from rich.panel import Panel
1818

1919
# Use relative imports for internal modules
20+
from .config_resolution import resolve_effective_config
2021
from .construct_paths import construct_paths
2122
from .change import change as change_func
2223
from .process_csv_change import process_csv_change
2324
from .get_extension import get_extension
24-
from . import DEFAULT_STRENGTH, DEFAULT_TIME
2525

2626
# Set up logging
2727
logger = logging.getLogger(__name__)
@@ -72,9 +72,8 @@ def change_main(
7272
# Retrieve global options from context
7373
force: bool = ctx.obj.get("force", False)
7474
quiet: bool = ctx.obj.get("quiet", False)
75-
strength: float = ctx.obj.get("strength", DEFAULT_STRENGTH)
76-
temperature: float = ctx.obj.get("temperature", 0.0)
77-
time_budget: float = ctx.obj.get("time", DEFAULT_TIME)
75+
# Note: strength/temperature/time will be resolved after construct_paths
76+
# using resolve_effective_config for proper priority handling
7877
# --- Get language and extension from context ---
7978
# These are crucial for knowing the target code file types, especially in CSV mode
8079
target_language: str = ctx.obj.get("language", "")
@@ -216,6 +215,13 @@ def change_main(
216215
logger.error(msg, exc_info=True)
217216
return msg, 0.0, ""
218217

218+
# Use centralized config resolution with proper priority:
219+
# CLI > pddrc > defaults
220+
effective_config = resolve_effective_config(ctx, resolved_config)
221+
strength = effective_config["strength"]
222+
temperature = effective_config["temperature"]
223+
time_budget = effective_config["time"]
224+
219225
# --- 3. Perform Prompt Modification ---
220226
if use_csv:
221227
logger.info("Running in CSV mode.")

pdd/cmd_test_main.py

Lines changed: 14 additions & 4 deletions
@@ -7,7 +7,7 @@
77
# pylint: disable=redefined-builtin
88
from rich import print
99

10-
from . import DEFAULT_STRENGTH, DEFAULT_TEMPERATURE
10+
from .config_resolution import resolve_effective_config
1111
from .construct_paths import construct_paths
1212
from .generate_test import generate_test
1313
from .increase_tests import increase_tests
@@ -55,9 +55,9 @@ def cmd_test_main(
5555
input_strings = {}
5656

5757
verbose = ctx.obj["verbose"]
58-
strength = strength if strength is not None else ctx.obj.get("strength", DEFAULT_STRENGTH)
59-
temperature = temperature if temperature is not None else ctx.obj.get("temperature", DEFAULT_TEMPERATURE)
60-
time = ctx.obj.get("time")
58+
# Note: strength/temperature will be resolved after construct_paths using resolve_effective_config
59+
param_strength = strength # Store the parameter value for later resolution
60+
param_temperature = temperature # Store the parameter value for later resolution
6161

6262
if verbose:
6363
print(f"[bold blue]Prompt file:[/bold blue] {prompt_file}")
@@ -94,6 +94,16 @@ def cmd_test_main(
9494
context_override=ctx.obj.get('context'),
9595
confirm_callback=ctx.obj.get('confirm_callback')
9696
)
97+
# Use centralized config resolution with proper priority:
98+
# CLI > pddrc > defaults
99+
effective_config = resolve_effective_config(
100+
ctx,
101+
resolved_config,
102+
param_overrides={"strength": param_strength, "temperature": param_temperature}
103+
)
104+
strength = effective_config["strength"]
105+
temperature = effective_config["temperature"]
106+
time = effective_config["time"]
97107
except click.Abort:
98108
# User cancelled - re-raise to stop the sync loop
99109
raise

pdd/commands/maintenance.py

Lines changed: 2 additions & 2 deletions
@@ -37,8 +37,8 @@
3737
)
3838
@click.option(
3939
"--target-coverage",
40-
default=0.0,
41-
help="Desired code coverage percentage.",
40+
default=None,
41+
help="Desired code coverage percentage. Default: 10.0 or .pddrc value.",
4242
)
4343
@click.option(
4444
"--dry-run",
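
Switching the `--target-coverage` default from `0.0` to `None` makes "not passed" distinguishable from an explicit `0.0`, which a later fallback to the `.pddrc` value or to `10.0` depends on. The resolution code itself is not part of this hunk, so the following is only a sketch under assumptions: `resolve_target_coverage` and the `target_coverage` key are hypothetical names, and only the `10.0` fallback comes from the help text.

```python
from typing import Any, Dict, Optional

FALLBACK_TARGET_COVERAGE = 10.0  # fallback value quoted in the option's help text


def resolve_target_coverage(
    cli_value: Optional[float], pddrc: Dict[str, Any]
) -> float:
    # An explicit CLI value wins, including an explicit 0.0.
    if cli_value is not None:
        return cli_value
    # Hypothetical key name for the .pddrc setting.
    rc_value = pddrc.get("target_coverage")
    if rc_value is not None:
        return rc_value
    return FALLBACK_TARGET_COVERAGE


explicit_zero = resolve_target_coverage(0.0, {"target_coverage": 80.0})
from_pddrc = resolve_target_coverage(None, {"target_coverage": 80.0})
from_fallback = resolve_target_coverage(None, {})
```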

pdd/config_resolution.py

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
"""
2+
Centralized config resolution for all commands.
3+
4+
Single source of truth for resolving strength, temperature, and other config values.
5+
This module ensures consistent priority ordering across all commands:
6+
1. CLI global options (--strength, --temperature) - highest priority
7+
2. pddrc context defaults - medium priority
8+
3. Hardcoded defaults - lowest priority
9+
"""
10+
from typing import Dict, Any, Optional
11+
import click
12+
13+
from . import DEFAULT_STRENGTH, DEFAULT_TEMPERATURE, DEFAULT_TIME
14+
15+
16+
def resolve_effective_config(
17+
ctx: click.Context,
18+
resolved_config: Dict[str, Any],
19+
param_overrides: Optional[Dict[str, Any]] = None
20+
) -> Dict[str, Any]:
21+
"""
22+
Resolve effective config values with proper priority.
23+
24+
Priority (highest to lowest):
25+
1. Command parameter overrides (e.g., strength kwarg)
26+
2. CLI global options (--strength stored in ctx.obj)
27+
3. pddrc context defaults (from resolved_config)
28+
4. Hardcoded defaults
29+
30+
Args:
31+
ctx: Click context with CLI options in ctx.obj
32+
resolved_config: Config returned by construct_paths (contains pddrc values)
33+
param_overrides: Optional command-specific parameter overrides
34+
35+
Returns:
36+
Dict with resolved values for strength, temperature, time
37+
"""
38+
ctx_obj = ctx.obj if ctx.obj else {}
39+
param_overrides = param_overrides or {}
40+
41+
def resolve_value(key: str, default: Any) -> Any:
42+
# Priority 1: Command parameter override
43+
if key in param_overrides and param_overrides[key] is not None:
44+
return param_overrides[key]
45+
# Priority 2: CLI global option (only if key IS in ctx.obj - meaning CLI passed it)
46+
if key in ctx_obj:
47+
return ctx_obj[key]
48+
# Priority 3: pddrc context default
49+
if key in resolved_config and resolved_config[key] is not None:
50+
return resolved_config[key]
51+
# Priority 4: Hardcoded default
52+
return default
53+
54+
return {
55+
"strength": resolve_value("strength", DEFAULT_STRENGTH),
56+
"temperature": resolve_value("temperature", DEFAULT_TEMPERATURE),
57+
"time": resolve_value("time", DEFAULT_TIME),
58+
}
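
To see the priority order in action without installing pdd or click, the lookup in `resolve_value` can be reproduced in a few lines. This is a sketch, not the module itself: `resolve` is a stand-in that follows the same four checks, `SimpleNamespace` substitutes for `click.Context`, and the numbers in `DEFAULTS` are placeholders rather than the real `DEFAULT_STRENGTH`, `DEFAULT_TEMPERATURE`, and `DEFAULT_TIME` values.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

# Placeholder defaults for illustration only; the real ones live in pdd/__init__.py.
DEFAULTS = {"strength": 0.5, "temperature": 0.0, "time": 0.25}


def resolve(
    ctx: Any,
    resolved_config: Dict[str, Any],
    param_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    ctx_obj = ctx.obj if ctx.obj else {}
    param_overrides = param_overrides or {}
    result = {}
    for key, default in DEFAULTS.items():
        if param_overrides.get(key) is not None:      # 1. command parameter
            result[key] = param_overrides[key]
        elif key in ctx_obj:                          # 2. CLI global option
            result[key] = ctx_obj[key]
        elif resolved_config.get(key) is not None:    # 3. pddrc value
            result[key] = resolved_config[key]
        else:                                         # 4. hardcoded default
            result[key] = default
    return result


ctx = SimpleNamespace(obj={"temperature": 0.7})
pddrc = {"strength": 0.9, "temperature": 0.2}
config = resolve(ctx, pddrc, param_overrides={"strength": None})

# A key that is present in ctx.obj with a None value still wins at step 2.
shadowed = resolve(SimpleNamespace(obj={"strength": None}), {"strength": 0.9})
```

Here `strength` falls through to the pddrc value because the override is `None` and the CLI did not set it, `temperature` comes from the CLI, and `time` uses the default. The last call shows a consequence of the membership test at step 2: the scheme relies on the CLI layer storing only the keys the user actually passed.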
