promptdriven
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 54 additions & 9 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/SETUP_WITH_GEMINI.md b/SETUP_WITH_GEMINI.md
Lines changed: 39 additions & 22 deletions
diff --git a/examples/qrcode_sandwich/README.md b/examples/qrcode_sandwich/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/pdd/__init__.py b/pdd/__init__.py
Lines changed: 1 addition & 1 deletion
diff --git a/pdd/change_main.py b/pdd/change_main.py
Lines changed: 10 additions & 4 deletions
diff --git a/pdd/cmd_test_main.py b/pdd/cmd_test_main.py
Lines changed: 14 additions & 4 deletions
diff --git a/pdd/commands/maintenance.py b/pdd/commands/maintenance.py
Lines changed: 2 additions & 2 deletions
diff --git a/pdd/config_resolution.py b/pdd/config_resolution.py
Lines changed: 58 additions & 0 deletions
@@ -1,22 +1,67 @@
+## v0.0.87 (2025-12-18)
+
 ## v0.0.86 (2025-12-17)
 
 ### Feat
 
-- add metadata files for Python test command execution results and configuration
-- add encode_message prompt for encoding functionality in Python
-- add encode_message prompt for regression tests and enhance test auto-discovery in integration tests
-- enhance LLM prompts with detailed mock vs production code guidance and improve integration test script for clarity
-- add integration and static tests for mock vs production code guidance in LLM prompts
-- enhance unit test inclusion in code generation, implement example error detection, and improve directory summarization; update README and tests accordingly
+- **`--dry-run` Flag for Sync Command:** Renamed the `--log` flag to `--dry-run` for clearer semantics. The `--dry-run` flag analyzes sync state without executing operations, showing what sync would do. The old `--log` flag is deprecated with a warning directing users to use `--dry-run` instead.
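Reviewer sketch (not part of the diff): one way the deprecated-flag behavior described above could look, reduced to the standard library. The function name `effective_dry_run` and the warning text are assumptions for illustration; the actual handling lives in `pdd/commands/maintenance.py` and is not shown in this diff.

```python
import warnings


def effective_dry_run(dry_run: bool = False, log: bool = False) -> bool:
    """Hypothetical helper: fold the deprecated --log flag into --dry-run."""
    if log:
        # Warn callers that still pass the old flag, then treat it as --dry-run.
        warnings.warn(
            "--log is deprecated; use --dry-run instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return dry_run or log


# Capture warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = effective_dry_run(log=True)

print(legacy_result, [str(w.message) for w in caught])
```

Both flags end up selecting the same analysis-only mode, so existing scripts that pass `--log` keep working while being nudged toward `--dry-run`.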
+
+- **Mock vs Production Code Guidance in LLM Prompts:** Added comprehensive guidance to `fix_verification_errors_LLM.prompt` and `find_verification_errors_LLM.prompt` for distinguishing mock configuration errors from production code errors. Prompts now instruct the LLM to:
+  - Identify test files using mocks (MagicMock, unittest.mock, patch)
+  - Check mock setup FIRST when errors occur (wrong `return_value` structure, missing `__getitem__` configuration)
+  - Preserve production code API usage patterns unless documentation proves otherwise
+  - Follow a diagnosis priority: mock configuration → mock chaining → production code
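Reviewer sketch (not part of the diff): a small illustration of the kind of mock configuration error the guidance above targets. The function `first_choice_text` and the response shape are hypothetical; only the `unittest.mock` calls are real.

```python
from unittest.mock import MagicMock


def first_choice_text(client):
    """Hypothetical production code: reads a nested dict-style response."""
    response = client.complete("hi")
    return response["choices"][0]["text"]


# Misconfigured mock: return_value is never set, so every lookup
# yields another MagicMock instead of the expected string.
bad_client = MagicMock()
bad_result = first_choice_text(bad_client)

# Corrected mock: return_value mirrors the structure the code indexes into.
good_client = MagicMock()
good_client.complete.return_value = {"choices": [{"text": "hello"}]}
good_result = first_choice_text(good_client)

# Alternative: configure __getitem__ directly on a MagicMock row.
row = MagicMock()
row.__getitem__.return_value = "hello"

print(type(bad_result).__name__, good_result, row["text"])
```

In the failing case the production code is correct and only the mock setup needs to change, which is why the prompts ask the LLM to inspect mock configuration before touching production code.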
+
+- **Unit Test Auto-Discovery Regression Test:** Added regression test #20 to `tests/regression.sh` that validates the `generate` command's unit test auto-discovery feature. Tests both `--exclude-tests` mode (no context, expects failure) and default auto-discovery mode (expects success).
+
+- **Encode Message Prompt:** Added `prompts/encode_message_python.prompt` as a simple prompt for testing unit test auto-discovery and regression test scenarios.
 
 ### Fix
 
-- improve verification success tracking in error fixing loop and update related tests
-- add run_attempt to branch name for re-run support
+- **Verification Success Tracking Bug:** Fixed a critical bug in `fix_verification_errors_loop` where the function incorrectly reported "No improvement found" when secondary verification passed but the issue count didn't decrease. Added `any_verification_passed` flag that tracks when code was actually changed AND secondary verification passed. The function now correctly returns `success=True` when verification passes, even if the LLM's issue count assessment is unchanged. This ensures code that compiles and runs correctly is recognized as successful. Key changes:
+  - Track `any_verification_passed` separately from best iteration tracking
+  - Only set flag when `code_updated=True` AND verification passes
+  - Return `success=True` with `final_issues=0` when verification passed
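Reviewer sketch (not part of the diff): a simplified model of the success-tracking rule listed above. The function `run_fix_loop`, its tuple-based input, and the fallback rule for the no-verification case are assumptions made for illustration; the real `fix_verification_errors_loop` is not shown here and does considerably more.

```python
from typing import Dict, List, Tuple, Union

# Each attempt: (code_updated, verification_passed, issue_count)
Attempt = Tuple[bool, bool, int]


def run_fix_loop(attempts: List[Attempt], initial_issues: int) -> Dict[str, Union[bool, int]]:
    """Hypothetical reduction of the loop's success determination."""
    best_issues = initial_issues
    any_verification_passed = False

    for code_updated, verification_passed, issues in attempts:
        # Only count a pass when the code actually changed AND verification passed.
        if code_updated and verification_passed:
            any_verification_passed = True
        # Best-iteration tracking stays separate from the flag.
        if issues < best_issues:
            best_issues = issues

    if any_verification_passed:
        return {"success": True, "final_issues": 0}
    # Assumed fallback: without a verified change, success requires zero issues.
    return {"success": best_issues == 0, "final_issues": best_issues}


# Verification passes, but the LLM's issue count stays at 2.
print(run_fix_loop([(True, True, 2)], initial_issues=2))
```

The case printed here is the one the changelog entry describes: the issue count does not drop, yet the run is still reported as successful because a changed version of the code passed secondary verification.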
 
 ### Refactor
 
-- remove unused warnings import from maintenance commands
+- **Remove Unused Warnings Import:** Cleaned up unused `warnings` import from `pdd/commands/maintenance.py`.
+
+- **Error Fixing Loop Prompt Simplification:** Streamlined `prompts/fix_verification_errors_loop_python.prompt` from 123 lines to 63 lines by:
+  - Condensing implementation details into "behavior defined by test suite" directive
+  - Listing key behaviors to implement without step-by-step instructions
+  - Focusing on inputs/outputs and test compliance
+
+### Docs
+
+- **Prompting Guide Major Update:** Significantly expanded `docs/prompting_guide.md` with ~200 lines of new content:
+  - **Automated Grounding (PDD Cloud):** Explains how vector embedding and similarity search automatically provides few-shot examples during generation
+  - **Grounding Overrides:** Documents `<pin>module_name</pin>` and `<exclude>module_name</exclude>` tags for controlling automatic example retrieval
+  - **Three Pillars of PDD Generation:** New section explaining how Prompt (WHAT), Grounding (HOW), and Tests (CORRECTNESS) work together
+  - **Prompt Abstraction Guidance:** Added 10-30% prompt-to-code ratio target with clear guidelines on what NOT to include in prompts
+  - **Non-Deterministic Tag Warnings:** Added explicit warnings about `<shell>` and `<web>` tags introducing environment-dependent behavior
+  - **Requirements Writing Guide:** Expanded with before/after examples and testability criteria
+
+### Tests
+
+- Added 320+ lines of verification loop tests in `tests/test_fix_verification_errors_loop.py` covering:
+  - Verification passes but issue count unchanged (regression test for the bug)
+  - Best iteration restored with verification passed
+  - Proper `any_verification_passed` flag behavior
+  - Success determination based on verification outcome vs issue count
+
+- Added 130+ lines of maintenance command tests in `tests/test_commands_maintenance.py` covering:
+  - `@track_cost` decorator verification for sync and auto-deps commands
+  - Deprecated `--log` flag warning emission and `dry_run=True` propagation
+  - `click.Abort` re-raising (not caught by generic error handlers)
+  - Error handling with correct arguments to `handle_error`
+  - `ctx.obj=None` graceful handling in setup command
+
+- Added 68 lines of static prompt tests in `tests/test_mock_vs_production_fix.py` verifying:
+  - `fix_verification_errors_LLM.prompt` contains mock guidance section, mentions MagicMock, `__getitem__` pattern, and prioritizes mock fixes
+  - `find_verification_errors_LLM.prompt` has mock identification step
+
+- Added 154-line integration test script `tests/test_mock_fix_integration.sh` for validating LLM behavior with mock vs production code scenarios
 
 ## v0.0.85 (2025-12-16)
 
 
@@ -1,6 +1,6 @@
 # PDD (Prompt-Driven Development) Command Line Interface
 
-![PDD-CLI Version](https://img.shields.io/badge/pdd--cli-v0.0.86-blue) [![Discord](https://img.shields.io/badge/Discord-join%20chat-7289DA.svg?logo=discord&logoColor=white)](https://discord.gg/Yp4RTh8bG7)
+![PDD-CLI Version](https://img.shields.io/badge/pdd--cli-v0.0.87-blue) [![Discord](https://img.shields.io/badge/Discord-join%20chat-7289DA.svg?logo=discord&logoColor=white)](https://discord.gg/Yp4RTh8bG7)
 
 ## Introduction
 
@@ -285,7 +285,7 @@ export PDD_TEST_OUTPUT_PATH=/path/to/tests/
 
 ## Version
 
-Current version: 0.0.86
+Current version: 0.0.87
 
 To check your installed version, run:
 ```
 
@@ -2,7 +2,7 @@
 
 This example shows you how to set up **Prompt-Driven Development (PDD)** with a free **Gemini API key** and run the built-in **Hello** example.
 
-> **Goal:** By the end, you’ll have PDD installed, Gemini configured, and `pdd generate` running on the Hello example.
+> **Goal:** By the end, you'll have PDD installed, Gemini configured, and `pdd sync` running on the Hello example.
 
 ---
 
@@ -82,11 +82,13 @@ cd pdd/examples/hello
 
 If you already pasted the key into `pdd setup`, you can skip this section. Otherwise:
 
-1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey).  
-2. Log in with your Google account.  
-3. Click **Create API key**.  
+1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey).
+2. Log in with your Google account.
+3. Click **Create API key**.
 4. Copy the key.
 
+> **Students:** The Gemini API is free for everyone, but university students get higher rate limits (60 requests/min, 300K tokens/day) extended through June 2026. You can also claim [1 year of Google AI Pro free](https://gemini.google/students/) (sign up by Jan 31, 2026) for additional perks like NotebookLM and 2TB storage.
+
 **macOS/Linux (bash/zsh)**
 ```bash
 export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
@@ -122,10 +124,9 @@ head -2 ~/.pdd/llm_model.csv
 
 ---
 
-## 6. Output locations (tests & examples)
+## 6. Output locations (optional, skip for this quickstart)
 
-By default, PDD writes generated files next to your source code.  
-To keep repos tidy, set these environment variables once (e.g., in `~/.zshrc` or `~/.bashrc`):
+By default, PDD writes generated files next to your source code. For real projects, you can set these environment variables to organize outputs:
 
 ```bash
 export PDD_TEST_OUTPUT_PATH=tests
@@ -136,36 +137,52 @@ With these set, PDD will place outputs like so:
 - Examples → `examples/<module>/...`
 - Tests → `tests/<module>/...`
 
+> **Note:** For the Hello example below, leave these unset so files generate in the current directory.
+
 ---
 
-## 7. Run the Hello Example
+## 7. Validate Your Setup
+
+Before using the main workflow, verify your configuration works by running a quick generate:
 
 From `pdd/examples/hello`:
 
 ```bash
-# generate code from the prompt
 pdd generate hello_python.prompt
+```
+
+If this succeeds, your API key and model configuration are working correctly.
+
+---
+
+## 8. Use Sync (Primary Workflow)
+
+The `pdd sync` command is the primary way to work with PDD. It generates code, tests, and examples for a module, keeping everything in sync:
 
-# run the generated example if it has a main block
-python examples/hello/hello.py
+```bash
+pdd sync hello
 ```
 
-If the generated `hello.py` is minimal (no `__main__` block), run it interactively:
+Use `--force` to regenerate even if files already exist:
 
 ```bash
-python -i examples/hello/hello.py
->>> hello()
-hello
+pdd --force sync hello
 ```
 
----
-## 8. (Optional) Sync
+After syncing, run the generated example:
 
-After you’ve confirmed `generate` works:
+```bash
+python hello.py
+```
+
+If the generated `hello.py` is minimal (no `__main__` block), run it interactively:
 
 ```bash
-pdd --force sync hello
+python -i hello.py
+>>> hello()
+hello
 ```
+
 ---
 
 ## 9. What if nothing prints?
@@ -181,7 +198,7 @@ In that case you have two options:
 
 ### Option A — Run interactively
 ```bash
-python -i examples/hello/hello.py
+python -i hello.py
 >>> hello()
 hello
 ```
@@ -194,10 +211,10 @@ if __name__ == "__main__":
 ```
 Then re-run:
 ```bash
-python examples/hello/hello.py
+python hello.py
 # output:
 hello
 ```
 
 
-✅ That’s it! You’ve installed PDD, configured Gemini, set up the model CSV, and generated your first working example.
+✅ That's it! You've installed PDD, configured Gemini, and used `pdd sync` to generate your first module.
@@ -92,7 +92,7 @@ Trim your `llm_model.csv` accordingly to the models you have. If you only have *
 ```csv
 provider,model,input,output,coding_arena_elo,base_url,api_key,max_reasoning_tokens,structured_output,reasoning_type
 Google,gpt-4.1-nano,0.1,0.4,1249,,OPENAI_API_KEY,0,True,none
-Google,gemini/gemini-2.5-flash,0.15,0.6,1330,,GEMINI_API_KEY,0,True,effort
+Google,gemini/gemini-3-flash-preview,0.15,0.6,1330,,GEMINI_API_KEY,0,True,effort
 ```
 
 Note: I have **GPT 4.1 Nano** included because it is my default model. However, you can set an env variable to have a different model as default.
 
@@ -1,6 +1,6 @@
 """PDD - Prompt Driven Development"""
 
-__version__ = "0.0.86"
+__version__ = "0.0.87"
 
 # Strength parameter used for LLM extraction across the codebase
 # Used in postprocessing, XML tagging, code generation, and other extraction
 
@@ -17,11 +17,11 @@
 from rich.panel import Panel
 
 # Use relative imports for internal modules
+from .config_resolution import resolve_effective_config
 from .construct_paths import construct_paths
 from .change import change as change_func
 from .process_csv_change import process_csv_change
 from .get_extension import get_extension
-from . import DEFAULT_STRENGTH, DEFAULT_TIME
 
 # Set up logging
 logger = logging.getLogger(__name__)
@@ -72,9 +72,8 @@ def change_main(
     # Retrieve global options from context
     force: bool = ctx.obj.get("force", False)
     quiet: bool = ctx.obj.get("quiet", False)
-    strength: float = ctx.obj.get("strength", DEFAULT_STRENGTH)
-    temperature: float = ctx.obj.get("temperature", 0.0)
-    time_budget: float = ctx.obj.get("time", DEFAULT_TIME)
+    # Note: strength/temperature/time will be resolved after construct_paths
+    # using resolve_effective_config for proper priority handling
     # --- Get language and extension from context ---
     # These are crucial for knowing the target code file types, especially in CSV mode
     target_language: str = ctx.obj.get("language", "")
@@ -216,6 +215,13 @@ def change_main(
             logger.error(msg, exc_info=True)
             return msg, 0.0, ""
 
+        # Use centralized config resolution with proper priority:
+        # CLI > pddrc > defaults
+        effective_config = resolve_effective_config(ctx, resolved_config)
+        strength = effective_config["strength"]
+        temperature = effective_config["temperature"]
+        time_budget = effective_config["time"]
+
         # --- 3. Perform Prompt Modification ---
         if use_csv:
             logger.info("Running in CSV mode.")
 
@@ -7,7 +7,7 @@
 # pylint: disable=redefined-builtin
 from rich import print
 
-from . import DEFAULT_STRENGTH, DEFAULT_TEMPERATURE
+from .config_resolution import resolve_effective_config
 from .construct_paths import construct_paths
 from .generate_test import generate_test
 from .increase_tests import increase_tests
@@ -55,9 +55,9 @@ def cmd_test_main(
     input_strings = {}
 
     verbose = ctx.obj["verbose"]
-    strength = strength if strength is not None else ctx.obj.get("strength", DEFAULT_STRENGTH)
-    temperature = temperature if temperature is not None else ctx.obj.get("temperature", DEFAULT_TEMPERATURE)
-    time = ctx.obj.get("time")
+    # Note: strength/temperature will be resolved after construct_paths using resolve_effective_config
+    param_strength = strength  # Store the parameter value for later resolution
+    param_temperature = temperature  # Store the parameter value for later resolution
 
     if verbose:
         print(f"[bold blue]Prompt file:[/bold blue] {prompt_file}")
@@ -94,6 +94,16 @@ def cmd_test_main(
             context_override=ctx.obj.get('context'),
             confirm_callback=ctx.obj.get('confirm_callback')
         )
+        # Use centralized config resolution with proper priority:
+        # CLI > pddrc > defaults
+        effective_config = resolve_effective_config(
+            ctx,
+            resolved_config,
+            param_overrides={"strength": param_strength, "temperature": param_temperature}
+        )
+        strength = effective_config["strength"]
+        temperature = effective_config["temperature"]
+        time = effective_config["time"]
     except click.Abort:
         # User cancelled - re-raise to stop the sync loop
         raise
 
@@ -37,8 +37,8 @@
 )
 @click.option(
     "--target-coverage",
-    default=0.0,
-    help="Desired code coverage percentage.",
+    default=None,
+    help="Desired code coverage percentage. Default: 10.0 or .pddrc value.",
 )
 @click.option(
     "--dry-run",
 
@@ -0,0 +1,58 @@
+"""
+Centralized config resolution for all commands.
+
+Single source of truth for resolving strength, temperature, and other config values.
+This module ensures consistent priority ordering across all commands:
+    1. CLI global options (--strength, --temperature) - highest priority
+    2. pddrc context defaults - medium priority
+    3. Hardcoded defaults - lowest priority
+"""
+from typing import Dict, Any, Optional
+import click
+
+from . import DEFAULT_STRENGTH, DEFAULT_TEMPERATURE, DEFAULT_TIME
+
+
+def resolve_effective_config(
+    ctx: click.Context,
+    resolved_config: Dict[str, Any],
+    param_overrides: Optional[Dict[str, Any]] = None
+) -> Dict[str, Any]:
+    """
+    Resolve effective config values with proper priority.
+
+    Priority (highest to lowest):
+        1. Command parameter overrides (e.g., strength kwarg)
+        2. CLI global options (--strength stored in ctx.obj)
+        3. pddrc context defaults (from resolved_config)
+        4. Hardcoded defaults
+
+    Args:
+        ctx: Click context with CLI options in ctx.obj
+        resolved_config: Config returned by construct_paths (contains pddrc values)
+        param_overrides: Optional command-specific parameter overrides
+
+    Returns:
+        Dict with resolved values for strength, temperature, time
+    """
+    ctx_obj = ctx.obj if ctx.obj else {}
+    param_overrides = param_overrides or {}
+
+    def resolve_value(key: str, default: Any) -> Any:
+        # Priority 1: Command parameter override
+        if key in param_overrides and param_overrides[key] is not None:
+            return param_overrides[key]
+        # Priority 2: CLI global option (only if key IS in ctx.obj - meaning CLI passed it)
+        if key in ctx_obj:
+            return ctx_obj[key]
+        # Priority 3: pddrc context default
+        if key in resolved_config and resolved_config[key] is not None:
+            return resolved_config[key]
+        # Priority 4: Hardcoded default
+        return default
+
+    return {
+        "strength": resolve_value("strength", DEFAULT_STRENGTH),
+        "temperature": resolve_value("temperature", DEFAULT_TEMPERATURE),
+        "time": resolve_value("time", DEFAULT_TIME),
+    }
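Reviewer sketch (not part of the diff): a click-free mirror of the lookup order in `resolve_effective_config`, useful for reasoning about which source wins. The `DEFAULTS` values are placeholders, not the real constants from `pdd/__init__.py`, and a plain dict stands in for `ctx.obj`.

```python
from typing import Any, Dict, Optional

# Placeholder defaults for illustration only (assumed values).
DEFAULTS: Dict[str, Any] = {"strength": 0.5, "temperature": 0.0, "time": 0.25}


def resolve(
    ctx_obj: Dict[str, Any],
    resolved_config: Dict[str, Any],
    param_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Mirror of the four-level priority shown in the diff."""
    param_overrides = param_overrides or {}

    def pick(key: str) -> Any:
        # 1. Command parameter override (ignored when None)
        if param_overrides.get(key) is not None:
            return param_overrides[key]
        # 2. CLI global option: presence of the key is enough
        if key in ctx_obj:
            return ctx_obj[key]
        # 3. pddrc value (ignored when None)
        if resolved_config.get(key) is not None:
            return resolved_config[key]
        # 4. Hardcoded default
        return DEFAULTS[key]

    return {key: pick(key) for key in DEFAULTS}


pddrc = {"strength": 0.3, "temperature": 0.7}
result = resolve({"strength": 0.9}, pddrc, {"temperature": None})
print(result)  # {'strength': 0.9, 'temperature': 0.7, 'time': 0.25}

# Level 2 checks key presence only, so an explicit None in ctx.obj is returned as-is.
print(resolve({"time": None}, {"time": 1.0})["time"])  # None
```

Because level 2 tests only for key presence, this ordering relies on the CLI layer leaving `strength`, `temperature`, and `time` out of `ctx.obj` when the user did not pass them; otherwise a stored `None` would shadow the `.pddrc` value.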