176 changes: 111 additions & 65 deletions examples/README.md
@@ -1,77 +1,123 @@
# Examples
# Customer Churn Prediction — PDD Example

This directory contains examples comparing Cursor with Prompt-Driven Development (PDD) across a range of programming tasks. They illustrate how PDD generates and modifies code via the `pdd sync` command and how the workflow compares to traditional development approaches.
This example demonstrates a complete **Prompt-Driven Development** workflow for a real-world machine learning use case: **predicting customer churn** using logistic regression.

## Getting Started
It is a companion to the core `hello` and `factorial_calculator` examples, showing PDD applied to a **data science / ML context** — a domain not previously covered in the official examples.
Comment on lines +1 to +5 (Copilot AI, Feb 24, 2026):
examples/README.md has been replaced with churn-specific documentation, which removes the overview/index for all other example projects under examples/. Please restore the examples index README and move the churn docs into a dedicated examples/customer_churn/README.md (then link to it from the main examples README).

### Post-Installation Setup (Required first step after installation)

Before running any examples, make sure you've completed the PDD setup.

---

## What This Example Covers

| PDD Concept | Implementation |
|---|---|
| Prompt as source of truth | `prompts/customer_churn_python.prompt` |
| Code generated from prompt | `customer_churn.py` |
| Usage example | `example_customer_churn.py` |
| Unit test suite | `test_customer_churn.py` |

Comment on lines +11 to +15 (Copilot AI, Feb 24, 2026):
This README points to prompts/customer_churn_python.prompt, but the prompt file added by this PR is examples/customer_churn_python.prompt (no examples/prompts/ directory). Please fix the documented path (or move the prompt file) so the README reflects the actual layout.

---

## Files

```
examples/customer_churn/
├── prompts/
│   └── customer_churn_python.prompt   # PDD prompt (source of truth)
├── customer_churn.py                  # Generated module
├── example_customer_churn.py          # Runnable demo
├── test_customer_churn.py             # Unit tests (pytest)
└── README.md                          # This file
```

Comment on lines +23 to +27 (Copilot AI, Feb 24, 2026):
The documented file tree assumes an examples/customer_churn/ folder, but this PR currently adds the churn files directly under examples/. Please either move the files into the documented directory structure or update the tree and commands accordingly.

---

## Prerequisites

```bash
pip install pandas numpy "scikit-learn>=1.2" pytest
```

---

## Run the Example

```bash
cd examples/customer_churn
python example_customer_churn.py
```

**Expected output:**
```
=======================================================
PDD Example: Customer Churn Prediction
=======================================================

📦 Generating synthetic customer dataset (100 rows)...
Dataset shape : (100, 8)
Churn rate : 27.0%

🔧 Training logistic regression pipeline...

📊 Model Evaluation (held-out 20% test set):
Accuracy : 75.00%
Precision : 60.00%
Recall : 50.00%
F1 Score : 54.55%

🔍 Top 5 Feature Importances (by abs coefficient):
num_support_tickets +0.5231 ↑ increases churn
tenure -0.4812 ↓ decreases churn
contract_type_Month-to-month +0.3901 ↑ increases churn
...

🎯 Individual Customer Predictions:

High-risk customer (month-to-month, 2 months tenure):
→ Churn probability: 68.42% 🔴

Low-risk customer (2-year contract, 58 months tenure):
→ Churn probability: 18.75% 🟢
```
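As a quick sanity check on the reported metrics, the F1 score is the harmonic mean of precision and recall, which is consistent with the numbers shown above:

```python
# F1 as the harmonic mean of precision (60%) and recall (50%),
# matching the 54.55% in the expected output.
precision, recall = 0.60, 0.50
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # 54.55%
```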

---

## Run the Tests

```bash
cd examples/customer_churn
pytest test_customer_churn.py -v
```

---

## Run with PDD

To regenerate the code from the prompt using PDD, first complete the one-time setup:

```bash
pdd setup
```

This command will guide you through:
- Installing shell tab completion
- Capturing your API keys
- Creating ~/.pdd configuration files
- Writing the starter prompt

After setup, reload your shell:

```bash
source ~/.zshrc   # or source ~/.bashrc / fish equivalent
```

Then regenerate the code:

```bash
# From the repo root
pdd --force sync customer_churn
```

To implement improvements from a GitHub issue:

```bash
pdd change https://github.com/promptdriven/pdd/issues/<issue-number>
```

## Available Examples

### Agentic Fallback
The agentic fallback example demonstrates using agentic fallback to resolve cross-file dependencies during automated debugging.
The example has two files — `src/main.py` and `src/utils.py` — where `main.py` fails without reading `utils.py`.
With agentic fallback enabled, the CLI agent (Claude/Gemini/Codex) can read `utils.py`, understand the dependency, and fix `main.py`.
Users may intentionally introduce errors in `src/utils.py` to test the agentic fix functionality.

Additional examples demonstrating the use of agentic fallback are provided for Java, TypeScript, and JavaScript.

### Edit File Tool
The edit_file_tool_example walks through generating a complete Python tool using PDD's streamlined `pdd --force sync` workflow. This example shows:
- How to drive end-to-end project generation (code, tests, docs) from component prompts (complete dev units)
- Using the provided Makefile targets to orchestrate setup, prompt creation, and sync runs
- Integrating automation features like command logging and optional cost tracking during sync

### Handpaint
The handpaint example demonstrates how PDD can be used to create and modify a painting application. This example shows:
- How PDD can be used to generate code for a graphical application
- The process of iteratively refining code through PDD
- A comparison between traditional development and PDD-assisted development

### Hello World
The hello_world example demonstrates how PDD can be used to generate code for a simple Python function that prints "hello". This example shows:
- How PDD can be used to generate code for a simple Python function via the sync command

### Hello You
The hello_you example expands on the Hello World flow by rendering a personalized greeting in large ASCII art. This example shows:
- Capturing the current shell username (via `whoami`) and feeding it into the generated program
- Building a reusable ASCII art alphabet map inside the generated Python file to spell arbitrary strings
- Producing a self-contained script that prints a 10-row tall "Hello <username>" banner with no external dependencies
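The ASCII-alphabet idea above can be sketched in a few lines. This is an illustrative toy with 3-row glyphs and a hypothetical `FONT` map, not the example's actual code; the generated example uses a full 10-row alphabet:

```python
# Toy 3-row ASCII font; the real example builds a 10-row map for all letters.
FONT = {
    "H": ["# #", "###", "# #"],
    "I": ["###", " # ", "###"],
}

def banner(text: str) -> str:
    rows = ["", "", ""]
    for ch in text.upper():
        glyph = FONT.get(ch, ["   "] * 3)  # blank glyph for unknown chars
        for i in range(3):
            rows[i] += glyph[i] + " "
    return "\n".join(rows)

print(banner("HI"))
```

The generated script follows the same pattern, substituting the current username (captured via `whoami`) for the fixed string.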

### Pi Calc
The pi_calc example demonstrates how PDD can be used to generate code for a simple Python function that calculates the value of Pi. This example shows:
- How PDD can be used to generate code for a simple Python function using the sync command

### QR Code Sandwich
The qrcode_sandwich example demonstrates how PDD can be used to generate code that produces scannable QR codes embedded within photorealistic images using ControlNet QR conditioning. This example shows:
- Creating a QR code that blends into a realistic image while remaining scannable
- Leveraging ControlNet QR conditioning in a generated Python script
- Iterating with PDD to refine parameters and results

More examples will be added to this directory as they are developed.

## Purpose
These examples are designed to help developers understand:
1. The capabilities of PDD in different programming contexts
2. How PDD compares to traditional development workflows
3. Best practices for using PDD effectively
4. Real-world applications of PDD in various domains

Each example includes documentation and code that can be used as a reference for your own PDD-based development projects.
---

## About This Example

This example was contributed as part of a pull request to expand PDD's example library into the **machine learning / data science** domain. The prompt covers:

- **sklearn Pipeline** with `ColumnTransformer` for mixed-type preprocessing
- **Logistic Regression** binary classification
- **Evaluation metrics**: accuracy, precision, recall, F1
- **Edge case handling**: missing values, empty inputs, None model

It demonstrates that PDD's prompt-first approach scales naturally to ML workflows, where prompt clarity directly impacts the quality and reproducibility of generated model code.
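The mixed-type preprocessing pattern named above can be distilled to a few lines. This is a minimal sketch with toy data, not the example's full pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Toy frame: one numeric and one categorical column (illustrative values).
df = pd.DataFrame({
    "tenure": [1, 12, 24, 48],
    "contract_type": ["Month-to-month", "One year", "Two year", "One year"],
})

# Scale numerics and one-hot encode categoricals in a single transformer.
pre = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["tenure"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
])

X = pre.fit_transform(df)
print(X.shape)  # (4, 4): 1 scaled column + 3 one-hot contract columns
```

The example's module wraps the same construct in a `Pipeline` with a `LogisticRegression` classifier and imputation steps.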

---

*Contributed by [Darius Rowser](https://github.com/Drowser2430)*
165 changes: 165 additions & 0 deletions examples/customer_churn.py
@@ -0,0 +1,165 @@
"""
Customer Churn Prediction Module
Generated via PDD (Prompt-Driven Development) workflow.
Prompt: prompts/customer_churn_python.prompt
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring says Prompt: prompts/customer_churn_python.prompt, but the prompt file added in this PR is examples/customer_churn_python.prompt (and there is no examples/prompts/ folder). Update the reference so the source-of-truth prompt path is correct after the final directory layout is decided.

Suggested change
Prompt: prompts/customer_churn_python.prompt
Prompt: examples/customer_churn_python.prompt

Copilot uses AI. Check for mistakes.
"""

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from typing import Optional


REQUIRED_COLUMNS = [
    "tenure", "monthly_charges", "total_charges",
    "contract_type", "payment_method",
    "num_support_tickets", "has_tech_support", "churn"
]

NUMERIC_FEATURES = ["tenure", "monthly_charges", "total_charges", "num_support_tickets"]
CATEGORICAL_FEATURES = ["contract_type", "payment_method"]
BOOL_FEATURES = ["has_tech_support"]


def _validate_dataframe(df: pd.DataFrame, require_churn: bool = True) -> None:
    """Validate that the DataFrame has the required columns and sufficient rows."""
    required = REQUIRED_COLUMNS if require_churn else [c for c in REQUIRED_COLUMNS if c != "churn"]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")
    if len(df) < 10:
        raise ValueError(f"DataFrame must have at least 10 rows, got {len(df)}")


def _build_pipeline() -> Pipeline:
    """Build the sklearn preprocessing and model pipeline."""
    numeric_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ])

    categorical_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))

Comment (Copilot AI, Feb 24, 2026):
OneHotEncoder(..., sparse_output=False) requires scikit-learn >= 1.2; the README currently installs scikit-learn without a minimum version. Either document the minimum required scikit-learn version for this example or use an encoder argument compatible with older versions to avoid runtime failures for users.

Suggested change:
- ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
+ ("onehot", OneHotEncoder(handle_unknown="ignore", sparse=False))

    ])

    bool_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("scaler", StandardScaler())
    ])

    preprocessor = ColumnTransformer(transformers=[
        ("num", numeric_transformer, NUMERIC_FEATURES),
        ("cat", categorical_transformer, CATEGORICAL_FEATURES),
        ("bool", bool_transformer, BOOL_FEATURES)
    ])

    pipeline = Pipeline(steps=[
        ("preprocessor", preprocessor),
        ("classifier", LogisticRegression(max_iter=1000, random_state=42))
    ])

    return pipeline


def train(df: pd.DataFrame) -> dict:
    """
    Train a customer churn prediction model on the provided DataFrame.

    Args:
        df: A pandas DataFrame containing customer features and 'churn' label.
            Required columns: tenure, monthly_charges, total_charges,
            contract_type, payment_method, num_support_tickets,
            has_tech_support, churn.

    Returns:
        A dict with keys:
            - "model": fitted sklearn Pipeline
            - "accuracy": float
            - "precision": float
            - "recall": float
            - "f1": float
            - "feature_importances": dict mapping feature names to coefficients

    Raises:
        ValueError: If required columns are missing or DataFrame has fewer than 10 rows.
    """
    _validate_dataframe(df, require_churn=True)

    df = df.copy()
    df["has_tech_support"] = df["has_tech_support"].astype(float)

    X = df[NUMERIC_FEATURES + CATEGORICAL_FEATURES + BOOL_FEATURES]
    y = df["churn"].astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    pipeline = _build_pipeline()
    pipeline.fit(X_train, y_train)

    y_pred = pipeline.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    f1 = f1_score(y_test, y_pred, zero_division=0)

    # Extract feature importances (logistic regression coefficients)
    ohe_features = list(
        pipeline.named_steps["preprocessor"]
        .named_transformers_["cat"]
        .named_steps["onehot"]
        .get_feature_names_out(CATEGORICAL_FEATURES)
    )
    all_feature_names = NUMERIC_FEATURES + ohe_features + BOOL_FEATURES
    coefficients = pipeline.named_steps["classifier"].coef_[0]
    feature_importances = dict(zip(all_feature_names, coefficients.tolist()))

    return {
        "model": pipeline,
        "accuracy": round(accuracy, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
        "feature_importances": feature_importances
    }


def predict(model_pipeline: Optional[Pipeline], customer: dict) -> float:
    """
    Predict churn probability for a single customer.

    Args:
        model_pipeline: A fitted sklearn Pipeline returned by train().
            Returns 0.0 if None.
        customer: A dict with customer feature values. Required keys:
            tenure, monthly_charges, total_charges, contract_type,
            payment_method, num_support_tickets, has_tech_support.

    Returns:
        Churn probability as a float between 0.0 and 1.0.
    """
    if model_pipeline is None:
        return 0.0

    customer_copy = dict(customer)
    customer_copy["has_tech_support"] = float(customer_copy.get("has_tech_support", False))

    input_df = pd.DataFrame([customer_copy])
    feature_cols = NUMERIC_FEATURES + CATEGORICAL_FEATURES + BOOL_FEATURES

    for col in feature_cols:
        if col not in input_df.columns:
            input_df[col] = np.nan

    input_df = input_df[feature_cols]
    proba = model_pipeline.predict_proba(input_df)[0][1]
    return round(float(proba), 4)