176 changes: 111 additions & 65 deletions examples/README.md
@@ -1,77 +1,123 @@
# Examples
# Customer Churn Prediction — PDD Example

This directory contains examples comparing Cursor with Prompt-Driven Development (PDD) across a range of programming tasks. They illustrate how PDD generates and modifies code via the `pdd sync` command and how the workflow compares to traditional development approaches.
This example demonstrates a complete **Prompt-Driven Development** workflow for a real-world machine learning use case: **predicting customer churn** using logistic regression.

## Getting Started
It is a companion to the core `hello` and `factorial_calculator` examples, showing PDD applied to a **data science / ML context** — a domain not previously covered in the official examples.
Comment on lines +1 to +5 (Copilot AI, Feb 24, 2026):
examples/README.md has been replaced with churn-specific documentation, which removes the overview/index for all other example projects under examples/. Please restore the examples index README and move the churn docs into a dedicated examples/customer_churn/README.md (then link to it from the main examples README).

### Post-Installation Setup (Required first step after installation)

Before running any examples, make sure you've completed the PDD setup.

---

## What This Example Covers

| PDD Concept | Implementation |
|---|---|
| Prompt as source of truth | `prompts/customer_churn_python.prompt` |
| Code generated from prompt | `customer_churn.py` |
| Usage example | `example_customer_churn.py` |
| Unit test suite | `test_customer_churn.py` |

Comment on lines +11 to +15 (Copilot AI, Feb 24, 2026):
This README points to prompts/customer_churn_python.prompt, but the prompt file added by this PR is examples/customer_churn_python.prompt (no examples/prompts/ directory). Please fix the documented path (or move the prompt file) so the README reflects the actual layout.

---

## Files

```
examples/customer_churn/
├── prompts/
│   └── customer_churn_python.prompt   # PDD prompt (source of truth)
├── customer_churn.py                  # Generated module
├── example_customer_churn.py          # Runnable demo
├── test_customer_churn.py             # Unit tests (pytest)
└── README.md                          # This file
```

Comment on lines +23 to +27 (Copilot AI, Feb 24, 2026):
The documented file tree assumes an examples/customer_churn/ folder, but this PR currently adds the churn files directly under examples/. Please either move the files into the documented directory structure or update the tree and commands accordingly.

---

## Prerequisites

```bash
pip install pandas numpy "scikit-learn>=1.2" pytest
```

---

## Run the Example

```bash
cd examples/customer_churn
python example_customer_churn.py
```

**Expected output:**
```
=======================================================
PDD Example: Customer Churn Prediction
=======================================================

📦 Generating synthetic customer dataset (100 rows)...
Dataset shape : (100, 8)
Churn rate : 27.0%

🔧 Training logistic regression pipeline...

📊 Model Evaluation (held-out 20% test set):
Accuracy : 75.00%
Precision : 60.00%
Recall : 50.00%
F1 Score : 54.55%

🔍 Top 5 Feature Importances (by abs coefficient):
num_support_tickets +0.5231 ↑ increases churn
tenure -0.4812 ↓ decreases churn
contract_type_Month-to-month +0.3901 ↑ increases churn
...

🎯 Individual Customer Predictions:

High-risk customer (month-to-month, 2 months tenure):
→ Churn probability: 68.42% 🔴

Low-risk customer (2-year contract, 58 months tenure):
→ Churn probability: 18.75% 🟢
```
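As a quick sanity check on the reported metrics, the F1 score is the harmonic mean of precision and recall, which is consistent with the numbers shown above:

```python
# F1 as the harmonic mean of precision (60%) and recall (50%),
# matching the 54.55% in the expected output.
precision, recall = 0.60, 0.50
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # 54.55%
```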

---

## Run the Tests

```bash
cd examples/customer_churn
pytest test_customer_churn.py -v
```

---

## Run with PDD

To regenerate the code from the prompt using PDD, first complete the one-time setup:

```bash
pdd setup
```

This command will guide you through:
- Installing shell tab completion
- Capturing your API keys
- Creating ~/.pdd configuration files
- Writing the starter prompt

After setup, reload your shell:

```bash
source ~/.zshrc   # or source ~/.bashrc / fish equivalent
```

Then regenerate the code:

```bash
# From the repo root
pdd --force sync customer_churn
```

To implement improvements from a GitHub issue:

```bash
pdd change https://github.com/promptdriven/pdd/issues/<issue-number>
```

## Available Examples

### Agentic Fallback
The agentic fallback example demonstrates using agentic fallback to resolve cross-file dependencies during automated debugging.
The example has two files — `src/main.py` and `src/utils.py` — where `main.py` fails without reading `utils.py`.
With agentic fallback enabled, the CLI agent (Claude/Gemini/Codex) can read `utils.py`, understand the dependency, and fix `main.py`.
Users may intentionally introduce errors in `src/utils.py` to test the agentic fix functionality.

Additional examples demonstrating the use of agentic fallback are provided for Java, TypeScript, and JavaScript.

### Edit File Tool
The edit_file_tool_example walks through generating a complete Python tool using PDD's streamlined `pdd --force sync` workflow. This example shows:
- How to drive end-to-end project generation (code, tests, docs) from component prompts (complete dev units)
- Using the provided Makefile targets to orchestrate setup, prompt creation, and sync runs
- Integrating automation features like command logging and optional cost tracking during sync

### Handpaint
The handpaint example demonstrates how PDD can be used to create and modify a painting application. This example shows:
- How PDD can be used to generate code for a graphical application
- The process of iteratively refining code through PDD
- A comparison between traditional development and PDD-assisted development

### Hello World
The hello_world example demonstrates how PDD can be used to generate code for a simple Python function that prints "hello". This example shows:
- How PDD can be used to generate code for a simple Python function via the sync command

### Hello You
The hello_you example expands on the Hello World flow by rendering a personalized greeting in large ASCII art. This example shows:
- Capturing the current shell username (via `whoami`) and feeding it into the generated program
- Building a reusable ASCII art alphabet map inside the generated Python file to spell arbitrary strings
- Producing a self-contained script that prints a 10-row tall "Hello <username>" banner with no external dependencies
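The ASCII-alphabet idea above can be sketched in a few lines. This is an illustrative toy with 3-row glyphs and a hypothetical `FONT` map, not the example's actual code; the generated example uses a full 10-row alphabet:

```python
# Toy 3-row ASCII font; the real example builds a 10-row map for all letters.
FONT = {
    "H": ["# #", "###", "# #"],
    "I": ["###", " # ", "###"],
}

def banner(text: str) -> str:
    rows = ["", "", ""]
    for ch in text.upper():
        glyph = FONT.get(ch, ["   "] * 3)  # blank glyph for unknown chars
        for i in range(3):
            rows[i] += glyph[i] + " "
    return "\n".join(rows)

print(banner("HI"))
```

The generated script follows the same pattern, substituting the current username (captured via `whoami`) for the fixed string.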

### Pi Calc
The pi_calc example demonstrates how PDD can be used to generate code for a simple Python function that calculates the value of Pi. This example shows:
- How PDD can be used to generate code for a simple Python function using the sync command

### QR Code Sandwich
The qrcode_sandwich example demonstrates how PDD can be used to generate code that produces scannable QR codes embedded within photorealistic images using ControlNet QR conditioning. This example shows:
- Creating a QR code that blends into a realistic image while remaining scannable
- Leveraging ControlNet QR conditioning in a generated Python script
- Iterating with PDD to refine parameters and results

More examples will be added to this directory as they are developed.

## Purpose
These examples are designed to help developers understand:
1. The capabilities of PDD in different programming contexts
2. How PDD compares to traditional development workflows
3. Best practices for using PDD effectively
4. Real-world applications of PDD in various domains

Each example includes documentation and code that can be used as a reference for your own PDD-based development projects.
---

## About This Example

This example was contributed as part of a pull request to expand PDD's example library into the **machine learning / data science** domain. The prompt covers:

- **sklearn Pipeline** with `ColumnTransformer` for mixed-type preprocessing
- **Logistic Regression** binary classification
- **Evaluation metrics**: accuracy, precision, recall, F1
- **Edge case handling**: missing values, empty inputs, None model

It demonstrates that PDD's prompt-first approach scales naturally to ML workflows, where prompt clarity directly impacts the quality and reproducibility of generated model code.
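The mixed-type preprocessing pattern named above can be distilled to a few lines. This is a minimal sketch with toy data, not the example's full pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Toy frame: one numeric and one categorical column (illustrative values).
df = pd.DataFrame({
    "tenure": [1, 12, 24, 48],
    "contract_type": ["Month-to-month", "One year", "Two year", "One year"],
})

# Scale numerics and one-hot encode categoricals in a single transformer.
pre = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["tenure"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
])

X = pre.fit_transform(df)
print(X.shape)  # (4, 4): 1 scaled column + 3 one-hot contract columns
```

The example's module wraps the same construct in a `Pipeline` with a `LogisticRegression` classifier and imputation steps.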

---

*Contributed by [Darius Rowser](https://github.com/Drowser2430)*
165 changes: 165 additions & 0 deletions examples/customer_churn.py
@@ -0,0 +1,165 @@
"""
Customer Churn Prediction Module
Generated via PDD (Prompt-Driven Development) workflow.
Prompt: prompts/customer_churn_python.prompt
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring says Prompt: prompts/customer_churn_python.prompt, but the prompt file added in this PR is examples/customer_churn_python.prompt (and there is no examples/prompts/ folder). Update the reference so the source-of-truth prompt path is correct after the final directory layout is decided.

Suggested change
Prompt: prompts/customer_churn_python.prompt
Prompt: examples/customer_churn_python.prompt

Copilot uses AI. Check for mistakes.
"""

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from typing import Optional


REQUIRED_COLUMNS = [
    "tenure", "monthly_charges", "total_charges",
    "contract_type", "payment_method",
    "num_support_tickets", "has_tech_support", "churn"
]

NUMERIC_FEATURES = ["tenure", "monthly_charges", "total_charges", "num_support_tickets"]
CATEGORICAL_FEATURES = ["contract_type", "payment_method"]
BOOL_FEATURES = ["has_tech_support"]


def _validate_dataframe(df: pd.DataFrame, require_churn: bool = True) -> None:
    """Validate that the DataFrame has the required columns and sufficient rows."""
    required = REQUIRED_COLUMNS if require_churn else [c for c in REQUIRED_COLUMNS if c != "churn"]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")
    if len(df) < 10:
        raise ValueError(f"DataFrame must have at least 10 rows, got {len(df)}")


def _build_pipeline() -> Pipeline:
    """Build the sklearn preprocessing and model pipeline."""
    numeric_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ])

    categorical_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))

Comment (Copilot AI, Feb 24, 2026):
OneHotEncoder(..., sparse_output=False) requires scikit-learn >= 1.2; the README currently installs scikit-learn without a minimum version. Either document the minimum required scikit-learn version for this example or use an encoder argument compatible with older versions to avoid runtime failures for users.

Suggested change:
- ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
+ ("onehot", OneHotEncoder(handle_unknown="ignore", sparse=False))

    ])

    bool_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("scaler", StandardScaler())
    ])

    preprocessor = ColumnTransformer(transformers=[
        ("num", numeric_transformer, NUMERIC_FEATURES),
        ("cat", categorical_transformer, CATEGORICAL_FEATURES),
        ("bool", bool_transformer, BOOL_FEATURES)
    ])

    pipeline = Pipeline(steps=[
        ("preprocessor", preprocessor),
        ("classifier", LogisticRegression(max_iter=1000, random_state=42))
    ])

    return pipeline


def train(df: pd.DataFrame) -> dict:
    """
    Train a customer churn prediction model on the provided DataFrame.

    Args:
        df: A pandas DataFrame containing customer features and 'churn' label.
            Required columns: tenure, monthly_charges, total_charges,
            contract_type, payment_method, num_support_tickets,
            has_tech_support, churn.

    Returns:
        A dict with keys:
            - "model": fitted sklearn Pipeline
            - "accuracy": float
            - "precision": float
            - "recall": float
            - "f1": float
            - "feature_importances": dict mapping feature names to coefficients

    Raises:
        ValueError: If required columns are missing or DataFrame has fewer than 10 rows.
    """
    _validate_dataframe(df, require_churn=True)

    df = df.copy()
    df["has_tech_support"] = df["has_tech_support"].astype(float)

    X = df[NUMERIC_FEATURES + CATEGORICAL_FEATURES + BOOL_FEATURES]
    y = df["churn"].astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    pipeline = _build_pipeline()
    pipeline.fit(X_train, y_train)

    y_pred = pipeline.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    f1 = f1_score(y_test, y_pred, zero_division=0)

    # Extract feature importances (logistic regression coefficients)
    ohe_features = list(
        pipeline.named_steps["preprocessor"]
        .named_transformers_["cat"]
        .named_steps["onehot"]
        .get_feature_names_out(CATEGORICAL_FEATURES)
    )
    all_feature_names = NUMERIC_FEATURES + ohe_features + BOOL_FEATURES
    coefficients = pipeline.named_steps["classifier"].coef_[0]
    feature_importances = dict(zip(all_feature_names, coefficients.tolist()))

    return {
        "model": pipeline,
        "accuracy": round(accuracy, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
        "feature_importances": feature_importances
    }


def predict(model_pipeline: Optional[Pipeline], customer: dict) -> float:
    """
    Predict churn probability for a single customer.

    Args:
        model_pipeline: A fitted sklearn Pipeline returned by train().
            Returns 0.0 if None.
        customer: A dict with customer feature values. Required keys:
            tenure, monthly_charges, total_charges, contract_type,
            payment_method, num_support_tickets, has_tech_support.

    Returns:
        Churn probability as a float between 0.0 and 1.0.
    """
    if model_pipeline is None:
        return 0.0

    customer_copy = dict(customer)
    customer_copy["has_tech_support"] = float(customer_copy.get("has_tech_support", False))

    input_df = pd.DataFrame([customer_copy])
    feature_cols = NUMERIC_FEATURES + CATEGORICAL_FEATURES + BOOL_FEATURES

    for col in feature_cols:
        if col not in input_df.columns:
            input_df[col] = np.nan

    input_df = input_df[feature_cols]
    proba = model_pipeline.predict_proba(input_df)[0][1]
    return round(float(proba), 4)