
Contributing to ACE Playbook

Thank you for your interest in contributing to ACE Playbook! This document provides guidelines and instructions for contributing.

Table of Contents

  • Getting Started
  • Development Setup
  • Pre-commit Hooks
  • Code Standards
  • Commit Message Convention
  • Testing
  • Mutation Testing
  • Property-Based Testing
  • Complexity Monitoring
  • Pull Request Process
  • Questions?

Getting Started

  1. Fork the repository on GitHub

  2. Clone your fork locally:

    git clone https://github.com/YOUR_USERNAME/ace-playbook.git
    cd ace-playbook
  3. Add the upstream repository:

    git remote add upstream https://github.com/jmanhype/ace-playbook.git

Development Setup

Prerequisites

  • Python 3.11 or higher
  • uv package manager (recommended) or pip
  • Git

Installation

# Install dependencies with development tools
uv pip install -e ".[dev]"

# Or with pip
pip install -e ".[dev]"

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Initialize database
alembic upgrade head

# Install pre-commit hooks (REQUIRED)
pre-commit install
pre-commit install --hook-type commit-msg

Verify Setup

# Run tests to verify installation
pytest tests/unit/ -v

# Run pre-commit on all files
pre-commit run --all-files

Pre-commit Hooks

Pre-commit hooks ensure code quality and consistency. They run automatically before each commit.

Hook Categories

Code Quality

  • Black: Automatic code formatting (line length: 100)
  • Ruff: Fast Python linting (replaces flake8)
  • isort: Import statement sorting

Type Safety

  • mypy: Static type checking with strict mode

Security

  • Bandit: Security vulnerability scanning
  • detect-secrets: Prevents committing secrets (API keys, tokens)

Documentation

  • interrogate: Docstring coverage checking (≥80% required)
  • markdownlint: Markdown file linting

Standards

  • Conventional Commits: Enforces commit message format
  • trailing-whitespace: Removes trailing whitespace
  • end-of-file-fixer: Ensures files end with newline
  • check-yaml/toml/json: Configuration file validation

Infrastructure

  • hadolint: Dockerfile linting
  • sqlfluff: SQL migration linting
  • yamllint: YAML file linting

Running Hooks

# Automatic: Hooks run on every commit
git commit -m "feat: add new feature"

# Manual: Run on all files
pre-commit run --all-files

# Manual: Run specific hook
pre-commit run black --all-files
pre-commit run mypy --all-files

# Skip hooks (use sparingly, only for WIP commits)
git commit --no-verify -m "WIP: work in progress"

Handling Hook Failures

If a hook fails:

  1. Auto-fixable issues (Black, Ruff, isort): Files are auto-fixed. Review changes and re-commit.
  2. Manual fixes required (mypy, bandit): Fix the reported issues and re-commit.
  3. Commit message issues: Rewrite commit message following conventions.

Example workflow:

git add myfile.py
git commit -m "feat: add new feature"
# Black reformats the file
# Review changes
git add myfile.py
git commit -m "feat: add new feature"
# Success!

Code Standards

Python Style

  • Line length: 100 characters (enforced by Black)
  • Type hints: Required for all public functions and methods
  • Docstrings: Required for all public modules, classes, and functions (Google style)
  • Import order: Standard library → Third-party → Local (enforced by isort)
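
As a quick illustration of that ordering (the specific third-party and local imports are just representative):

# Standard library
import os
from pathlib import Path

# Third-party
import sqlalchemy

# Local
from ace.models.playbook import PlaybookBullet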

Type Hints Example

from typing import List, Optional
from ace.models.playbook import PlaybookBullet

def get_bullets(
    domain_id: str,
    section: Optional[str] = None,
    limit: int = 10
) -> List[PlaybookBullet]:
    """Retrieve playbook bullets for a domain.

    Args:
        domain_id: Domain identifier (namespace)
        section: Optional section filter (Helpful/Harmful/Neutral)
        limit: Maximum number of bullets to return

    Returns:
        List of PlaybookBullet objects matching criteria

    Raises:
        ValueError: If domain_id is invalid
    """
    # Implementation

Docstring Coverage

All public APIs require docstrings. Minimum coverage: 80%

# Check docstring coverage
interrogate -vv ace/

# Generate coverage report
interrogate -vv --generate-badge docs/ ace/

Security Guidelines

  • Never commit secrets: Use environment variables
  • Input validation: Validate all user inputs
  • SQL injection: Use parameterized queries (SQLAlchemy ORM; see the sketch below this list)
  • Path traversal: Validate file paths
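
A minimal sketch of the parameterized-query rule using the SQLAlchemy ORM (the session handling and the domain_id column are assumptions for the example):

from sqlalchemy import select
from sqlalchemy.orm import Session

from ace.models.playbook import PlaybookBullet

def bullets_for_domain(session: Session, domain_id: str) -> list[PlaybookBullet]:
    # domain_id is bound as a parameter by SQLAlchemy, never interpolated into SQL
    stmt = select(PlaybookBullet).where(PlaybookBullet.domain_id == domain_id)
    return list(session.execute(stmt).scalars())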

Commit Message Convention

We follow the Conventional Commits specification.

Format

<type>(<scope>): <subject>

<body>

<footer>

Types

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, no logic change)
  • refactor: Code refactoring (no feature or bug fix)
  • perf: Performance improvements
  • test: Adding or updating tests
  • chore: Maintenance tasks (dependencies, tooling)
  • ci: CI/CD changes

Examples

# Simple feature
git commit -m "feat: add semantic deduplication threshold config"

# Bug fix with scope
git commit -m "fix(curator): handle empty embedding vectors"

# Breaking change
git commit -m "feat!: replace FAISS with custom similarity search

BREAKING CHANGE: FaissIndexManager API changed, embeddings now normalized"

# Multiple paragraphs
git commit -m "refactor(reflector): extract insight classification logic

Split InsightClassifier into separate module for better testability.
Updated tests to use new interface.

Closes #42"

Validation

The conventional-pre-commit hook validates commit messages:

# ✅ Valid
git commit -m "feat: add metrics collector"

# ❌ Invalid - missing type
git commit -m "add metrics collector"

# ❌ Invalid - uppercase subject
git commit -m "feat: Add metrics collector"

# ❌ Invalid - period at end
git commit -m "feat: add metrics collector."

Testing

Test Structure

tests/
├── unit/           # Fast, isolated unit tests
├── integration/    # Multi-component integration tests
└── e2e/            # End-to-end smoke tests

Running Tests

# All tests
pytest tests/ -v

# Specific category
pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/e2e/ -v

# With coverage
pytest tests/ --cov=ace --cov-report=html

# Specific test file
pytest tests/unit/test_semantic_curator.py -v

# Specific test
pytest tests/unit/test_semantic_curator.py::TestSemanticCurator::test_deduplication -v

Writing Tests

import pytest
from ace.curator import SemanticCurator

class TestSemanticCurator:
    """Tests for SemanticCurator."""

    @pytest.fixture
    def curator(self) -> SemanticCurator:
        """Create curator instance."""
        return SemanticCurator()

    def test_deduplication(self, curator: SemanticCurator) -> None:
        """Test semantic deduplication at 0.8 threshold."""
        # Arrange
        insights = [...]

        # Act
        result = curator.batch_merge(...)

        # Assert
        assert len(result["updated_playbook"]) == 1

Test Requirements

  • Coverage: Maintain ≥80% code coverage
  • Isolation: Tests should not depend on external services (use mocks; see the sketch after this list)
  • Speed: Unit tests should run in <1s each
  • Clarity: Use descriptive test names and clear arrange-act-assert structure
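
For example, a hedged sketch of the isolation rule, stubbing out a (hypothetical) embedding client so no external service is hit:

from unittest.mock import MagicMock

from ace.curator import SemanticCurator

def test_dedup_runs_offline() -> None:
    """Exercise curator logic without any network calls."""
    curator = SemanticCurator()
    # Hypothetical attribute name: replace the real embedding client with a stub
    curator.embed_model = MagicMock()
    curator.embed_model.encode.return_value = [0.0] * 384
    # ... call curator methods and assert on results as usual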

Mutation Testing

Mutation testing verifies test suite quality by introducing small code changes (mutations) and checking if tests catch them. A high mutation kill rate (≥90%) indicates a robust test suite.

What is Mutation Testing?

Mutation testing works by:

  1. Creating "mutants" - small modifications to source code (e.g., changing > to >=, + to -)
  2. Running your test suite against each mutant
  3. Checking if tests fail (killing the mutant) or pass (mutant survives)

A surviving mutant indicates:

  • Missing test coverage
  • Weak assertions
  • Logic that doesn't affect behavior
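
A concrete (illustrative) example of the kill/survive distinction:

# Original code
def is_adult(age: int) -> bool:
    return age >= 18

# Mutant: mutmut changes '>=' to '>'
# Tests that only use ages 17 and 21 pass against the mutant (it survives);
# a test at the boundary kills it:
def test_is_adult_boundary() -> None:
    assert is_adult(18) is True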

Running Mutation Tests

# Using Make (recommended)
make mutation-test          # Test curator module
make mutation-results       # Show results summary
make mutation-show ID=1     # Show specific mutation

# Using script directly
./scripts/run_mutation_tests.sh

# Manual mutmut usage
mutmut run                  # Run on all code
mutmut results              # Show summary
mutmut show 5               # View mutation #5
mutmut html                 # Generate HTML report

Interpreting Results

Mutation Score Guide:

  • 100% killed: Excellent - all mutations detected
  • 90-99% killed: Good - minor gaps acceptable
  • 80-89% killed: Acceptable - needs improvement
  • <80% killed: Poor - significant test gaps

Example output:

Survived: 5 mutants
Killed: 45 mutants
Mutation score: 90%

Fixing Surviving Mutants

If a mutant survives:

  1. View the mutation:

    mutmut show 5
  2. Analyze why it survived:

    • Missing test case?
    • Weak assertion (e.g., checking type but not value)?
    • Dead code that should be removed?
  3. Add/improve tests:

    # Before: Weak assertion
    def test_calculation():
        result = calculate(5, 3)
        assert isinstance(result, int)  # Mutant survives
    
    # After: Strong assertion
    def test_calculation():
        result = calculate(5, 3)
        assert result == 8  # Mutant killed
  4. Re-run mutation test to verify the fix

Common Mutation Types

mutmut introduces various mutations:

  • Arithmetic: + → -, * → /, / → %
  • Comparison: > → >=, == → !=, < → <=
  • Boolean: and → or, True → False
  • Numbers: 0 → 1, 1 → 0, n → n+1
  • Strings: "text" → "XXtextXX"

Best Practices

  1. Run mutation tests periodically (not on every commit - they're slow)
  2. Focus on critical modules first (curator, reflector, generator)
  3. Aim for 90%+ mutation score on core business logic
  4. Use mutation testing to find test gaps, not just coverage holes
  5. Document surviving mutants if they're intentional (e.g., logging code)
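
For point 5, mutmut skips lines annotated with a pragma comment, which doubles as documentation of an intentional survivor (the log call is illustrative):

logger.debug("cache miss for %s", key)  # pragma: no mutate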

CI Integration

Mutation testing is resource-intensive. Consider:

# Run on specific modules only
mutmut run ace/curator/semantic_curator.py

# Run in CI on PRs to core modules
if [ "$CHANGED_MODULE" = "ace/curator" ]; then
  make mutation-test
fi

Configuration

Mutation testing is configured in pyproject.toml:

[tool.mutmut]
# Configuration handled by .mutmut-config
# See scripts/run_mutation_tests.sh for usage


Property-Based Testing

Property-based testing automatically generates test inputs to verify invariants that should hold for ALL inputs, not just hand-picked examples. It uses the hypothesis library to find edge cases that break your code.

What is Property-Based Testing?

Instead of writing specific test cases:

# Traditional example-based test
def test_addition():
    assert add(2, 3) == 5
    assert add(0, 0) == 0
    assert add(-1, 1) == 0

Write properties that should always be true:

# Property-based test
@given(x=st.integers(), y=st.integers())
def test_addition_commutative(x, y):
    assert add(x, y) == add(y, x)  # Checked across many generated inputs

Why Use Property-Based Testing?

  1. Finds edge cases humans miss (negative numbers, zero, MAX_INT)
  2. Acts as executable specification (documents what code MUST do)
  3. Shrinks failures to minimal reproducing example
  4. Tests more code paths with less test code

Running Property Tests

# Run property tests
pytest tests/unit/test_curator_properties.py -v

# Run with more examples (slower but more thorough), using a settings
# profile registered in tests/conftest.py (see Best Practices below)
pytest tests/unit/test_curator_properties.py -v --hypothesis-profile=ci

# Print statistics about generated examples
pytest tests/unit/test_curator_properties.py -v --hypothesis-show-statistics

Writing Property Tests

Located in tests/unit/test_curator_properties.py with examples:

Property: Idempotence

@given(domain_id=domain_ids, section=insight_sections)
def test_deduplication_idempotent(curator, domain_id, section):
    """
    Applying the same insight twice should only add one bullet.
    curator(curator(playbook, insight)) = curator(playbook, insight)
    """
    playbook = []
    insight = {"content": "Test", "section": section}

    output1 = curator.apply_delta(playbook, [insight])
    output2 = curator.apply_delta(output1.playbook, [insight])

    assert len(output2.playbook) == 1  # Not 2!

Property: Monotonicity

@given(domain_id=domain_ids)
def test_counters_monotonic(curator, domain_id):
    """
    Counters never decrease during operations.
    For all operations: counter_after ≥ counter_before
    """
    bullet = create_bullet(helpful_count=5, harmful_count=3)

    output = curator.apply_delta([bullet], new_insights)

    for updated in output.playbook:
        assert updated.helpful_count >= 5
        assert updated.harmful_count >= 3

Property: Symmetry

@given(vec1=embeddings_384, vec2=embeddings_384)
def test_similarity_symmetric(vec1, vec2):
    """
    Cosine similarity is symmetric.
    sim(A, B) = sim(B, A)
    """
    assert compute_similarity(vec1, vec2) == compute_similarity(vec2, vec1)

Hypothesis Strategies

Hypothesis provides generators for test data:

from hypothesis import strategies as st

# Built-in strategies
integers = st.integers(min_value=0, max_value=100)
floats = st.floats(min_value=0.0, max_value=1.0, allow_nan=False)
text = st.text(min_size=1, max_size=100, alphabet=st.characters())
lists = st.lists(st.integers(), min_size=0, max_size=10)

# Custom strategies (domain IDs)
domain_ids = st.from_regex(r"^[a-z0-9-]{3,20}$").filter(
    lambda x: x not in {"system", "admin", "test"}
)

# Composite strategies (complex objects)
@st.composite
def playbook_bullet(draw):
    domain_id = draw(domain_ids)
    content = draw(st.text(min_size=10, max_size=200))
    embedding = draw(st.lists(st.floats(), min_size=384, max_size=384))

    return PlaybookBullet(
        domain_id=domain_id,
        content=content,
        embedding=embedding,
        ...
    )

Common Properties to Test

Invariants (always true):

  • Idempotence: f(f(x)) = f(x)
  • Commutativity: f(x, y) = f(y, x)
  • Associativity: f(f(x, y), z) = f(x, f(y, z))
  • Identity: f(x, identity) = x

Relations (between operations):

  • Inverse: g(f(x)) = x (see the example after this list)
  • Monotonicity: x < y → f(x) < f(y)
  • Symmetry: f(x, y) = f(y, x)

Bounds:

  • Range limits: 0 ≤ f(x) ≤ 1
  • Size constraints: len(result) ≤ len(input)
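
A self-contained example of the inverse relation, using JSON serialization as the function pair:

import json

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_json_roundtrip(xs: list[int]) -> None:
    # Inverse property: g(f(x)) = x
    assert json.loads(json.dumps(xs)) == xs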

Debugging Property Test Failures

When hypothesis finds a failure:

  1. Read the minimal example:

    Falsifying example: test_counters_monotonic(
        curator=<SemanticCurator>,
        domain_id='a-b-1',
        section1='Helpful',
        section2='Harmful'
    )
    
  2. Reproduce with exact values:

    def test_regression_counters_monotonic():
        curator = SemanticCurator()
        domain_id = 'a-b-1'
        # ... use exact failing values
  3. Fix the bug or adjust the property if it's too strict

  4. Add as regression test to prevent future failures

Best Practices

  1. Start with simple properties (symmetry, bounds, commutativity)

  2. Use assume() to filter invalid inputs:

    @given(x=st.integers(), y=st.integers())
    def test_division(x, y):
        assume(y != 0)  # Skip zero divisor
        assert (x // y) * y + (x % y) == x  # Floor-division invariant
  3. Combine with example-based tests for clarity

  4. Run property tests in CI (they're slower than unit tests)

  5. Document discovered properties as they emerge

  6. Keep max_examples low in development (20-50), high in CI (100-1000)
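
One way to implement point 6 is to register settings profiles (for example in tests/conftest.py; the profile names here are illustrative) and pick one at run time:

import os

from hypothesis import settings

settings.register_profile("dev", max_examples=25)
settings.register_profile("ci", max_examples=500)
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))

Then pytest --hypothesis-profile=ci selects the thorough profile in CI.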

Configuration

Property testing is configured in pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "property: Property-based tests with hypothesis",
]

Run with custom settings:

# More examples (slower, more thorough) via a registered profile
pytest --hypothesis-profile=ci

# Print statistics
pytest --hypothesis-show-statistics

# Deterministic (for CI)
pytest --hypothesis-seed=12345


Complexity Monitoring

Code complexity monitoring uses radon to measure cyclomatic complexity and maintainability index. High complexity indicates code that's hard to understand, test, and maintain.

What is Cyclomatic Complexity?

Cyclomatic complexity measures the number of independent paths through code:

  • 1-5 (A): Simple, easy to test
  • 6-10 (B): Moderate, acceptable
  • 11-20 (C): Complex, consider refactoring
  • 21-30 (D): Very complex, should refactor
  • 31-40 (E): Extremely complex, hard to maintain
  • 41+ (F): Unmaintainable, must refactor
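
For example, each branch point adds one independent path, so this function has cyclomatic complexity 3 (A-grade):

def positive_sum(values: list[int]) -> int:
    total = 0              # base path counts as 1
    for v in values:       # +1
        if v > 0:          # +1
            total += v
    return total           # 1 + 2 = complexity 3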

Maintainability Index

Maintainability index combines complexity, lines of code, and Halstead volume:

  • 100-20 (A): Very maintainable
  • 19-10 (B): Maintainable
  • 9-0 (C): Needs attention
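
For reference, radon derives the index from roughly the classic formula below (radon's variant also adds a comment-weight term), where V is Halstead volume, G is total cyclomatic complexity, and L is source lines of code:

MI = max(0, (171 - 5.2·ln(V) - 0.23·G - 16.2·ln(L)) × 100 / 171)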

Running Complexity Checks

# Check complexity (informational)
make complexity

# Check with strict thresholds (fail on C-grade or worse)
make complexity-strict

# Generate JSON reports
make complexity-report

# Manual radon commands
radon cc ace/ -a                    # Cyclomatic complexity with average
radon cc ace/ -n C -s               # Fail on C-grade or worse
radon mi ace/ -nb                   # Maintainability index, show rank B or worse
radon mi ace/ -s                    # Show maintainability index

Pre-commit Integration

Complexity checks run automatically on git push:

# .pre-commit-config.yaml
- id: radon-cc
  name: Check code complexity
  entry: radon cc ace/ -n C -s      # Fail on C-grade or worse
  stages: [push]

- id: radon-mi
  name: Check maintainability index
  entry: radon mi ace/ -n B -s      # Fail on B-grade or worse
  stages: [push]

To skip complexity checks (not recommended):

git push --no-verify

Interpreting Results

Example output:

ace/curator/semantic_curator.py
    C 435:0 SemanticCurator - B (8)
    M 81:4 SemanticCurator.apply_delta - B (6)
    M 237:4 SemanticCurator.batch_merge - C (12)

Average complexity: B (8.4)

Reading the output:

  • C / M / F: Class, method, or function
  • 435:0: Line and column number
  • B (6): Rank (B) and cyclomatic complexity score (6)

Refactoring High Complexity Code

When you encounter C-grade or worse complexity:

1. Extract Methods:

# Before: Complex method (C-grade)
def process_data(data):
    if condition1:
        if condition2:
            if condition3:
                # ... nested logic
                pass

# After: Extracted methods (A/B-grade)
def process_data(data):
    if should_process(data):
        return perform_processing(data)
    return default_result()

def should_process(data):
    return condition1 and condition2 and condition3

def perform_processing(data):
    # ... logic here
    pass

2. Use Guard Clauses:

# Before: Nested conditions (high complexity)
def validate(data):
    if data:
        if data.is_valid():
            if data.has_required_fields():
                return True
    return False

# After: Guard clauses (lower complexity)
def validate(data):
    if not data:
        return False
    if not data.is_valid():
        return False
    if not data.has_required_fields():
        return False
    return True

3. Replace Conditionals with Polymorphism:

# Before: Multiple isinstance checks (high complexity)
def process(obj):
    if isinstance(obj, TypeA):
        # ... complex logic
    elif isinstance(obj, TypeB):
        # ... complex logic
    elif isinstance(obj, TypeC):
        # ... complex logic

# After: Polymorphism (low complexity)
class BaseType:
    def process(self):
        raise NotImplementedError

class TypeA(BaseType):
    def process(self):
        # ... logic

def process(obj: BaseType):
    return obj.process()

4. Use Data Structures:

# Before: Long if-elif chain (high complexity)
def get_status(code):
    if code == 200:
        return "OK"
    elif code == 404:
        return "Not Found"
    elif code == 500:
        return "Error"
    # ... many more

# After: Dictionary lookup (low complexity)
STATUS_MAP = {
    200: "OK",
    404: "Not Found",
    500: "Error",
    # ...
}

def get_status(code):
    return STATUS_MAP.get(code, "Unknown")

Complexity Targets

For ACE Playbook codebase:

  • Target: All modules ≤ B-grade average
  • Maximum: No function worse than C-grade (cyclomatic complexity ≤ 20)
  • Pre-commit: Blocks push if C-grade or worse detected

Best Practices

  1. Monitor complexity regularly during development
  2. Refactor proactively before complexity grows
  3. Break down large functions into smaller, focused functions
  4. Use helper functions to reduce nesting
  5. Document complex logic if refactoring isn't possible
  6. Review complexity in PRs before merging

CI Integration

Run complexity checks in CI pipeline:

# In CI script
make complexity-strict || exit 1

Add to PR checks:

  • Fail if average complexity > B-grade
  • Warn if any function > 10 complexity
  • Block if any function > 20 complexity


Pull Request Process

Before Submitting

  1. Update your branch:

    git fetch upstream
    git rebase upstream/main
  2. Run all checks:

    # Pre-commit hooks
    pre-commit run --all-files
    
    # Tests with coverage
    pytest tests/ --cov=ace --cov-report=term-missing
    
    # Type checking
    mypy ace/
  3. Update documentation:

    • Update README.md if adding features
    • Add docstrings to new functions/classes
    • Update CHANGELOG.md following Keep a Changelog format
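
A minimal Keep a Changelog entry might look like:

## [Unreleased]

### Added
- Semantic deduplication threshold config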

Submitting PR

  1. Push to your fork:

    git push origin feature/your-feature-name
  2. Create Pull Request on GitHub with:

    • Clear title following conventional commits format
    • Description of changes
    • Link to related issues (e.g., "Closes #42")
    • Screenshots/examples if applicable
  3. PR Checklist:

    • All pre-commit hooks pass
    • All tests pass
    • Code coverage ≥80%
    • Documentation updated
    • CHANGELOG.md updated
    • Conventional commit format
    • No merge conflicts

PR Review Process

  1. Maintainers will review your PR within 1-2 business days
  2. Address review feedback with new commits
  3. Once approved, maintainers will merge using squash-and-merge

After Merge

  1. Delete your branch:

    git checkout main
    git branch -D feature/your-feature-name
    git push origin --delete feature/your-feature-name
  2. Update your fork:

    git fetch upstream
    git merge upstream/main
    git push origin main

Questions?

  • Issues: Open an issue on GitHub for bugs or feature requests
  • Discussions: Use GitHub Discussions for questions
  • Security: Email security@example.com for security vulnerabilities

Thank you for contributing to ACE Playbook! 🚀