fetchkit/AGENTS.md at main · everruns/fetchkit

Coding-agent guidance (repo root)

This repo is intended to be runnable locally and easy for coding agents to work in.

Always make sure you are working on top of latest main from remote. Especially in detached worktrees, fetch origin/main and rebase or branch from it before editing.

Style

Telegraph. Drop filler/grammar. Min tokens (global AGENTS + replies).

Critical Thinking

Fix root cause (not band-aid).
Unsure: read more code; if still stuck, ask w/ short options.
Unrecognized changes: assume other agent; keep going; focus your changes. If it causes issues, stop + ask user.
Leave breadcrumb notes in thread.

Attribution

NEVER add links to Claude sessions in PR body or commits. Never attribute a commit or merge commit to a coding agent; always use the real user.

Before committing, configure git user from environment variables:

git config user.name "$GIT_USER_NAME"
git config user.email "$GIT_USER_EMAIL"

GIT_USER_NAME and GIT_USER_EMAIL must be set in the session.
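
A minimal guard for this rule might look like the sketch below. The function name `configure_git_identity` is hypothetical and not part of this repo; it refuses to touch git config when either variable is unset or empty.

```shell
# Hypothetical helper (not part of this repo): fail fast when the
# identity variables are unset or empty, then apply them.
configure_git_identity() {
  if [ -z "${GIT_USER_NAME:-}" ] || [ -z "${GIT_USER_EMAIL:-}" ]; then
    echo "GIT_USER_NAME and GIT_USER_EMAIL must be set" >&2
    return 1
  fi
  git config user.name "$GIT_USER_NAME" &&
    git config user.email "$GIT_USER_EMAIL"
}
```

Calling `configure_git_identity` before the first commit in a session surfaces a missing variable immediately, rather than writing an empty name or email into the repository config.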

Principles

Keep decisions as comments at the top of the file, and only important decisions that cannot be inferred from the code.
Code should be easily testable, smoke testable, runnable in local dev env.
Prefer small, incremental PR-sized changes with a runnable state at each step.
Avoid adding dependencies with non-permissive licenses. If a dependency is non-permissive or unclear, stop and ask the repo owner.

Top level requirements

AI-friendly web content fetching tool designed for LLM consumption. Rust library with CLI, MCP server, and Python bindings.

Key capabilities:

HTTP fetching (GET/HEAD) with streaming support
HTML-to-Markdown and HTML-to-Text conversion optimized for LLMs
Binary content detection (returns metadata only)
Timeout handling with partial content on timeout
URL filtering via allow/block lists
MCP server for AI tool integration

Specs

specs/ folder contains feature specifications outlining requirements for specific features and components. New code should comply with these specifications or propose changes to them.

Available specs:

specs/initial.md - WebFetch tool specification (types, behavior, conversions, error handling)
specs/fetchers.md - Pluggable fetcher system for URL-specific handling
specs/release-process.md - Agent-driven release and publish workflow
specs/maintenance.md - Periodic maintenance checklist (deps, docs, spec-code alignment)
specs/threat-model.md - Security threat model (SSRF, network, input validation, DoS)
specs/bot-auth.md - Web Bot Authentication (draft-meunier-web-bot-auth-architecture)

Specification format: each spec has Abstract and Requirements sections.

Shipping

Implement → test → /ship. The /ship command (.claude/commands/ship.md) runs a 10-phase workflow: pre-flight, test coverage, code simplification, security review, artifact updates, smoke testing, quality gates, push+PR, CI wait+merge, post-merge report.

Phases 2–6 (tests, simplification, security, artifacts, smoke) are the quality core — never skip.

When asked to "fix and ship": implement fix first, then run /ship.

Skills

.claude/skills/ contains development skills following the Agent Skills Specification.

Available skills:

/ship — 10-phase shipping workflow (.claude/commands/ship.md)
/processing-issues — Batch-process GitHub issues: triage, implement, ship via individual PRs (.claude/commands/processing-issues.md)
/process-issues — Resolve all open GitHub issues e2e; one issue = one shipped PR (.claude/skills/process-issues/SKILL.md)

Agent-portable paths

.agents/ mirrors .claude/ via symlinks for agent-agnostic access:

.agents/commands/ → .claude/commands/
.agents/skills/ → .claude/skills/
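
If the mirror ever needs to be recreated, something along these lines could work. This is a sketch under assumptions: the function name `mirror_agent_paths` is hypothetical, and the use of relative link targets is a guess at the layout, not a description of how the repo actually creates the links.

```shell
# Sketch: recreate the .agents/ mirror with relative symlinks.
# Assumed layout; run from the repo root.
mirror_agent_paths() {
  mkdir -p .agents &&
    ln -sfn ../.claude/commands .agents/commands &&
    ln -sfn ../.claude/skills .agents/skills
}
```

Relative targets keep the links valid when the repo is cloned to a different path or checked out in a separate worktree.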

Public Documentation

docs/ contains public-facing user documentation. This documentation is intended for end users and operators of the system, not for internal development reference.

When making changes that affect user-facing behavior or operations, update the relevant docs in this folder.

Local dev expectations

Requirements:

Rust stable toolchain (rustup recommended)
cargo for building and testing

Quick start:

cargo build --workspace --exclude fetchkit-python  # Build default Rust artifacts
cargo test --workspace           # Run all tests
cargo run -p fetchkit-cli -- --help  # Run CLI

Note: fetchkit-python currently requires a separate Python link environment and is not part of the default release-build smoke path.

Code organization

crates/
├── fetchkit/           # Core library - types, fetch logic, HTML conversion
├── fetchkit-cli/       # CLI binary and MCP server
└── fetchkit-python/    # Python bindings (PyO3)
specs/                  # Feature specifications

Naming

Crate names: fetchkit, fetchkit-cli, fetchkit-python
Types: PascalCase (WebFetchRequest, WebFetchResponse)
Functions: snake_case (fetch, html_to_markdown)
Constants: SCREAMING_SNAKE_CASE

CI expectations

CI is implemented using GitHub Actions.
Jobs: lint, test, build, doc, examples, check
check is the branch-protection gate and must stay green
All jobs must pass before merging
Clippy runs with -D warnings (warnings are errors)
Doc builds must not have warnings

Releasing

See specs/release-process.md for the release contract.

Quick summary:

Human asks agent: "Create release v0.2.0"
Agent updates CHANGELOG.md (with Highlights + What's Changed), Cargo.toml version, creates PR
Human reviews and merges PR to main
CI creates GitHub Release via softprops/action-gh-release (release.yml)
GitHub Release publication triggers publish.yml
CI publishes fetchkit then fetchkit-cli to crates.io

Workflows:

.github/workflows/release.yml - Creates GitHub Release on merge or manual dispatch
.github/workflows/publish.yml - Publishes to crates.io on GitHub Release or manual dispatch

Requirements:

CARGO_REGISTRY_TOKEN secret must be configured in repo settings

Note: fetchkit-python is not published to crates.io (publish = false). Python release automation is not configured in this repo yet.

Cloud Agent (start here)

Use Doppler for all secret-backed commands in cloud agents.

./scripts/init-cloud-env.sh

Disable incremental compilation in cloud (saves ~3 GB, useless for single builds):

export CARGO_INCREMENTAL=0

All cloud secrets are in Doppler (GITHUB_TOKEN). Project: everruns-dev, config: dev.

For GitHub CLI, map token explicitly:

doppler run -- bash -lc 'GH_TOKEN="$GITHUB_TOKEN" gh auth status'

Quick check:

doppler run -- env | rg 'GITHUB_TOKEN'
doppler run -- bash -lc 'GH_TOKEN="$GITHUB_TOKEN" gh auth status'

Pre-PR checklist

Before creating a pull request, ensure:

Branch rebased: Rebase on latest main to avoid merge conflicts. In detached worktrees, first create or switch to a topic branch that is based on origin/main.
```
git fetch origin main && git rebase origin/main
```
Formatting: Run formatter and fix any issues
```
cargo fmt --all
```

Linting: Run clippy and fix all warnings

cargo clippy --workspace --all-targets -- -D warnings

Tests: Run all tests and ensure they pass
```
cargo test --workspace
```

Documentation: Ensure docs build without warnings

RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps

Release build smoke: Ensure the publishable Rust crates build in release mode
```
cargo build --workspace --exclude fetchkit-python --release
```
CI green: All CI checks must pass before merging
PR comments resolved: No unaddressed review comments in PR
Specs: If changes affect system behavior, update specs in specs/
Docs: If changes affect usage or configuration, update public docs in docs/

CI fails if formatting, linting, tests, the release build smoke, or the doc build fails. Always run these locally before pushing.
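
The local checks from this list can be chained into one command. The wrapper below is a hypothetical sketch, not a script that exists in this repo; it leaves out the rebase step on purpose, since that rewrites the branch and is better run deliberately.

```shell
# Hypothetical wrapper (not part of this repo): run the local checks in
# checklist order and stop at the first failure.
# Pass "echo" as the first argument to print the commands instead of running them.
pre_pr_checks() {
  local run="${1:-}"
  $run cargo fmt --all &&
    $run cargo clippy --workspace --all-targets -- -D warnings &&
    $run cargo test --workspace &&
    $run env RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps &&
    $run cargo build --workspace --exclude fetchkit-python --release
}
```

`pre_pr_checks echo` gives a dry run that only lists the five commands; `pre_pr_checks` with no argument executes them.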

Commit message conventions

Follow Conventional Commits for all commit messages:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Types:

feat: New feature
fix: Bug fix
docs: Documentation changes
style: Code style (formatting, semicolons, etc.)
refactor: Code refactoring without feature/fix
perf: Performance improvements
test: Adding or updating tests
chore: Build process, dependencies, tooling
ci: CI configuration changes

Examples:

feat(api): add agent versioning endpoint
fix(workflow): handle timeout in run execution
docs: update API documentation
refactor(db): simplify connection pooling

Validation (optional):

# Validate a commit message
echo "feat: add new feature" | npx commitlint

# Validate last commit
npx commitlint --from HEAD~1 --to HEAD
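
Where Node is not available, a rough header check can be sketched with grep alone. It is a loose approximation, not a replacement for commitlint: `is_conventional` is a hypothetical name, the type list is the one above, and the scope pattern (lowercase letters, digits, hyphens) is an assumption.

```shell
# Rough header check using only grep; commitlint remains the fuller tool.
# Only the first line of the message is inspected.
is_conventional() {
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|fix|docs|style|refactor|perf|test|chore|ci)(\([a-z0-9-]+\))?: .+'
}
```

For example, `is_conventional "fix(workflow): handle timeout in run execution"` succeeds, while a header without a recognized type prefix returns a non-zero status.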

PR (Pull Request) conventions

PR titles should follow Conventional Commits format. Use the PR template (.github/pull_request_template.md) for descriptions.

PR Body Template:

## What
Clear description of the change.

## Why
Problem or motivation.

## How
High-level approach.

## Risk
- Low / Medium / High
- What can break

### Checklist
- [ ] Unit tests pass
- [ ] Smoke tests pass
- [ ] Documentation is updated
- [ ] Specs are up to date and not in conflict
- [ ] ... other check list items

Testing the system

# Run all tests
cargo test --workspace

# Run tests with output
cargo test --workspace -- --nocapture

# Run specific test
cargo test --workspace test_name

# Test CLI directly
cargo run -p fetchkit-cli -- --url https://example.com --as-markdown

# Test MCP server
cargo run -p fetchkit-cli -- mcp

Tests use wiremock for HTTP mocking (no real external network calls). See specs/initial.md for test requirements.
