CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is a Claude Code plugin for analyzing and optimizing GraalVM Truffle language performance. It provides a suite of specialized skills for profiling, tracing, and diagnosing performance issues in Truffle language implementations.

Plugin Structure

Skills are organized in the skills/ directory. Each skill is a self-contained directory with:

  • SKILL.md - Main skill definition with frontmatter metadata (name, description)
  • Additional .md files - Supporting documentation (WORKFLOW.md, PATTERNS.md, etc.)

Sub-agent skills use a thin dispatch pattern: SKILL.md contains only a short instruction to spawn a Task tool sub-agent, while WORKFLOW.md holds the full instructions that the sub-agent reads at runtime. This keeps the main conversation context slim.
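
As a sketch, a thin-dispatch SKILL.md might look roughly like this (the skill name, frontmatter fields, and wording are illustrative, not copied from the repository):

```markdown
---
name: example-sub-agent-skill
description: Delegates its work to a Task sub-agent to keep main context slim
---

Spawn a Task tool sub-agent. Instruct it to read WORKFLOW.md in this
skill's directory and follow those instructions end to end, reporting
only its final results back to the main conversation.
```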

The plugin manifest is located at .claude-plugin/plugin.json.

Performance Optimization Workflow

The plugin implements an optimization loop managed by the optimization-workflow-orchestrator skill:

Step 0: Baseline  (once)
    ↓
┌─ Step 1: Explore ←───────────────┐
│  Step 2: Validate hypothesis     │
│  Step 3: Plan                    │
│  Step 4: Implement               │
│  Step 5: Validate benchmarks     │
│  slower? ────────────────────────┘
│  faster? ↓
│  Step 6: Refresh baseline
│  Step 7: Update lessons learned
│  Step 8: Clean up
└─ Step 9: Commit → STOP

Step 0 — Baseline: If BENCHMARK_BASELINE.md doesn't exist, run the baseline-establisher sub-agent to create benchmarks and collect timing data.

Step 1 — Explore: Spawn an Explore sub-agent to read the baseline and lessons learned, explore the language implementation, and return 2-3 ranked hypotheses. The orchestrator writes hypotheses to HYPOTHESES.md.

Step 2 — Validate hypothesis: Validate the most critical unverified hypothesis. The orchestrator reads HYPOTHESES.md to determine the validation approach, then spawns the appropriate sub-agent (hypothesis-validator or compiler-graph-analyst), which reads HYPOTHESES.md itself. Both sub-agents update HYPOTHESES.md with their results. Only if a hypothesis is fully rejected does the orchestrator move on to the next one; if it is even partially confirmed, proceed to planning.

Step 3 — Plan: Spawn a Plan sub-agent to read the confirmed hypothesis and design an implementation plan. Writes plan to IMPLEMENTATION_PLAN.md. The orchestrator reviews the plan and may resume the Plan sub-agent with feedback (max three review rounds).

Step 4 — Implement: A general-purpose Task sub-agent implements the plan from IMPLEMENTATION_PLAN.md, builds the project, runs the tests, and runs the two verification benchmarks as a quick sanity check.

Step 5 — Validate benchmarks: Rerun every language benchmark (one at a time, not in parallel) using the commands and iteration counts from BENCHMARK_BASELINE.md. A change counts as "faster" if the geometric mean of the speedup ratios across all benchmarks exceeds 1.03 (a >3% improvement) and no individual benchmark regresses by more than 5%. Faster → continue to Step 6. Slower → improve the implementation (max three retries) or restart from Step 1.
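
The Step 5 acceptance rule can be sketched as a small check (the function name and dict-based inputs are illustrative, not part of the plugin):

```python
import math

def change_is_faster(baseline_ms, new_ms,
                     min_geomean_speedup=1.03, max_regression=0.05):
    """Apply the Step 5 acceptance rule to per-benchmark timings.

    baseline_ms / new_ms: dicts mapping benchmark name -> mean time in ms.
    Per-benchmark speedup ratio = baseline / new (>1 means faster).
    """
    ratios = [baseline_ms[b] / new_ms[b] for b in baseline_ms]
    geomean = math.prod(ratios) ** (1 / len(ratios))
    worst = min(ratios)  # smallest ratio = largest regression
    return geomean > min_geomean_speedup and worst >= (1 - max_regression)
```

For example, with a baseline of {fib: 100 ms, sort: 200 ms} and new timings of {fib: 90 ms, sort: 195 ms}, the geometric-mean speedup is about 1.07 with no regression, so the change would be accepted.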

Step 6 — Refresh baseline: Replace the language results table in BENCHMARK_BASELINE.md with the new timing data. Keep the reference results (AWFY Python) unchanged.

Step 7 — Update lessons learned: Read HYPOTHESES.md for hypothesis results. Record what was tried and learned to LESSONS_LEARNED.md.

Step 8 — Clean up: Remove all intermediate artifacts (tool-outputs/, IMPLEMENTATION_PLAN.md) so the next iteration starts fresh. BENCHMARK_BASELINE.md and LESSONS_LEARNED.md carry over.

Step 9 — Commit: Commit the changes with a concise message describing what was optimized and the benchmark delta.

Skills

In skill references, <launcher> refers to the target language's launcher binary (e.g., ./sl, ./my-language).

Workflow orchestration:

  • optimization-workflow-orchestrator - Orchestrates the optimization loop above

Sub-agent skills (spawned as separate agent processes):

  • baseline-establisher - Runs all benchmarks and creates BENCHMARK_BASELINE.md
  • compiler-graph-analyst - Analyzes compiler graphs for specific issues across benchmarks; receives hypothesis via prompt and updates HYPOTHESES.md with findings
  • hypothesis-validator - Validates a performance hypothesis by running profiling tools; receives hypothesis via prompt and updates HYPOTHESES.md with confirmed/rejected status

Tool skills (invoke Truffle/Graal profiling tools):

  • profiling-with-cpu-sampler - Time-based profiling, tier breakdown (T0/T1/T2)
  • profiling-memory-allocations - Allocation tracking
  • tracing-execution-counts - Execution frequency measurement
  • tracing-compilation-events - JIT compilation monitoring
  • tracing-inlining-decisions - Inlining behavior analysis
  • detecting-deoptimizations - Deoptimization tracking (goal: zero in steady-state)
  • detecting-performance-warnings - Finds optimization barriers (virtual calls, type checks)

Utility:

  • fetching-truffle-documentation - Access Truffle API docs

Key Concepts

Fermi Verification

Every tool skill requires "Fermi verification" - pre-calculate the expected results before running a tool, then check that the actual results fall within the expected range. This catches silent tool failures and garbage data.
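
The idea can be sketched as an order-of-magnitude check (the helper and its tolerance are illustrative, not part of the plugin):

```python
def fermi_check(expected, actual, tolerance=10.0):
    """Compare a pre-computed expectation against actual tool output.

    expected/actual are positive counts or timings; the check passes if
    actual is within a factor of `tolerance` of the expectation, which
    is coarse but enough to catch silent tool failures (zero readings,
    wrong units, garbage magnitudes).
    """
    if actual <= 0:
        return False  # a zero reading from a profiler almost always means failure
    return expected / tolerance <= actual <= expected * tolerance
```

For example, if a benchmark loops a million times and calls a function once per iteration, the execution-count tracer should report on the order of a million calls; a count of 0 or 5 fails the check and signals a broken tool run rather than a real result.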

Tool Output Storage

Tools save outputs to tool-outputs/ directory with naming pattern: tool-outputs/{tool-name}-{benchmark}.txt
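
The naming convention can be expressed as a one-line helper (the function itself is illustrative; only the path pattern comes from the plugin):

```python
def tool_output_path(tool_name: str, benchmark: str) -> str:
    """Build the conventional output path for a tool run."""
    return f"tool-outputs/{tool_name}-{benchmark}.txt"
```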

State Files

  • BENCHMARK_BASELINE.md - Benchmark descriptions, timing data, performance expectations
  • LESSONS_LEARNED.md - Accumulated findings from past iterations: what was tried, what failed, and what approaches are ruled out

Common Patterns

Truffle Performance Issues

The skills are designed to detect and fix common Truffle performance anti-patterns (see PATTERNS.md in individual tool skill directories):

Control Flow Patterns:

  • Megamorphic dispatch (>3 receiver types)
  • Polymorphic cache instability
  • Implementation-induced branches
  • Guard proliferation

Truffle DSL Patterns:

  • Missing primitive specializations (boxing overhead)
  • Uncached library usage
  • Hot path boundary calls
  • Frame slot type instability
  • Missing @Cached for lookups

Compilation Tiers

  • T0 (Interpreter) - Should be <10% for hot functions
  • T1 (First-tier compiled) - Transitional
  • T2 (Fully optimized) - Should be >80% for hot functions
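
As a sketch, a hot function's tier breakdown could be screened against these thresholds like this (the helper name and return shape are illustrative; the 10%/80% limits are the ones above, and T1 is transitional so it is not checked directly):

```python
def tier_health(t0_pct, t1_pct, t2_pct):
    """Classify a hot function's time-per-tier breakdown (percentages)."""
    issues = []
    if t0_pct >= 10:
        issues.append("too much time in the interpreter (T0)")
    if t2_pct <= 80:
        issues.append("not enough time in fully optimized code (T2)")
    return issues or ["healthy"]
```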

Testing Plugin Changes

To test changes to skills:

  1. Edit skill definitions in skills/ directory
  2. Update .claude-plugin/plugin.json if needed
  3. Restart Claude Code: claude --plugin-dir /path/to/cc-truffle-performance-plugin
  4. Verify with /help command

Skill Invocation

Skills are model-invoked (Claude selects based on context). Sub-agent skills (baseline-establisher, compiler-graph-analyst, hypothesis-validator) are spawned as separate agent processes with their own context. Sub-agents communicate results via files. The hypothesis-validator agent reads tool skill SKILL.md files at runtime to understand how to run profiling commands. Tool skills run in the main conversation and provide Truffle/Graal profiling domain knowledge.