This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Claude Code plugin for analyzing and optimizing GraalVM Truffle language performance. It provides a suite of specialized skills for profiling, tracing, and diagnosing performance issues in Truffle language implementations.
Skills are organized in the skills/ directory. Each skill is a self-contained directory with:
- `SKILL.md` - Main skill definition with frontmatter metadata (name, description)
- Additional `.md` files - Supporting documentation (WORKFLOW.md, PATTERNS.md, etc.)
Sub-agent skills use a thin dispatch pattern: SKILL.md contains only a short instruction to spawn a Task tool sub-agent, while WORKFLOW.md holds the full instructions that the sub-agent reads at runtime. This keeps the main conversation context slim.
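A thin-dispatch SKILL.md might look like the following sketch (the frontmatter fields come from the structure described above; the exact wording of the dispatch instruction is a hypothetical illustration, not copied from any skill in this repository):

```
---
name: hypothesis-validator
description: Validates a performance hypothesis by running profiling tools.
---

Spawn a Task tool sub-agent. Instruct it to read WORKFLOW.md in this
skill's directory and follow those instructions end to end, then report
its findings back.
```

The full procedure lives in WORKFLOW.md, so only these few lines ever enter the main conversation context.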
The plugin manifest is located at .claude-plugin/plugin.json.
The plugin implements an optimization loop managed by the optimization-workflow-orchestrator skill:
Step 0: Baseline (once)
↓
┌─ Step 1: Explore ←───────────────┐
│ Step 2: Validate hypothesis │
│ Step 3: Plan │
│ Step 4: Implement │
│ Step 5: Validate benchmarks │
│ slower? ─────────────────────────┘
│ faster? ↓
│ Step 6: Refresh baseline
│ Step 7: Update lessons learned
│ Step 8: Clean up
│ Step 9: Commit → STOP
Step 0 — Baseline: If BENCHMARK_BASELINE.md doesn't exist, run the baseline-establisher sub-agent to create benchmarks and collect timing data.
Step 1 — Explore: Spawn an Explore sub-agent to read the baseline and lessons learned, explore the language implementation, and return 2-3 ranked hypotheses. The orchestrator writes hypotheses to HYPOTHESES.md.
Step 2 — Validate hypothesis: Validate the most critical unverified hypothesis. The orchestrator reads HYPOTHESES.md to determine the validation approach, then spawns the appropriate sub-agent (hypothesis-validator or compiler-graph-analyst), which reads HYPOTHESES.md itself. Both sub-agents update HYPOTHESES.md with results. Only if the hypothesis is fully rejected does the orchestrator move on to the next one; if it is even partially confirmed, proceed to planning.
Step 3 — Plan: Spawn a Plan sub-agent to read the confirmed hypothesis and design an implementation plan. Writes plan to IMPLEMENTATION_PLAN.md. The orchestrator reviews the plan and may resume the Plan sub-agent with feedback (max three review rounds).
Step 4 — Implement: A general-purpose Task sub-agent implements the plan from IMPLEMENTATION_PLAN.md, builds, runs tests, and runs the two verification benchmarks as a quick sanity check.
Step 5 — Validate benchmarks: Rerun every language benchmark (one at a time, not in parallel) using the commands and iteration counts from BENCHMARK_BASELINE.md. A change counts as "faster" if the geometric mean of the speedup ratios across all benchmarks shows an improvement of more than 3% and no individual benchmark regresses by more than 5%. Faster → continue to Step 6. Slower → improve the implementation (max three retries) or restart from Step 1.
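The Step 5 acceptance rule can be sketched as follows (illustrative only; the orchestrator works from the timing tables in BENCHMARK_BASELINE.md, not from a function like this, and the dict-based input shape is an assumption):

```python
from math import prod

def is_faster(baseline_ms, new_ms):
    """Apply the Step 5 acceptance rule to per-benchmark timings.

    baseline_ms / new_ms: dicts mapping benchmark name -> mean time in ms.
    A change is "faster" if the geometric mean of speedup ratios
    (baseline / new) exceeds 1.03 (>3% improvement) and no individual
    benchmark regresses by more than 5% (ratio below 0.95).
    """
    ratios = [baseline_ms[b] / new_ms[b] for b in baseline_ms]
    geomean = prod(ratios) ** (1 / len(ratios))
    no_bad_regression = all(r > 0.95 for r in ratios)
    return geomean > 1.03 and no_bad_regression
```

Using the geometric mean (rather than the arithmetic mean) keeps one outlier benchmark from dominating the overall speedup figure.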
Step 6 — Refresh baseline: Replace the language results table in BENCHMARK_BASELINE.md with the new timing data. Keep the reference results (AWFY Python) unchanged.
Step 7 — Update lessons learned: Read HYPOTHESES.md for hypothesis results. Record what was tried and learned to LESSONS_LEARNED.md.
Step 8 — Clean up: Remove all intermediate artifacts (tool-outputs/, IMPLEMENTATION_PLAN.md) so the next iteration starts fresh. BENCHMARK_BASELINE.md and LESSONS_LEARNED.md carry over.
Step 9 — Commit: Commit the changes with a concise message describing what was optimized and the benchmark delta.
In skill references, <launcher> refers to the target language's launcher binary (e.g., ./sl, ./my-language).
Workflow orchestration:
- `optimization-workflow-orchestrator` - Orchestrates the optimization loop above
Sub-agent skills (spawned as separate agent processes):
- `baseline-establisher` - Runs all benchmarks and creates `BENCHMARK_BASELINE.md`
- `compiler-graph-analyst` - Analyzes compiler graphs for specific issues across benchmarks; receives hypothesis via prompt and updates `HYPOTHESES.md` with findings
- `hypothesis-validator` - Validates a performance hypothesis by running profiling tools; receives hypothesis via prompt and updates `HYPOTHESES.md` with confirmed/rejected status
Tool skills (invoke Truffle/Graal profiling tools):
- `profiling-with-cpu-sampler` - Time-based profiling, tier breakdown (T0/T1/T2)
- `profiling-memory-allocations` - Allocation tracking
- `tracing-execution-counts` - Execution frequency measurement
- `tracing-compilation-events` - JIT compilation monitoring
- `tracing-inlining-decisions` - Inlining behavior analysis
- `detecting-deoptimizations` - Deoptimization tracking (goal: zero in steady-state)
- `detecting-performance-warnings` - Finds optimization barriers (virtual calls, type checks)
Utility:
- `fetching-truffle-documentation` - Access Truffle API docs
Every tool skill requires "Fermi verification" - pre-calculate expected results before running tools, then check that the actual results fall within the expected range. This guards against silent tool failures and garbage data.
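A minimal sketch of the Fermi-verification idea (the tolerance factor and the example numbers are illustrative assumptions, not values taken from any skill definition):

```python
def fermi_check(expected, actual, tolerance=10.0):
    """Reject tool output that is wildly off the pre-calculated estimate.

    expected: order-of-magnitude estimate made *before* running the tool
    (e.g. loop iterations x calls per iteration).
    actual: the count the tool actually reported.
    A result within a factor of `tolerance` of the estimate counts as
    plausible; anything else suggests a silent tool failure or garbage data.
    """
    if expected <= 0 or actual <= 0:
        return False
    ratio = actual / expected
    return 1 / tolerance <= ratio <= tolerance

# Example: a benchmark loops 10_000 times, calling a function 3 times per
# iteration, so we expect roughly 30_000 executions from the tracer.
```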
Tools save outputs to tool-outputs/ directory with naming pattern: tool-outputs/{tool-name}-{benchmark}.txt
- `BENCHMARK_BASELINE.md` - Benchmark descriptions, timing data, performance expectations
- `LESSONS_LEARNED.md` - Accumulated findings from past iterations: what was tried, what failed, and what approaches are ruled out
The skills are designed to detect and fix common Truffle performance anti-patterns (see PATTERNS.md in individual tool skill directories):
Control Flow Patterns:
- Megamorphic dispatch (>3 receiver types)
- Polymorphic cache instability
- Implementation-induced branches
- Guard proliferation
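To make the first pattern concrete: a dispatch site is "megamorphic" once it has seen more receiver types than its inline cache can hold, after which every call takes a slow generic lookup. The following toy cache (a language-agnostic Python illustration, not Truffle code - real Truffle caches are built from DSL specializations) shows the tipping point:

```python
class DispatchSite:
    """Toy inline cache: caches up to 3 receiver types, then goes megamorphic."""
    LIMIT = 3

    def __init__(self):
        self.cache = {}          # receiver type -> bound method lookup
        self.megamorphic = False

    def dispatch(self, receiver, name):
        t = type(receiver)
        if self.megamorphic:
            return getattr(receiver, name)()   # slow generic lookup
        if t not in self.cache:
            if len(self.cache) >= self.LIMIT:
                self.megamorphic = True        # 4th type seen: give up caching
                return getattr(receiver, name)()
            self.cache[t] = getattr(t, name)   # cache the method for this type
        return self.cache[t](receiver)

class A:
    def m(self): return "A"
class B:
    def m(self): return "B"
class C:
    def m(self): return "C"
class D:
    def m(self): return "D"

site = DispatchSite()
for obj in (A(), B(), C()):
    site.dispatch(obj, "m")      # 3 types: still within the cache limit
site.dispatch(D(), "m")          # 4th type tips the site megamorphic
```

In compiled Truffle code the cached case inlines and constant-folds, while the megamorphic case blocks those optimizations - which is why the skills flag sites with more than 3 receiver types.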
Truffle DSL Patterns:
- Missing primitive specializations (boxing overhead)
- Uncached library usage
- Hot path boundary calls
- Frame slot type instability
- Missing @Cached for lookups
- T0 (Interpreter) - Should be <10% for hot functions
- T1 (First-tier compiled) - Transitional
- T2 (Fully optimized) - Should be >80% for hot functions
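As a sketch, the tier heuristics above could be checked against per-tier sample counts like this (the dict input shape is an assumption for illustration; the real cpusampler output format differs):

```python
def tiers_healthy(samples):
    """Check the tier heuristics for a hot function.

    samples: dict of per-tier sample counts, e.g. {"T0": 5, "T1": 10, "T2": 85}.
    Healthy means <10% of samples in the interpreter (T0) and >80% in
    fully optimized code (T2).
    """
    total = sum(samples.values())
    if total == 0:
        return False
    t0_pct = 100 * samples.get("T0", 0) / total
    t2_pct = 100 * samples.get("T2", 0) / total
    return t0_pct < 10 and t2_pct > 80
```

A hot function failing this check usually means compilation is being delayed or repeatedly invalidated, which the compilation-event and deoptimization tracing skills can then pinpoint.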
To test changes to skills:
- Edit skill definitions in the `skills/` directory
- Update `.claude-plugin/plugin.json` if needed
- Restart Claude Code: `claude --plugin-dir /path/to/cc-truffle-performance-plugin`
- Verify with the `/help` command
Skills are model-invoked (Claude selects based on context). Sub-agent skills (baseline-establisher, compiler-graph-analyst, hypothesis-validator) are spawned as separate agent processes with their own context. Sub-agents communicate results via files. The hypothesis-validator agent reads tool skill SKILL.md files at runtime to understand how to run profiling commands. Tool skills run in the main conversation and provide Truffle/Graal profiling domain knowledge.