Annotation-Garden
diff --git a/.context/image-differential-research.md b/.context/image-differential-research.md
Lines changed: 730 additions & 0 deletions
diff --git a/.env.example b/.env.example
Lines changed: 28 additions & 4 deletions
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 65 additions & 7 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 4 additions & 0 deletions
diff --git a/.serena/.gitignore b/.serena/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/.serena/cache/python/document_symbols.pkl b/.serena/cache/python/document_symbols.pkl
-129 KB
diff --git a/.serena/cache/python/raw_document_symbols.pkl b/.serena/cache/python/raw_document_symbols.pkl
-21.8 KB
diff --git a/.serena/memories/code_style_and_conventions.md b/.serena/memories/code_style_and_conventions.md
Lines changed: 50 additions & 0 deletions
diff --git a/.serena/memories/codebase_structure.md b/.serena/memories/codebase_structure.md
Lines changed: 65 additions & 0 deletions
diff --git a/.serena/memories/project_overview.md b/.serena/memories/project_overview.md
Lines changed: 32 additions & 0 deletions
@@ -67,10 +67,11 @@ OPENROUTER_API_KEY=your-openrouter-api-key-here
 ANNOTATION_MODEL=mistralai/mistral-small-3.2-24b-instruct
 ANNOTATION_PROVIDER=mistral
 
-# Evaluation/Assessment Model (consistent quality checks: Qwen3-235B via Cerebras)
+# Evaluation/Assessment Model (consistent quality checks: Qwen3-235B via DeepInfra)
 # Used for evaluation, assessment, and feedback agents
+# Leave EVALUATION_PROVIDER empty to let OpenRouter auto-route
 EVALUATION_MODEL=qwen/qwen3-235b-a22b-2507
-EVALUATION_PROVIDER=Cerebras
+EVALUATION_PROVIDER=deepinfra/fp8
 
 # Vision Model (image description: Qwen3-VL via deepinfra)
 VISION_MODEL=qwen/qwen3-vl-30b-a3b-instruct
@@ -97,6 +98,30 @@ VISION_PROVIDER=deepinfra/fp8
 # ============================================================================
 # HED Configuration
 # ============================================================================
+HED_SCHEMA_VERSION=8.4.0
+
+# ============================================================================
+# HED-LSP Configuration (Recommended)
+# ============================================================================
+# HEDit can use hed-lsp CLI for HED tag suggestions.
+# Install: git clone https://github.com/hed-standard/hed-lsp.git
+#          cd hed-lsp/server && npm install && npm run compile && npm link
+#
+# Once installed, the 'hed-suggest' command will be available in PATH.
+# HEDit auto-detects hed-lsp availability - no configuration needed!
+
+# Enable semantic search for better tag suggestions (requires embeddings)
+# HED_LSP_USE_SEMANTIC=false
+
+# Maximum number of tag suggestions to return
+# HED_LSP_MAX_RESULTS=10
+
+# ============================================================================
+# Legacy JavaScript Validator (Deprecated)
+# ============================================================================
+# NOTE: The JavaScript validator is deprecated in favor of hed-lsp.
+# Only use if you need the legacy hed-javascript integration.
+#
 # NOTE: In Docker, paths are auto-detected. Do NOT set these unless you
 # need to override (e.g., for local development outside Docker).
 # Setting empty values will BREAK Docker deployment!
@@ -105,8 +130,7 @@ VISION_PROVIDER=deepinfra/fp8
 # HED_SCHEMA_DIR=/path/to/hed-schemas/schemas_latest_json
 # HED_VALIDATOR_PATH=/path/to/hed-javascript
 
-HED_SCHEMA_VERSION=8.4.0
-USE_JS_VALIDATOR=true
+USE_JS_VALIDATOR=false
 
 # ============================================================================
 # API Configuration
 
@@ -95,13 +95,14 @@ jobs:
           echo "Changed files:"
           echo "$CHANGED"
 
-          # Integration tests should run if these paths change:
-          # - Integration test files (test_integration*.py, test_cli_integration.py)
-          # - Agent code (workflow, annotation, evaluation, etc.)
+          # Integration/standalone tests should run if these paths change:
+          # - Integration test files (test_integration*.py, test_standalone*.py)
+          # - Agent code (workflow, annotation, evaluation, keyword extraction, etc.)
           # - Validation code
-          # - OpenRouter LLM utility
+          # - OpenRouter/LiteLLM utility
           # - CLI code (for CLI integration tests)
-          INTEGRATION_PATTERNS="tests/test_.*integration|src/agents/|src/validation/|src/utils/openrouter|src/cli/"
+          # - Semantic search code
+          INTEGRATION_PATTERNS="tests/test_.*integration|tests/test_standalone|src/agents/|src/validation/|src/utils/openrouter|src/utils/litellm|src/utils/semantic|src/cli/"
 
           if echo "$CHANGED" | grep -qE "$INTEGRATION_PATTERNS"; then
             echo "integration_needed=true" >> $GITHUB_OUTPUT
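
To see which changed paths would trigger the integration and standalone jobs, the `grep -qE "$INTEGRATION_PATTERNS"` check can be mirrored in Python. This is only an illustrative sketch: the function name `integration_needed` and the sample file names are hypothetical, while the pattern string is copied from the updated workflow.

```python
import re

# Pattern copied from the updated INTEGRATION_PATTERNS in test.yml.
INTEGRATION_PATTERNS = (
    r"tests/test_.*integration|tests/test_standalone|src/agents/|"
    r"src/validation/|src/utils/openrouter|src/utils/litellm|"
    r"src/utils/semantic|src/cli/"
)


def integration_needed(changed_files: list[str]) -> bool:
    """Return True if any changed path matches the integration patterns.

    Like `grep -qE`, this succeeds as soon as one line (path) contains a match.
    """
    pattern = re.compile(INTEGRATION_PATTERNS)
    return any(pattern.search(path) for path in changed_files)


if __name__ == "__main__":
    # Hypothetical change sets, for illustration only.
    print(integration_needed(["README.md"]))
    print(integration_needed(["src/utils/litellm_llm.py"]))
```

Note that the patterns are unanchored substrings, so `src/utils/litellm` also matches `src/utils/litellm_llm.py`.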
@@ -111,6 +112,55 @@ jobs:
             echo "No integration-related files changed - tests will be skipped"
           fi
 
+  # Standalone tests verify that LLM behavior hasn't regressed across PRs.
+  # The job runs only on PRs targeting main when check-changes reports
+  # integration-related changes, to avoid API costs on feature-branch PRs.
+  standalone-tests:
+    name: Standalone Tests (PR to main)
+    runs-on: ubuntu-latest
+    needs: [check-changes]
+    if: |
+      github.event_name == 'pull_request' &&
+      github.event.pull_request.base.ref == 'main' &&
+      needs.check-changes.outputs.integration_needed == 'true'
+
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+
+      - name: Install dependencies
+        run: |
+          uv pip install --system -e ".[dev]"
+          uv pip install --system pytest-timeout
+
+      - name: Run standalone tests only
+        env:
+          OPENROUTER_API_KEY_FOR_TESTING: ${{ secrets.OPENROUTER_API_KEY_FOR_TESTING }}
+        run: |
+          if [ -n "$OPENROUTER_API_KEY_FOR_TESTING" ]; then
+            echo "Running standalone tests..."
+            pytest tests/ -v -m standalone --timeout=180 --cov=src --cov-report=xml:coverage-standalone.xml --cov-report=term-missing
+          else
+            echo "OPENROUTER_API_KEY_FOR_TESTING not set, skipping standalone tests"
+          fi
+
+      - name: Upload standalone coverage to Codecov
+        if: always()
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./coverage-standalone.xml
+          flags: standalone
+          name: codecov-standalone
+          fail_ci_if_error: false
+
   integration-tests:
     name: Integration Tests (Real LLM)
     runs-on: ubuntu-latest
@@ -163,8 +213,8 @@ jobs:
   all-tests:
     name: All Tests Summary
     runs-on: ubuntu-latest
-    needs: [lint, unit-tests, check-changes, integration-tests]
-    # Run even if integration-tests was skipped (on PRs or no src changes)
+    needs: [lint, unit-tests, check-changes, standalone-tests, integration-tests]
+    # Run even if integration-tests or standalone-tests was skipped
     if: always()
 
     steps:
@@ -180,6 +230,13 @@ jobs:
           echo "Unit tests failed"
           exit 1
 
+      - name: Check standalone tests result
+        # Only fail if standalone tests ran and failed (not if skipped)
+        if: needs.standalone-tests.result == 'failure'
+        run: |
+          echo "Standalone tests failed"
+          exit 1
+
       - name: Check integration tests result
         # Only fail if integration tests ran and failed (not if skipped)
         if: needs.integration-tests.result == 'failure'
@@ -192,5 +249,6 @@ jobs:
           echo "All required checks passed!"
           echo "Lint: ${{ needs.lint.result }}"
           echo "Unit tests: ${{ needs.unit-tests.result }}"
+          echo "Standalone tests: ${{ needs.standalone-tests.result }}"
           echo "Integration tests: ${{ needs.integration-tests.result }}"
           echo "Integration needed: ${{ needs.check-changes.outputs.integration_needed }}"
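
The gating logic added to this workflow can be modeled as two small predicates. The sketch below is a plain-Python restatement for reasoning about the conditions, not something the workflow runs; the function names are hypothetical. It covers only the standalone and integration checks, since the conditions on the lint and unit-test steps are not visible in these hunks.

```python
def should_run_standalone(
    event_name: str, base_ref: str, integration_needed: str
) -> bool:
    """Mirror the `if:` condition on the standalone-tests job.

    `integration_needed` is the string output of check-changes, so it is
    compared against the literal "true" as in the workflow expression.
    """
    return (
        event_name == "pull_request"
        and base_ref == "main"
        and integration_needed == "true"
    )


def summary_fails(standalone_result: str, integration_result: str) -> bool:
    """Mirror the two summary steps: fail only on 'failure', never on 'skipped'."""
    return "failure" in (standalone_result, integration_result)


if __name__ == "__main__":
    # A PR to develop skips the standalone job, and the summary still passes.
    print(should_run_standalone("pull_request", "develop", "true"))
    print(summary_fails("skipped", "success"))
```

Because `all-tests` uses `if: always()`, a skipped job reports `skipped` rather than blocking the summary, which is why the checks compare against `failure` specifically.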
@@ -85,3 +85,7 @@ coverage-*.xml
 
 # UV package manager
 uv.lock
+
+# Local development tools (tool-specific artifacts, not project files)
+.hedit/
+.serena/cache/
@@ -0,0 +1 @@
+/cache
@@ -0,0 +1,50 @@
+# Code Style and Conventions
+
+## Python Style
+- **Formatter**: ruff format (Black-compatible, 88 char line)
+- **Linter**: ruff check with --fix --unsafe-fixes
+- **Type checker**: ty
+- **Imports**: Sorted by ruff's isort rules
+- **Python**: 3.12+ features (type unions with `|`, etc.)
+
+## Naming Conventions
+- **Classes**: PascalCase (e.g., `HedAnnotationWorkflow`, `ValidationResult`)
+- **Functions/methods**: snake_case (e.g., `get_complete_system_prompt`)
+- **Constants**: UPPER_SNAKE_CASE (e.g., `HED_SYNTAX_RULES`)
+- **Files**: snake_case (e.g., `hed_validator.py`)
+- **Private methods**: Leading underscore (e.g., `_build_graph`, `_route_after_validation`)
+
+## Type Hints
+- Required for all public functions and methods
+- Use modern syntax: `list[str]`, `dict[str, Any]`, `str | None`
+- TypedDict for state objects
+- Literal types for constrained strings
+
+## Docstrings
+- Google style
+- Required for classes and public methods
+- Include Args, Returns, Raises sections
+
+## Patterns
+- **TypedDict** for state management (LangGraph)
+- **Dataclasses** for data structures (`ValidationIssue`, `ValidationResult`)
+- **Async/await** for API endpoints and LangGraph workflows
+- **Context managers** for resource management
+- **Pathlib** for file operations
+- **F-strings** for formatting
+- **Logging** via `logging.getLogger(__name__)`
+
+## Commit Messages
+- Format: `<type>: <description>` (<50 chars)
+- Types: feat, fix, docs, refactor, test, chore
+- No emojis, no co-author mentions
+- Atomic commits (one logical change each)
+
+## Testing
+- NO MOCKS policy (real data, real API calls)
+- Use `OPENROUTER_API_KEY_FOR_TESTING` for integration tests
+- Mark integration tests with `@pytest.mark.integration`
+- Coverage tracking via codecov
+
+## Pre-commit Hooks
+- Ruff check with --fix --unsafe-fixes on staged files only
@@ -0,0 +1,65 @@
+# Codebase Structure
+
+```
+hedit/
+├── src/
+│   ├── agents/                    # LangGraph multi-agent system
+│   │   ├── workflow.py            # HedAnnotationWorkflow (orchestration)
+│   │   ├── state.py               # HedAnnotationState (TypedDict)
+│   │   ├── annotation_agent.py    # Generates HED tags from text
+│   │   ├── validation_agent.py    # Validates HED compliance
+│   │   ├── evaluation_agent.py    # Assesses faithfulness
+│   │   ├── assessment_agent.py    # Final completeness check
+│   │   ├── feedback_summarizer.py # Condenses feedback for LLM
+│   │   ├── vision_agent.py        # Image-to-text via vision LLMs
+│   │   └── feedback_triage_agent.py # User feedback processing
+│   ├── validation/
+│   │   ├── hed_validator.py       # Dual JS+Python validators
+│   │   └── hed_lsp.py             # HED Language Server Protocol
+│   ├── utils/
+│   │   ├── hed_rules.py           # HED syntax/semantic rules (system prompts)
+│   │   ├── hed_comprehensive_guide.py # Full LLM-optimized HED guide
+│   │   ├── json_schema_loader.py  # JSON schema + vocabulary extraction
+│   │   ├── schema_loader.py       # Legacy Python schema loader
+│   │   ├── error_remediation.py   # Error augmentation for LLM feedback
+│   │   ├── openrouter_llm.py      # OpenRouter API integration
+│   │   ├── litellm_llm.py         # Alternative LLM providers
+│   │   ├── image_processing.py    # Base64 image encoding
+│   │   └── github_client.py       # GitHub API for feedback issues
+│   ├── api/
+│   │   ├── main.py                # FastAPI app (endpoints)
+│   │   ├── models.py              # Pydantic request/response models
+│   │   └── security.py            # Authentication
+│   ├── cli/
+│   │   ├── main.py                # Typer CLI entry point
+│   │   ├── config.py              # Config management (YAML)
+│   │   ├── client.py              # API client
+│   │   ├── executor.py            # Execution strategy
+│   │   ├── local_executor.py      # Local workflow execution
+│   │   ├── api_executor.py        # Remote API execution
+│   │   ├── output.py              # Output formatting
+│   │   └── commands/              # CLI subcommands
+│   ├── telemetry/
+│   │   ├── collector.py           # Data collection
+│   │   ├── storage.py             # Local + Cloudflare KV
+│   │   └── schema.py              # Telemetry data schema
+│   ├── scripts/
+│   │   └── process_feedback.py    # Feedback processing
+│   └── version.py                 # Version info
+├── tests/                         # pytest test suite (22 files)
+├── docs/                          # Documentation
+├── frontend/                      # Web UI (Cloudflare Pages)
+├── workers/                       # Cloudflare Workers proxy
+├── deploy/                        # Deployment configs
+├── docker/                        # Docker configs
+├── scripts/                       # Utility scripts (bump_version.py)
+├── .context/                      # Context files for AI agents
+├── .rules/                        # Development rules
+├── pyproject.toml                 # Project config + dependencies
+└── CLAUDE.md                      # AI agent instructions
+```
+
+## Key Entry Points
+- **API**: `src/api/main.py` (FastAPI app)
+- **CLI**: `src/cli/main.py` (Typer CLI, registered as `hedit` command)
+- **Workflow**: `src/agents/workflow.py` (LangGraph state graph)
@@ -0,0 +1,32 @@
+# HEDit Project Overview
+
+## Purpose
+HEDit is a multi-agent system that converts natural-language event descriptions into valid HED (Hierarchical Event Descriptors) annotation strings. It is part of the Annotation Garden Initiative (AGI).
+
+## Tech Stack
+- **Language**: Python 3.12+
+- **Package Manager**: uv
+- **Agent Framework**: LangGraph
+- **LLM Provider**: OpenRouter API (production), Ollama (fallback)
+- **Validation**: HED JavaScript validator + HED Python tools (hedtools)
+- **Backend**: FastAPI
+- **CLI**: Typer + Rich
+- **Frontend**: Cloudflare Pages
+- **Workers**: Cloudflare Workers (API proxy)
+- **API Hosting**: api.annotation.garden (SCCN VM via Apache reverse proxy)
+
+## Current Version
+- `0.7.0.dev1` on develop branch
+- `0.6.8a2` on main branch
+
+## Key Architecture
+- Multi-agent workflow: annotate -> validate -> evaluate -> assess
+- Iterative refinement with validation feedback loops
+- Dual validation: JavaScript (preferred) + Python (fallback)
+- State machine: `HedAnnotationState(TypedDict)` in `src/agents/state.py`
+- Workflow orchestration: LangGraph StateGraph in `src/agents/workflow.py`
+
+## Branching
+- **main**: Production releases (alpha/beta/stable)
+- **develop**: Active development (dev releases)
+- Feature branches from develop, PRs target develop
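
The annotate -> validate -> evaluate -> assess flow with a validation feedback loop, described under Key Architecture, can be sketched in plain Python. This deliberately does not use LangGraph and is not the project's `HedAnnotationState` or workflow: `AnnotationState`, its fields, the stub agents, and `max_attempts` are all hypothetical stand-ins, and the stubs replace real LLM and validator calls with deterministic toy logic.

```python
from typing import TypedDict


class AnnotationState(TypedDict):
    """Hypothetical state object; the real fields live in src/agents/state.py."""

    description: str
    annotation: str
    validation_errors: list[str]
    attempts: int
    is_faithful: bool
    is_complete: bool


def annotate(state: AnnotationState) -> AnnotationState:
    # Stub: the first attempt drops a closing parenthesis; retries repair it.
    attempts = state["attempts"] + 1
    text = "(Sensory-event, Visual-presentation" + ("" if attempts == 1 else ")")
    return {**state, "annotation": text, "attempts": attempts}


def validate(state: AnnotationState) -> AnnotationState:
    # Stub validator: only checks that parentheses are balanced.
    text = state["annotation"]
    errors = [] if text.count("(") == text.count(")") else ["unbalanced parentheses"]
    return {**state, "validation_errors": errors}


def evaluate(state: AnnotationState) -> AnnotationState:
    # Stub faithfulness check: any non-empty annotation passes.
    return {**state, "is_faithful": bool(state["annotation"])}


def assess(state: AnnotationState) -> AnnotationState:
    # Stub completeness check: complete if the annotation was judged faithful.
    return {**state, "is_complete": state["is_faithful"]}


def run_workflow(description: str, max_attempts: int = 3) -> AnnotationState:
    """Loop annotate/validate until valid or out of attempts, then evaluate/assess."""
    state: AnnotationState = {
        "description": description,
        "annotation": "",
        "validation_errors": [],
        "attempts": 0,
        "is_faithful": False,
        "is_complete": False,
    }
    while state["attempts"] < max_attempts:
        state = validate(annotate(state))
        if not state["validation_errors"]:
            break
    if state["validation_errors"]:
        return state  # Gave up: still invalid after max_attempts.
    return assess(evaluate(state))
```

With these stubs, `run_workflow("A red circle appears")` needs two attempts: the first annotation fails validation, the error is fed back into the loop, and the second attempt passes on to evaluation and assessment.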