Commit 1d22882

Merge pull request #110 from Annotation-Garden/develop
fix: SSE streaming parser robustness and results display bug
2 parents ea7cac5 + 72e50de commit 1d22882

54 files changed, +4253 -339 lines changed

.context/image-differential-research.md

Lines changed: 730 additions & 0 deletions
Large diffs are not rendered by default.

.env.example

Lines changed: 28 additions & 4 deletions
@@ -67,10 +67,11 @@ OPENROUTER_API_KEY=your-openrouter-api-key-here
 ANNOTATION_MODEL=mistralai/mistral-small-3.2-24b-instruct
 ANNOTATION_PROVIDER=mistral

-# Evaluation/Assessment Model (consistent quality checks: Qwen3-235B via Cerebras)
+# Evaluation/Assessment Model (consistent quality checks: Qwen3-235B via DeepInfra)
 # Used for evaluation, assessment, and feedback agents
+# Leave EVALUATION_PROVIDER empty to let OpenRouter auto-route
 EVALUATION_MODEL=qwen/qwen3-235b-a22b-2507
-EVALUATION_PROVIDER=Cerebras
+EVALUATION_PROVIDER=deepinfra/fp8

 # Vision Model (image description: Qwen3-VL via deepinfra)
 VISION_MODEL=qwen/qwen3-vl-30b-a3b-instruct
@@ -97,6 +98,30 @@ VISION_PROVIDER=deepinfra/fp8
 # ============================================================================
 # HED Configuration
 # ============================================================================
+HED_SCHEMA_VERSION=8.4.0
+
+# ============================================================================
+# HED-LSP Configuration (Recommended)
+# ============================================================================
+# HEDit can use hed-lsp CLI for HED tag suggestions.
+# Install: git clone https://github.com/hed-standard/hed-lsp.git
+# cd hed-lsp/server && npm install && npm run compile && npm link
+#
+# Once installed, the 'hed-suggest' command will be available in PATH.
+# HEDit auto-detects hed-lsp availability - no configuration needed!
+
+# Enable semantic search for better tag suggestions (requires embeddings)
+# HED_LSP_USE_SEMANTIC=false
+
+# Maximum number of tag suggestions to return
+# HED_LSP_MAX_RESULTS=10
+
+# ============================================================================
+# Legacy JavaScript Validator (Deprecated)
+# ============================================================================
+# NOTE: The JavaScript validator is deprecated in favor of hed-lsp.
+# Only use if you need the legacy hed-javascript integration.
+#
 # NOTE: In Docker, paths are auto-detected. Do NOT set these unless you
 # need to override (e.g., for local development outside Docker).
 # Setting empty values will BREAK Docker deployment!
@@ -105,8 +130,7 @@ VISION_PROVIDER=deepinfra/fp8
 # HED_SCHEMA_DIR=/path/to/hed-schemas/schemas_latest_json
 # HED_VALIDATOR_PATH=/path/to/hed-javascript

-HED_SCHEMA_VERSION=8.4.0
-USE_JS_VALIDATOR=true
+USE_JS_VALIDATOR=false

 # ============================================================================
 # API Configuration

.github/workflows/test.yml

Lines changed: 65 additions & 7 deletions
@@ -95,13 +95,14 @@ jobs:
           echo "Changed files:"
           echo "$CHANGED"

-          # Integration tests should run if these paths change:
-          # - Integration test files (test_integration*.py, test_cli_integration.py)
-          # - Agent code (workflow, annotation, evaluation, etc.)
+          # Integration/standalone tests should run if these paths change:
+          # - Integration test files (test_integration*.py, test_standalone*.py)
+          # - Agent code (workflow, annotation, evaluation, keyword extraction, etc.)
           # - Validation code
-          # - OpenRouter LLM utility
+          # - OpenRouter/LiteLLM utility
           # - CLI code (for CLI integration tests)
-          INTEGRATION_PATTERNS="tests/test_.*integration|src/agents/|src/validation/|src/utils/openrouter|src/cli/"
+          # - Semantic search code
+          INTEGRATION_PATTERNS="tests/test_.*integration|tests/test_standalone|src/agents/|src/validation/|src/utils/openrouter|src/utils/litellm|src/utils/semantic|src/cli/"

           if echo "$CHANGED" | grep -qE "$INTEGRATION_PATTERNS"; then
             echo "integration_needed=true" >> $GITHUB_OUTPUT
@@ -111,6 +112,55 @@ jobs:
             echo "No integration-related files changed - tests will be skipped"
           fi

+  # Standalone tests verify LLM behavior hasn't regressed across PRs.
+  # Only runs on PRs targeting main when LangGraph components are touched,
+  # to avoid unnecessary API costs on feature branch PRs.
+  standalone-tests:
+    name: Standalone Tests (PR to main)
+    runs-on: ubuntu-latest
+    needs: [check-changes]
+    if: |
+      github.event_name == 'pull_request' &&
+      github.event.pull_request.base.ref == 'main' &&
+      needs.check-changes.outputs.integration_needed == 'true'
+
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+
+      - name: Install dependencies
+        run: |
+          uv pip install --system -e ".[dev]"
+          uv pip install --system pytest-timeout
+
+      - name: Run standalone tests only
+        env:
+          OPENROUTER_API_KEY_FOR_TESTING: ${{ secrets.OPENROUTER_API_KEY_FOR_TESTING }}
+        run: |
+          if [ -n "$OPENROUTER_API_KEY_FOR_TESTING" ]; then
+            echo "Running standalone tests..."
+            pytest tests/ -v -m standalone --timeout=180 --cov=src --cov-report=xml:coverage-standalone.xml --cov-report=term-missing
+          else
+            echo "OPENROUTER_API_KEY_FOR_TESTING not set, skipping standalone tests"
+          fi
+
+      - name: Upload standalone coverage to Codecov
+        if: always()
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./coverage-standalone.xml
+          flags: standalone
+          name: codecov-standalone
+          fail_ci_if_error: false
+
   integration-tests:
     name: Integration Tests (Real LLM)
     runs-on: ubuntu-latest
@@ -163,8 +213,8 @@ jobs:
   all-tests:
     name: All Tests Summary
     runs-on: ubuntu-latest
-    needs: [lint, unit-tests, check-changes, integration-tests]
-    # Run even if integration-tests was skipped (on PRs or no src changes)
+    needs: [lint, unit-tests, check-changes, standalone-tests, integration-tests]
+    # Run even if integration-tests or standalone-tests was skipped
     if: always()

     steps:
@@ -180,6 +230,13 @@ jobs:
           echo "Unit tests failed"
           exit 1

+      - name: Check standalone tests result
+        # Only fail if standalone tests ran and failed (not if skipped)
+        if: needs.standalone-tests.result == 'failure'
+        run: |
+          echo "Standalone tests failed"
+          exit 1
+
       - name: Check integration tests result
         # Only fail if integration tests ran and failed (not if skipped)
         if: needs.integration-tests.result == 'failure'
@@ -192,5 +249,6 @@ jobs:
           echo "All required checks passed!"
           echo "Lint: ${{ needs.lint.result }}"
           echo "Unit tests: ${{ needs.unit-tests.result }}"
+          echo "Standalone tests: ${{ needs.standalone-tests.result }}"
           echo "Integration tests: ${{ needs.integration-tests.result }}"
           echo "Integration needed: ${{ needs.check-changes.outputs.integration_needed }}"
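
The change-detection and summary logic in this workflow can be checked locally before pushing. The sketch below mirrors the `grep -qE "$INTEGRATION_PATTERNS"` test and the "fail only on `failure`, not on `skipped`" rule in plain Python. The function names are hypothetical; only the pattern string and the result values come from the workflow.

```python
import re

# Same alternation as INTEGRATION_PATTERNS in test.yml (extended-regex syntax).
INTEGRATION_PATTERNS = re.compile(
    r"tests/test_.*integration|tests/test_standalone|src/agents/|src/validation/"
    r"|src/utils/openrouter|src/utils/litellm|src/utils/semantic|src/cli/"
)


def integration_needed(changed_files: list[str]) -> bool:
    """Return True if any changed path matches, like `grep -qE` on the file list."""
    return any(INTEGRATION_PATTERNS.search(path) for path in changed_files)


def optional_job_fails_summary(result: str) -> bool:
    """Mirror `needs.<job>.result == 'failure'`: a skipped job does not fail the summary."""
    return result == "failure"


if __name__ == "__main__":
    print(integration_needed(["src/utils/litellm_llm.py", "README.md"]))
    print(integration_needed(["docs/index.md"]))
    print(optional_job_fails_summary("skipped"))
```

Note that the patterns are unanchored, so a path such as `docs/src/cli/notes.md` would also match; that is the behavior of the `grep -qE` call as written.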

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -85,3 +85,7 @@ coverage-*.xml

 # UV package manager
 uv.lock
+
+# Local development tools (tool-specific artifacts, not project files)
+.hedit/
+.serena/cache/

.serena/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/cache
-129 KB
Binary file not shown.
-21.8 KB
Binary file not shown.
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Code Style and Conventions
+
+## Python Style
+- **Formatter**: ruff format (Black-compatible, 88 char line)
+- **Linter**: ruff check with --fix --unsafe-fixes
+- **Type checker**: ty
+- **Imports**: Sorted via isort (ruff)
+- **Python**: 3.12+ features (type unions with `|`, etc.)
+
+## Naming Conventions
+- **Classes**: PascalCase (e.g., `HedAnnotationWorkflow`, `ValidationResult`)
+- **Functions/methods**: snake_case (e.g., `get_complete_system_prompt`)
+- **Constants**: UPPER_SNAKE_CASE (e.g., `HED_SYNTAX_RULES`)
+- **Files**: snake_case (e.g., `hed_validator.py`)
+- **Private methods**: Leading underscore (e.g., `_build_graph`, `_route_after_validation`)
+
+## Type Hints
+- Required for all public functions and methods
+- Use modern syntax: `list[str]`, `dict[str, Any]`, `str | None`
+- TypedDict for state objects
+- Literal types for constrained strings
+
+## Docstrings
+- Google style
+- Required for classes and public methods
+- Include Args, Returns, Raises sections
+
+## Patterns
+- **TypedDict** for state management (LangGraph)
+- **Dataclasses** for data structures (`ValidationIssue`, `ValidationResult`)
+- **Async/await** for API endpoints and LangGraph workflows
+- **Context managers** for resource management
+- **Pathlib** for file operations
+- **F-strings** for formatting
+- **Logging** via `logging.getLogger(__name__)`
+
+## Commit Messages
+- Format: `<type>: <description>` (<50 chars)
+- Types: feat, fix, docs, refactor, test, chore
+- No emojis, no co-author mentions
+- Atomic commits (one logical change each)
+
+## Testing
+- NO MOCKS policy (real data, real API calls)
+- Use `OPENROUTER_API_KEY_FOR_TESTING` for integration tests
+- Mark integration tests with `@pytest.mark.integration`
+- Coverage tracking via codecov
+
+## Pre-commit Hooks
+- Ruff check with --fix --unsafe-fixes on staged files only
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Codebase Structure
+
+```
+hedit/
+├── src/
+│   ├── agents/  # LangGraph multi-agent system
+│   │   ├── workflow.py  # HedAnnotationWorkflow (orchestration)
+│   │   ├── state.py  # HedAnnotationState (TypedDict)
+│   │   ├── annotation_agent.py  # Generates HED tags from text
+│   │   ├── validation_agent.py  # Validates HED compliance
+│   │   ├── evaluation_agent.py  # Assesses faithfulness
+│   │   ├── assessment_agent.py  # Final completeness check
+│   │   ├── feedback_summarizer.py  # Condenses feedback for LLM
+│   │   ├── vision_agent.py  # Image-to-text via vision LLMs
+│   │   └── feedback_triage_agent.py  # User feedback processing
+│   ├── validation/
+│   │   ├── hed_validator.py  # Dual JS+Python validators
+│   │   └── hed_lsp.py  # HED Language Server Protocol
+│   ├── utils/
+│   │   ├── hed_rules.py  # HED syntax/semantic rules (system prompts)
+│   │   ├── hed_comprehensive_guide.py  # Full LLM-optimized HED guide
+│   │   ├── json_schema_loader.py  # JSON schema + vocabulary extraction
+│   │   ├── schema_loader.py  # Legacy Python schema loader
+│   │   ├── error_remediation.py  # Error augmentation for LLM feedback
+│   │   ├── openrouter_llm.py  # OpenRouter API integration
+│   │   ├── litellm_llm.py  # Alternative LLM providers
+│   │   ├── image_processing.py  # Base64 image encoding
+│   │   └── github_client.py  # GitHub API for feedback issues
+│   ├── api/
+│   │   ├── main.py  # FastAPI app (endpoints)
+│   │   ├── models.py  # Pydantic request/response models
+│   │   └── security.py  # Authentication
+│   ├── cli/
+│   │   ├── main.py  # Typer CLI entry point
+│   │   ├── config.py  # Config management (YAML)
+│   │   ├── client.py  # API client
+│   │   ├── executor.py  # Execution strategy
+│   │   ├── local_executor.py  # Local workflow execution
+│   │   ├── api_executor.py  # Remote API execution
+│   │   ├── output.py  # Output formatting
+│   │   └── commands/  # CLI subcommands
+│   ├── telemetry/
+│   │   ├── collector.py  # Data collection
+│   │   ├── storage.py  # Local + Cloudflare KV
+│   │   └── schema.py  # Telemetry data schema
+│   ├── scripts/
+│   │   └── process_feedback.py  # Feedback processing
+│   └── version.py  # Version info
+├── tests/  # pytest test suite (22 files)
+├── docs/  # Documentation
+├── frontend/  # Web UI (Cloudflare Pages)
+├── workers/  # Cloudflare Workers proxy
+├── deploy/  # Deployment configs
+├── docker/  # Docker configs
+├── scripts/  # Utility scripts (bump_version.py)
+├── .context/  # Context files for AI agents
+├── .rules/  # Development rules
+├── pyproject.toml  # Project config + dependencies
+└── CLAUDE.md  # AI agent instructions
+```
+
+## Key Entry Points
+- **API**: `src/api/main.py` (FastAPI app)
+- **CLI**: `src/cli/main.py` (Typer CLI, registered as `hedit` command)
+- **Workflow**: `src/agents/workflow.py` (LangGraph state graph)
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# HEDit Project Overview
+
+## Purpose
+HEDit is a multi-agent system for converting natural language event descriptions into valid HED (Hierarchical Event Descriptors) annotation strings. Part of the Annotation Garden Initiative (AGI).
+
+## Tech Stack
+- **Language**: Python 3.12+
+- **Package Manager**: uv
+- **Agent Framework**: LangGraph
+- **LLM Provider**: OpenRouter API (production), Ollama (fallback)
+- **Validation**: HED JavaScript validator + HED Python tools (hedtools)
+- **Backend**: FastAPI
+- **CLI**: Typer + Rich
+- **Frontend**: Cloudflare Pages
+- **Workers**: Cloudflare Workers (API proxy)
+- **API Hosting**: api.annotation.garden (SCCN VM via Apache reverse proxy)
+
+## Current Version
+- `0.7.0.dev1` on develop branch
+- `0.6.8a2` on main branch
+
+## Key Architecture
+- Multi-agent workflow: annotate -> validate -> evaluate -> assess
+- Iterative refinement with validation feedback loops
+- Dual validation: JavaScript (preferred) + Python (fallback)
+- State machine: `HedAnnotationState(TypedDict)` in `src/agents/state.py`
+- Workflow orchestration: LangGraph StateGraph in `src/agents/workflow.py`
+
+## Branching
+- **main**: Production releases (alpha/beta/stable)
+- **develop**: Active development (dev releases)
+- Feature branches from develop, PRs target develop
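
The "annotate -> validate -> evaluate -> assess" flow with validation feedback loops, listed under Key Architecture, can be pictured with a plain-Python stand-in. This is not the project's LangGraph `StateGraph`. The function `run_workflow`, its callable parameters, and the `max_attempts` limit are all assumptions made for illustration.

```python
from collections.abc import Callable


def run_workflow(
    description: str,
    annotate: Callable[[str, list[str]], str],
    validate: Callable[[str], list[str]],
    evaluate: Callable[[str, str], list[str]],
    assess: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> dict[str, object]:
    """Run a simplified annotate/validate/evaluate loop, then assess once."""
    feedback: list[str] = []
    annotation = ""
    steps: list[str] = []
    for _ in range(max_attempts):
        # Each retry sees the feedback produced by the previous pass.
        annotation = annotate(description, feedback)
        steps.append("annotate")
        feedback = validate(annotation)
        steps.append("validate")
        if feedback:
            continue  # validation errors loop straight back to annotation
        feedback = evaluate(description, annotation)
        steps.append("evaluate")
        if not feedback:
            break  # valid and faithful: stop refining
    # Final completeness check; unresolved feedback means "not complete".
    complete = assess(description, annotation) and not feedback
    steps.append("assess")
    return {"annotation": annotation, "complete": complete, "steps": steps}
```

Passing the four agents in as callables keeps the control flow testable with real or stubbed components; a graph framework would express the same `continue`/`break` decisions as conditional edges.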
