AGR AI Curation System

An AI-powered curation assistant for the Alliance of Genome Resources, helping biocurators extract and validate biological data from research papers.

Features

Config-Driven Agents - Shipped agents live in package-owned YAML, not hardcoded Python files. Standard installs customize agents via ~/.agr_ai_curation/runtime/packages/* and ~/.agr_ai_curation/runtime/config/*.
Agent Studio - Browse agents, inspect prompts, discuss behavior with Claude, and submit improvement suggestions.
Agent Workshop - Clone any agent, customize its prompt, select a model, attach tools, and test it against live documents -- all without writing code.
Visual Workflow Builder - Create reusable curation flows by chaining agents together in a drag-and-drop interface.
Multi-Provider LLM Support - Pluggable provider system (config/providers.yaml) supporting OpenAI, Gemini, and Groq out of the box. Models are declared in config/models.yaml.
PDF Processing - Upload research papers and extract structured data with AI assistance.
Batch Processing - Process multiple documents through saved workflows.
Real-time Audit Trail - Full transparency into AI decisions, database queries, and tool calls.
Tool Policy System - Centralized YAML-based policies (config/tool_policy_defaults.yaml) governing which tools are available to curators and agents.

Quick Start

Prerequisites

Docker and Docker Compose
OpenAI API key (for embeddings and GPT models)
Optional: Anthropic API key (for Claude in Agent Studio chat)
Optional: Groq API key, Gemini API key (additional LLM providers)

Standalone Install

For the published modular runtime, use the installer instead of editing the repository in place:

Get the installer

```
git clone https://github.com/alliance-genome/agr_ai_curation.git
cd agr_ai_curation
```

Run the standalone installer
```
scripts/install/install.sh
```
Stage 2 prompts for the package profile (Package profile [1=core only, 2=core + alliance]) and defaults to core only. The default seeds only agr.core (Alliance Core), and a core-only install is expected to start healthy. To pin a published release, pass --image-tag vX.Y.Z.

To add agr.alliance (Alliance Defaults) later, re-run Stage 2:
```
scripts/install/install.sh --from-stage 2 --package-profile core-plus-alliance
```
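
If you script the installer from Python, the flags mentioned above can be assembled as an argument list. This is a minimal sketch: the function name `build_install_command` is hypothetical, only the flags shown in this README are used, and whether they can all be combined in a single run is an assumption to confirm against the installer's own help output.

```python
from typing import List, Optional


def build_install_command(
    from_stage: Optional[int] = None,
    package_profile: Optional[str] = None,
    image_tag: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for scripts/install/install.sh.

    Only flags that appear in this README are emitted; each one is
    skipped when its argument is left as None.
    """
    argv = ["scripts/install/install.sh"]
    if from_stage is not None:
        argv += ["--from-stage", str(from_stage)]
    if package_profile is not None:
        argv += ["--package-profile", package_profile]
    if image_tag is not None:
        argv += ["--image-tag", image_tag]
    return argv
```

Passing the result to `subprocess.run(argv, check=True)` from the repository root would run the installer; `build_install_command(from_stage=2, package_profile="core-plus-alliance")` reproduces the Stage 2 re-run shown above.
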

Review the installed runtime

- Secrets and image tags: ~/.agr_ai_curation/.env
- Selected package profile: ~/.agr_ai_curation/.install_package_profile.env
- Runtime config: ~/.agr_ai_curation/runtime/config/
- Shipped and custom packages: ~/.agr_ai_curation/runtime/packages/
- Mutable data: ~/.agr_ai_curation/data/

Access the application

- Frontend: http://localhost:3002
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

See Modular Packages and Upgrades for package authoring, override behavior, standard upgrades, and repo-install migration.

Source Development

For local product development in a repository checkout:

Configure environment

```
make setup
# Edit ~/.agr_ai_curation/.env with your API keys and settings
```

At minimum, set these values in ~/.agr_ai_curation/.env:

```
OPENAI_API_KEY=your_openai_api_key_here
```
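
Before starting the services, it can help to confirm that the placeholder value has actually been replaced. The sketch below assumes the .env file uses plain `KEY=VALUE` lines; `parse_env_file` and `missing_required` are hypothetical helper names, not part of this project.

```python
from typing import Dict, Iterable, List

# Placeholder copied from the example above; anything still equal to it
# is treated as "not set yet".
PLACEHOLDERS = {"your_openai_api_key_here"}


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(
    values: Dict[str, str],
    required: Iterable[str] = ("OPENAI_API_KEY",),
) -> List[str]:
    """Return the required keys that are absent, empty, or placeholders."""
    return [
        key
        for key in required
        if not values.get(key) or values[key] in PLACEHOLDERS
    ]
```

To use it, read the file with `Path.home().joinpath(".agr_ai_curation", ".env").read_text()` and pass the text to `parse_env_file`; an empty list from `missing_required` means the minimum setting is present.
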

Start the services
```
make dev-detached
```

Verify Installation

```
# Standalone install
docker compose --env-file ~/.agr_ai_curation/.env -f docker-compose.production.yml ps

# Source development
docker compose ps

# Shared health check
curl http://localhost:8000/health
```

For a generic standalone install, curation_db is optional and can remain not_configured. Treat it as a later integration step, not a base install requirement.
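
The containers can take a moment to come up, so a scripted check may need to retry the health endpoint. The following is a minimal sketch that looks only at the HTTP status code, because the shape of the health response body is not described here; `fetch_status` and `wait_for_backend` are hypothetical names, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_for_backend(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 3.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_backend()` after `make dev-detached` or the standalone installer returns `True` once the endpoint answers with 200. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running backend.
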

Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Frontend     │────▶│    Backend      │────▶│   Weaviate      │
│   (React/MUI)   │     │   (FastAPI)     │     │ (Vector Store)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌─────────────────┐
                        │   PostgreSQL    │
                        │   (Metadata)    │
                        └─────────────────┘
```

Config-Driven Agent System

Agents are no longer hardcoded Python files. The default standalone install seeds agr.core (Alliance Core), which provides the minimal supervisor/startup contract. Installing agr.alliance (Alliance Defaults) restores the shipped specialist catalog and tool bindings. Standalone deployments customize behavior in two places: additional packages under ~/.agr_ai_curation/runtime/packages/ and deployment YAML overrides under ~/.agr_ai_curation/runtime/config/.

```
~/.agr_ai_curation/
├── runtime/config/              # Deployment override YAML
│   ├── models.yaml
│   ├── providers.yaml
│   └── tool_policy_defaults.yaml
└── runtime/packages/
    ├── core/                    # `agr.core` (Alliance Core)
    ├── alliance/                # `agr.alliance` (Alliance Defaults, optional)
    └── org-custom/              # Your custom package(s)
```
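
To see which override files and packages a given install actually contains, the layout above can be walked with `pathlib`. This is an illustrative sketch only: `describe_runtime` is a hypothetical helper, and it reports directory and file names without reading or validating their contents.

```python
from pathlib import Path
from typing import Dict, List


def describe_runtime(root: Path) -> Dict[str, List[str]]:
    """List override YAML files and package directories under root.

    root is the install directory (for example ~/.agr_ai_curation).
    Missing directories are reported as empty lists rather than errors.
    """
    config_dir = root / "runtime" / "config"
    packages_dir = root / "runtime" / "packages"

    overrides = (
        sorted(p.name for p in config_dir.glob("*.yaml"))
        if config_dir.is_dir()
        else []
    )
    packages = (
        sorted(p.name for p in packages_dir.iterdir() if p.is_dir())
        if packages_dir.is_dir()
        else []
    )
    return {"overrides": overrides, "packages": packages}
```

For a default location, call `describe_runtime(Path.home() / ".agr_ai_curation")`. A core-only install would be expected to list only the core package directory.
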

Services

| Service | Port | Description |
| --- | --- | --- |
| frontend | 3002 | React web application |
| backend | 8000 | FastAPI server with config-driven AI agents |
| weaviate | 8080 | Vector database for document chunks |
| postgres | 5432 | Metadata storage (users, documents, flows, custom agents) |
| langfuse | 3000 | Observability and tracing (optional) |
| trace_review | 3001/8001 | Langfuse trace analysis UI (separate service) |
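
A quick way to see which of these services are reachable is to try a TCP connection on each port. The sketch below assumes the ports in the table are published on the host unchanged, which depends on your compose file; `port_is_open` and `check_services` are hypothetical helpers, and the two trace_review entries are labeled by port because the table does not say which port serves what.

```python
import socket
from typing import Dict, Mapping

# Ports copied from the Services table above.
SERVICE_PORTS: Dict[str, int] = {
    "frontend": 3002,
    "backend": 8000,
    "weaviate": 8080,
    "postgres": 5432,
    "langfuse": 3000,
    "trace_review:3001": 3001,
    "trace_review:8001": 8001,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(
    host: str = "localhost",
    ports: Mapping[str, int] = SERVICE_PORTS,
) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

An open port only shows that something is listening, not that the service is healthy. Optional services such as langfuse may legitimately report `False`.
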

Documentation

For Curators

Getting Started - First-time setup and basic usage
Best Practices - Tips for effective queries
Available Agents - All specialist agents
Curation Flows - Visual workflow builder
Batch Processing - Process multiple documents
Agent Studio - Browse prompts, Agent Workshop, and chat with Claude

For Developers

Harness Health - Current automation and validation health notes
Trace Review - Langfuse trace analysis tool for debugging agent behavior

Deployment

Independent Deployment - Standalone deployment guide
Modular Packages and Upgrades - Runtime layout, package model, overrides, and upgrade paths
LLM Provider Rollout Runbook - Adding new LLM providers
LLM Provider Smoke Test Matrix - Provider validation checklist

Development

Running Tests

```
# Run all healthy tests
docker compose exec backend pytest tests/unit/ -v

# Run specific test file
docker compose exec backend pytest tests/unit/test_config.py -v

# Run with coverage
docker compose exec backend pytest tests/unit/ --cov=src --cov-report=html
```

Code Quality

```
# Lint with ruff
docker compose exec backend ruff check .

# Format code
docker compose exec backend ruff format .
```

Database Migrations

```
# Create a new migration
docker compose exec backend alembic revision --autogenerate -m "description"

# Apply migrations
docker compose exec backend alembic upgrade head
```

Adding a New Agent

For a standalone/public install, add agents through a custom runtime package under ~/.agr_ai_curation/runtime/packages/<your-package>/agents/ rather than editing the repo-local config/agents/ directory directly. See config/agents/README.md and Modular Packages and Upgrades for the package contract.

If you are developing the built-in catalog from a repository checkout, config/agents/supervisor/ mirrors the shipped agr.core supervisor bundle and the remaining config/agents/<name>/ folders mirror the shipped agr.alliance specialist bundles.
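
When building a custom package, it can be useful to confirm which agent definitions it contains before restarting the backend. This sketch assumes definitions are files named agent.yaml somewhere below the package's agents/ directory, as the YAML Configuration Files table suggests. `find_agent_definitions` is a hypothetical helper, and it does not check the files against the package contract.

```python
from pathlib import Path
from typing import List


def find_agent_definitions(package_dir: Path) -> List[Path]:
    """Return agent.yaml paths under package_dir/agents, relative to package_dir.

    An empty list means the package has no agents/ directory or no
    agent.yaml files inside it.
    """
    agents_dir = package_dir / "agents"
    if not agents_dir.is_dir():
        return []
    return sorted(
        path.relative_to(package_dir)
        for path in agents_dir.rglob("agent.yaml")
    )
```

For example, `find_agent_definitions(Path.home() / ".agr_ai_curation/runtime/packages/org-custom")` lists the definitions in the `org-custom` package from the layout shown earlier.
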

Configuration

Environment Variables

Credentials, service connections, and global model defaults are set through environment variables. See .env.example for the available options.

| Section | Variables | Description |
| --- | --- | --- |
| API Keys | `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GROQ_API_KEY`, `GEMINI_API_KEY` | LLM provider credentials |
| Database | `DATABASE_URL` | PostgreSQL connection |
| Weaviate | `WEAVIATE_HOST`, `WEAVIATE_PORT` | Vector store connection |
| LLM Settings | `DEFAULT_AGENT_MODEL`, `DEFAULT_AGENT_REASONING` | Global model defaults |
| Auth | `COGNITO_*` | AWS Cognito authentication |
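
The table above can double as a checklist. The sketch below reports, per section, which variables are unset in the current environment; `unset_variables` is a hypothetical helper. An unset variable is not necessarily a problem, since several providers and Cognito may be optional for a given deployment.

```python
import os
from typing import Dict, List, Mapping

# Variable names copied from the table above.
SECTIONS: Dict[str, List[str]] = {
    "API Keys": [
        "OPENAI_API_KEY",
        "ANTHROPIC_API_KEY",
        "GROQ_API_KEY",
        "GEMINI_API_KEY",
    ],
    "Database": ["DATABASE_URL"],
    "Weaviate": ["WEAVIATE_HOST", "WEAVIATE_PORT"],
    "LLM Settings": ["DEFAULT_AGENT_MODEL", "DEFAULT_AGENT_REASONING"],
}


def unset_variables(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Return {section: [unset variable names]} for sections with gaps."""
    report: Dict[str, List[str]] = {}
    for section, names in SECTIONS.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[section] = missing
    # COGNITO_* is a prefix, so only check that at least one such key exists.
    if not any(key.startswith("COGNITO_") for key in env):
        report["Auth"] = ["COGNITO_*"]
    return report
```

Run inside the backend container, `unset_variables()` gives a quick view of what the process can see. Passing a dictionary instead of `os.environ` lets you check values parsed from a .env file.
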

YAML Configuration Files

| File | Description |
| --- | --- |
| `~/.agr_ai_curation/runtime/packages//agents//agent.yaml` | Package-owned agent definitions for standalone installs |
| `~/.agr_ai_curation/runtime/config/models.yaml` | Deployment model overrides |
| `~/.agr_ai_curation/runtime/config/providers.yaml` | Deployment provider overrides |
| `~/.agr_ai_curation/runtime/config/tool_policy_defaults.yaml` | Deployment tool policy overrides |
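
As a mental model for "deployment overrides", the sketch below layers one mapping over another so that override values win and untouched defaults survive. It is a generic illustration of layered configuration, not this project's loader. The actual precedence and merge rules are described in Modular Packages and Upgrades, and `layer_overrides` and the keys in the example are hypothetical.

```python
from typing import Any, Dict


def layer_overrides(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new dict where overrides take precedence over defaults.

    Nested dicts are merged key by key; any other value (including a
    list) is replaced wholesale. Neither input is modified.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For instance, layering `{"example_model": {"enabled": False}}` over `{"example_model": {"enabled": True, "provider": "openai"}}` keeps the provider and flips only the `enabled` flag.
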

Contributing

Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Alliance of Genome Resources
OpenAI for GPT models
Anthropic for Claude
Groq for high-throughput inference
Weaviate for vector storage
