Commit 1935a7a

larrybabb and claude committed
docs: add comprehensive README
Add detailed documentation covering:

- Project overview and features
- Installation methods (pip, development, Docker)
- Configuration via environment variables
- Docker Compose setup and available configurations
- REST API usage with examples for all endpoints
- CLI usage for VCF ingestion
- Project structure overview
- Development setup, testing, and code quality
- Makefile command reference
- Contributing guidelines

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent c8dc145 commit 1935a7a

1 file changed (+324, -0 lines changed)
README.md

Lines changed: 324 additions & 0 deletions
@@ -1 +1,325 @@
# AnyVLM

[![Python Package CI](https://github.com/genomicmedlab/anyvlm/actions/workflows/python-package.yaml/badge.svg)](https://github.com/genomicmedlab/anyvlm/actions/workflows/python-package.yaml)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

**AnyVLM** (Any Variant-Level Matching) is an off-the-shelf solution for adding local aggregate-level variant information to a Variant-Level Matching (VLM) network. It provides a REST API service that integrates with GA4GH standards for genomic data exchange.

## Overview

AnyVLM enables genomic research organizations to:

- **Ingest VCF files** containing variant and allele frequency data
- **Register variants** using the GA4GH Variant Representation Specification (VRS) via AnyVar
- **Store cohort allele frequencies** (CAF) with zygosity-stratified counts
- **Serve VLM protocol-compliant responses** with Beacon handover capabilities

This service is designed for rare disease variant frequency tracking in genomic research networks such as GREGoR.

## Features

- **VCF File Ingestion**: Streaming upload with comprehensive validation, batch processing, and support for multiple alternate alleles
- **VRS Compliance**: Integration with AnyVar for standardized variant representation
- **Zygosity Tracking**: Separate counts for homozygous, heterozygous, and hemizygous variants
- **GA4GH Beacon v2 Compatible**: Standards-compliant responses for network interoperability
- **Flexible Deployment**: Docker support, configurable storage backends, and CLI tools
- **Assembly Support**: Both GRCh37/hg19 and GRCh38/hg38 reference assemblies

## Requirements

- Python 3.11 - 3.14
- PostgreSQL 17+
- [AnyVar](https://github.com/biocommons/anyvar) for variant registration
- [SeqRepo](https://github.com/biocommons/biocommons.seqrepo) for sequence data
- [UTA](https://github.com/biocommons/uta) for transcript alignment

## Installation

### Via pip

```bash
pip install anyvlm
```

### Development Installation

```bash
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm
pip install -e ".[dev,test]"
```

### Docker

```bash
docker pull ghcr.io/genomicmedlab/anyvlm:latest
```

## Configuration

AnyVLM is configured via environment variables. Create a `.env` file in your project root:

```bash
# Required: Database connection
ANYVLM_STORAGE_URI=postgresql://anyvlm:anyvlm-pw@localhost:5435/anyvlm

# Required for /variant_counts endpoint: VLM handover configuration
HANDOVER_TYPE_ID="GREGoR-NCH"
HANDOVER_TYPE_LABEL="GREGoR AnyVLM Reference"
BEACON_HANDOVER_URL="https://variants.example.org/"
BEACON_NODE_ID="org.anyvlm.example"

# AnyVar configuration
UTA_DB_URL=postgresql://anonymous@localhost:5432/uta/uta_20241220
SEQREPO_DATAPROXY_URI=seqrepo+file:///usr/local/share/seqrepo/2024-12-20
ANYVAR_STORAGE_URI=postgresql://anyvar:anyvar-pw@localhost:5434/anyvar

# Optional: Service configuration
ANYVLM_ENV=local  # local, test, dev, staging, prod
ANYVLM_SERVICE_URI=http://localhost:8080
ANYVLM_ANYVAR_URI=http://localhost:8000  # Omit to use embedded Python client

# Optional: Custom logging configuration
ANYVLM_LOGGING_CONFIG=/path/to/logging.yaml
```

See [`.env.example`](.env.example) for a complete template.
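
As a quick sanity check before starting the service, the variables marked as required above can be verified with a short script. This is a minimal sketch and not part of AnyVLM: the `missing_settings` helper is hypothetical, and the split into always-required and `/variant_counts`-only variables is an assumption taken from the comments in the example `.env`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Grouping assumed from the comments in the example .env above.
REQUIRED = ("ANYVLM_STORAGE_URI",)
REQUIRED_FOR_VARIANT_COUNTS = (
    "HANDOVER_TYPE_ID",
    "HANDOVER_TYPE_LABEL",
    "BEACON_HANDOVER_URL",
    "BEACON_NODE_ID",
)


def missing_settings(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; AnyVLM performs its own
    configuration handling at startup.
    """
    env = os.environ if env is None else env
    names = REQUIRED + REQUIRED_FOR_VARIANT_COUNTS
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no argument inspects the current process environment, so the `.env` file needs to be loaded first (for example by Docker Compose or by exporting the variables in your shell).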

## Quick Start

### Using Docker Compose (Recommended)

1. **Create required volumes:**

   ```bash
   make volumes
   ```

2. **Start the full stack:**

   ```bash
   # Development mode with hot-reload
   make up-dev

   # Or production mode
   ANYVLM_VERSION=latest make up
   ```

3. **Access the service:**
   - AnyVLM API: <http://localhost:8080>
   - API Documentation: <http://localhost:8080/docs>
   - AnyVar (if using compose.anyvar.yaml): <http://localhost:8000>

### Available Docker Compose Configurations

| File                  | Purpose                                              |
| --------------------- | ---------------------------------------------------- |
| `compose.yaml`        | Production deployment with pre-built images          |
| `compose.dev.yaml`    | Development with local build and hot-reload          |
| `compose.anyvar.yaml` | AnyVar dependencies (SeqRepo, UTA, AnyVar service)   |
| `compose.test.yaml`   | Minimal services for testing                         |

**Full stack with AnyVar:**

```bash
docker compose -f compose.dev.yaml -f compose.anyvar.yaml up --build
```

## Usage

### REST API

#### Service Info

```bash
curl http://localhost:8080/service-info
```

Returns GA4GH-compliant service metadata.

#### Ingest VCF File

```bash
curl -X POST "http://localhost:8080/ingest_vcf?assembly=grch38" \
  -F "file=@/path/to/variants.vcf.gz"
```

**Requirements:**

- File must be gzip-compressed (`.vcf.gz`)
- Maximum file size: 5GB
- Required INFO fields: `AC`, `AN`, `AC_Het`, `AC_Hom`, `AC_Hemi`

**Response:**

```json
{
  "status": "success",
  "message": "Successfully ingested variants.vcf.gz",
  "details": null
}
```
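
Before uploading, the requirements listed for this endpoint can be partly checked locally. The sketch below is an illustration only, not AnyVLM's own validation: `missing_info_fields` is a hypothetical helper that confirms the file is readable as gzip and that the required INFO IDs are declared in the VCF header. It does not check that every record carries values for them.

```python
from __future__ import annotations

import gzip
import re
from pathlib import Path

# INFO fields listed under "Requirements" for /ingest_vcf.
REQUIRED_INFO_FIELDS = ("AC", "AN", "AC_Het", "AC_Hom", "AC_Hemi")
# Matches VCF meta-information lines such as: ##INFO=<ID=AC,Number=A,...>
INFO_ID_PATTERN = re.compile(r"^##INFO=<ID=([^,>]+)")


def missing_info_fields(vcf_gz_path: str | Path) -> list[str]:
    """Return required INFO IDs that the VCF header does not declare.

    Raises gzip.BadGzipFile if the file is not gzip-compressed.
    """
    declared: set[str] = set()
    with gzip.open(vcf_gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # Meta-information ends at the #CHROM header line.
            match = INFO_ID_PATTERN.match(line)
            if match:
                declared.add(match.group(1))
    return [name for name in REQUIRED_INFO_FIELDS if name not in declared]
```

An empty list means all five fields are declared; any names returned are worth fixing before sending the file to `/ingest_vcf`.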

#### Query Variant Counts

```bash
curl "http://localhost:8080/variant_counts?assemblyId=GRCh38&referenceName=22&start=44389414&referenceBases=A&alternateBases=G"
```

**Parameters:**

| Parameter         | Description              | Example                            |
| ----------------- | ------------------------ | ---------------------------------- |
| `assemblyId`      | Reference assembly       | `GRCh37`, `GRCh38`, `hg19`, `hg38` |
| `referenceName`   | Chromosome               | `1-22`, `X`, `Y`, `MT`             |
| `start`           | Position (1-based)       | `44389414`                         |
| `referenceBases`  | Reference allele         | `A`, `ACGT`, etc.                  |
| `alternateBases`  | Alternate allele         | `G`, `TGCA`, etc.                  |
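
When queries are built programmatically, it helps to let a URL library handle the encoding of these parameters. The following is a minimal sketch using only the Python standard library; `variant_counts_url` is a hypothetical helper, and the base URL assumes the local setup from the Quick Start.

```python
from urllib.parse import urlencode


def variant_counts_url(
    base_url: str,
    assembly_id: str,
    reference_name: str,
    start: int,
    reference_bases: str,
    alternate_bases: str,
) -> str:
    """Build a /variant_counts query URL from the parameters in the table above."""
    query = urlencode(
        {
            "assemblyId": assembly_id,
            "referenceName": reference_name,
            "start": start,
            "referenceBases": reference_bases,
            "alternateBases": alternate_bases,
        }
    )
    return f"{base_url.rstrip('/')}/variant_counts?{query}"


# Reproduces the URL used in the curl example above.
url = variant_counts_url("http://localhost:8080", "GRCh38", "22", 44389414, "A", "G")
```

With the service running, the resulting URL can be fetched with `urllib.request.urlopen(url)` or any HTTP client.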

**Response:**

VLM protocol-compliant JSON with:

- `beaconHandovers`: Handover metadata for network integration
- `meta`: Beacon metadata
- `responseSummary`: Whether variant exists and total results
- `response`: ResultSets grouped by zygosity (Homozygous, Heterozygous, Hemizygous, Unknown)
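
A client will usually want the per-zygosity counts as a simple mapping. The sketch below shows one way to extract them, assuming the payload follows the general Beacon v2 layout in which `response.resultSets` is a list of objects with `id` and `resultsCount` fields. Those field names, the `counts_by_result_set` helper, and the sample payload are assumptions for illustration; check them against an actual response from your deployment.

```python
from __future__ import annotations

from typing import Any


def counts_by_result_set(payload: dict[str, Any]) -> dict[str, int]:
    """Map each result set's id to its resultsCount.

    Field names are assumed from the Beacon v2 response layout.
    """
    result_sets = payload.get("response", {}).get("resultSets", [])
    return {rs["id"]: int(rs.get("resultsCount", 0)) for rs in result_sets}


# Hypothetical payload for illustration only; ids and counts are made up.
example_payload = {
    "responseSummary": {"exists": True, "numTotalResults": 3},
    "response": {
        "resultSets": [
            {"id": "Homozygous", "resultsCount": 1},
            {"id": "Heterozygous", "resultsCount": 2},
        ]
    },
}
counts = counts_by_result_set(example_payload)
```

The helper returns an empty mapping when `response` or `resultSets` is absent, which keeps callers from having to special-case variants that are not found.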

### Command-Line Interface

```bash
# Ingest a VCF file
anyvlm ingest-vcf --file /path/to/variants.vcf.gz --assembly grch38
```

The CLI sends VCF data to the running AnyVLM service endpoint.

## Project Structure

```text
anyvlm/
├── src/anyvlm/
│   ├── main.py                 # FastAPI application
│   ├── cli.py                  # Command-line interface
│   ├── config.py               # Configuration management
│   ├── restapi/                # REST API routes
│   │   ├── vlm.py              # VLM protocol endpoints
│   │   └── deps.py             # Dependency injection
│   ├── functions/              # Core business logic
│   │   ├── ingest_vcf.py       # VCF processing
│   │   ├── get_caf.py          # CAF retrieval
│   │   └── build_vlm_response.py
│   ├── storage/                # Database layer
│   │   ├── postgres.py         # PostgreSQL implementation
│   │   └── orm.py              # SQLAlchemy models
│   ├── anyvar/                 # AnyVar integration
│   │   ├── http_client.py      # HTTP-based client
│   │   └── python_client.py    # Embedded Python client
│   └── schemas/                # Pydantic data models
├── tests/
│   ├── unit/                   # Unit tests
│   └── integration/            # Integration tests
└── docs/                       # Sphinx documentation
```

## Development

### Setup

```bash
# Clone repository
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm

# Install development dependencies
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install
```

### Running Tests

```bash
# Run all tests
make test

# Run with coverage
pytest --cov=anyvlm --cov-report=term-missing

# Run specific test file
pytest tests/unit/test_restapi.py

# Start test database services
docker compose -f compose.test.yaml up -d
```

### Code Quality

```bash
# Format code
ruff format src tests

# Lint code
ruff check src tests

# Run all pre-commit hooks
pre-commit run --all-files
```

### Building Documentation

```bash
make -C docs html
# Output: docs/_build/html/index.html
```

## Makefile Commands

| Command          | Description                              |
| ---------------- | ---------------------------------------- |
| `make develop`   | Install package in development mode      |
| `make test`      | Run test suite                           |
| `make volumes`   | Create required Docker volumes           |
| `make up`        | Start production stack                   |
| `make up-dev`    | Start development stack with hot-reload  |
| `make up-test`   | Start test services                      |
| `make down`      | Remove all containers                    |
| `make stop`      | Stop running services                    |

## API Documentation

When the service is running, interactive API documentation is available at:

- **Swagger UI**: <http://localhost:8080/docs>
- **ReDoc**: <http://localhost:8080/redoc>

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting (`make test && ruff check src tests`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Contact

- **Repository**: <https://github.com/genomicmedlab/anyvlm>
- **Issues**: <https://github.com/genomicmedlab/anyvlm/issues>
- **Email**: <biocommons-dev@googlegroups.com>

## Acknowledgments

AnyVLM is developed by [The Wagner Lab at Nationwide Children's](https://www.nationwidechildrens.org/specialties/institute-for-genomic-medicine/research-labs/wagner-lab) and [The Translational Genomics Group at Broad Institute](https://the-tgg.org/).

This project integrates with:

- [GA4GH VRS](https://vrs.ga4gh.org/) - Variant Representation Specification
- [AnyVar](https://github.com/biocommons/anyvar) - Variant annotation service
- [GA4GH Beacon](https://beacon-project.io/) - Standards for genomic data discovery
