Merged
164 commits
5667053
Add primary field to score range models and implement validation for …
bencap Oct 6, 2025
c11a1e7
Add Fayer score range models and integrate into score set ranges
bencap Oct 6, 2025
e3192cd
fix: score range admin model inheritance
bencap Oct 8, 2025
0ef80a9
feat: add standalone score calibration model
bencap Oct 8, 2025
61aa6ef
Add build path to docker-compose-dev.yaml for dcd-mapping and cdot-re…
davereinhart Sep 25, 2025
73b3a1a
Update redis environment variables in template file
davereinhart Sep 25, 2025
12d6c0e
Update value check on score_set and experiment updates to handle non-…
davereinhart Sep 29, 2025
b24f3c0
Modify the function of get_score_set_variants_csv to allow downloadin…
EstelleDa Oct 17, 2025
4a8f65a
Modify a related test.
EstelleDa Oct 17, 2025
c6aab8f
Add some related tests.
EstelleDa Oct 20, 2025
9e69143
feat: add dataset_columns support in score set updates
davereinhart Sep 30, 2025
9d599d0
feat: process and validate dataset column metadata for scores and cou…
davereinhart Oct 3, 2025
f28c8e7
feat: add ScoreSetUpdateAllOptional with multipart form helper
davereinhart Oct 7, 2025
5cb60e0
refactor: move dataset column pydantic models to dedicated module
davereinhart Oct 7, 2025
bbcb3f2
refactor: replace dynamic camelization test with explicit model
davereinhart Oct 8, 2025
454cf86
feat: extend SavedDatasetColumns with recordType
davereinhart Oct 8, 2025
e3812b1
feat: add PATCH endpoint supporting variants + score/count metadata u…
davereinhart Oct 9, 2025
45c195d
feat: add target gene find_or_create helpers for sequence/accession
davereinhart Oct 11, 2025
de87fc4
refactor: unify variable names for score/count metadata & extend routes
davereinhart Oct 13, 2025
7fe4ed8
feat: enhance PATCH endpoint to fully process uploaded files
davereinhart Oct 14, 2025
cf89e6e
test: update and add unit tests aligned with new score set update flow
davereinhart Oct 17, 2025
d49fb27
test: add ScoreSetUpdateAllOptional model tests and router validation…
davereinhart Oct 17, 2025
4fa4995
feat: add worker job fixtures for score/count column metadata
davereinhart Oct 17, 2025
590844e
feat: add AlphaFold version proxy endpoint
davereinhart Oct 20, 2025
e427084
Add gzip middleware to fastapi to compress large responses
davereinhart Oct 22, 2025
8f6b3f0
Add score set search parameter to optionally skip including experimen…
davereinhart Oct 22, 2025
eb591af
Apply limit to score set searches.
jstone-dev Sep 30, 2025
0c1544f
Add an endpoint to obtain search filter options based on a given scor…
jstone-dev Sep 30, 2025
041bfb9
Allow score set search without a row limit when publication IDs are s…
jstone-dev Sep 30, 2025
1e5e6ed
MyPy: typing for counters
jstone-dev Sep 30, 2025
a3a1372
Update unit tests to reflect score set search endpoint change.
jstone-dev Sep 30, 2025
b6b3720
Code formatting
jstone-dev Sep 30, 2025
6331d55
Unit test fixes
jstone-dev Sep 30, 2025
fd6e701
Format & test fixes
jstone-dev Sep 30, 2025
9bcb0cf
Test bug fixes
jstone-dev Sep 30, 2025
ea62984
Unit test fixes
jstone-dev Oct 1, 2025
765f02f
Test bug fix
jstone-dev Oct 1, 2025
0ef53bf
Refactor counter usage for score set search filters.
jstone-dev Oct 24, 2025
6a67c84
Log the total number of matching score sets rather than the number re…
jstone-dev Oct 24, 2025
52cb863
Add an offset parameter to support full pagination of score set searc…
jstone-dev Oct 24, 2025
b9ff02b
Move score set search limits into constants.
jstone-dev Oct 24, 2025
db53a0e
Supply a default search limit.
jstone-dev Oct 24, 2025
3be7945
Unit tests for new score set search errors.
jstone-dev Oct 24, 2025
0508a4f
Don't import from router in test_score_set.py.
jstone-dev Oct 24, 2025
be3d522
Linting fix
jstone-dev Oct 24, 2025
ebde590
Return correct result count in paginated results with offset.
jstone-dev Oct 24, 2025
3330c2d
Merge pull request #547 from VariantEffect/davereinhart/alphafold-fil…
davereinhart Oct 24, 2025
b57c11f
Apply ruff format and organize import on files in this branch
davereinhart Oct 27, 2025
a150ebd
Merge pull request #548 from VariantEffect/davereinhart/router-perfor…
davereinhart Oct 27, 2025
8391f67
Cleanup
davereinhart Oct 27, 2025
5077533
Move all_fields_optional_model decorator to view models utils module …
davereinhart Oct 27, 2025
2950b2e
Add unit tests for all_fields_optional_model decorator
davereinhart Oct 27, 2025
ea3d2a7
Standardize router tags for clarity and consistency
bencap Oct 28, 2025
18fcf17
fixed: AccessKey object using `created_at` when property was `creatio…
bencap Oct 28, 2025
96149bd
Modified and debug the codes and some related tests.
EstelleDa Oct 29, 2025
cab9fe5
Updates to score set endpoints to receieve score and count columns me…
davereinhart Oct 29, 2025
b3a5a88
Update unit tests to include score and count columns metadata fields …
davereinhart Oct 29, 2025
38d974e
Merge branch 'release-2025.5.0' into jstone-dev/score-set-search-resu…
jstone-dev Oct 29, 2025
e0abfe0
Fix after merge
jstone-dev Oct 29, 2025
0cf96b2
Enhance API Documentation: Add explicit summaries for access key rout…
bencap Oct 28, 2025
92236a2
fixed: raise appropriate 404 error when access key is not found durin…
bencap Oct 28, 2025
2fbae48
Enhance API Documentation: Add explicit summary for api version endpoint
bencap Oct 28, 2025
2b6b57a
Enhance API documentation: Add summaries and response status descript…
bencap Oct 28, 2025
aedbfd1
Enhance API Documentation: Add summaries to controlled keyword routes
bencap Oct 28, 2025
7864fcd
Enhance API Documentation: Improve summary and response descriptions …
bencap Oct 28, 2025
aa4ba52
fixed: permissions were not properly enforced on experiment set fetches
bencap Oct 28, 2025
ac41cdf
Enhance API Documentation: Add missing response descriptions and summ…
bencap Oct 28, 2025
20bdddb
fixed: Use permission module to check experiments in list router
bencap Oct 28, 2025
3783ad2
fixed: Simplify if/else logic of get experiment sets router
bencap Oct 28, 2025
357cb24
Enhance API responses and documentation: Add missing response descrip…
bencap Oct 28, 2025
091f3b0
Enhance API documentation: Add summaries and improve descriptions for…
bencap Oct 28, 2025
3cbaeea
Enhance API documentation: Improve summaries for licenses endpoints
bencap Oct 28, 2025
956dd27
Enhance API documentation: Add internal server error response descrip…
bencap Oct 28, 2025
f615f60
Enhance API documentation: Add response descriptions and summaries fo…
bencap Oct 28, 2025
ac6ee3e
Enhance API documentation: Add internal server error response and imp…
bencap Oct 28, 2025
74e8d10
Enhance API documentation: Update responses and summary for permissio…
bencap Oct 28, 2025
2ed520e
Enhance API documentation: Improve summaries and response description…
bencap Oct 28, 2025
abed834
Enhance API documentation: Update response descriptions and error han…
bencap Oct 28, 2025
566bf90
Enhance API documentation: Add summaries for service info and metadat…
bencap Oct 28, 2025
d9e3b3d
Enhance API documentation: Add summaries and response descriptions fo…
bencap Oct 28, 2025
9b46446
Enhance API documentation: Add summaries for sequence, metadata, and …
bencap Oct 28, 2025
b9a5090
Enhance API documentation: Add summaries for various statistics endpo…
bencap Oct 28, 2025
f418f9e
Enhance API Documentation: fix response status code for missing searc…
bencap Oct 28, 2025
b61cff8
Enhance API documentation: Add summaries and response descriptions fo…
bencap Oct 28, 2025
c6549ae
Enhance API documentation: Add summaries for taxonomy endpoints
bencap Oct 28, 2025
ee2d725
Enhance API documentation: Add summaries and response descriptions fo…
bencap Oct 28, 2025
73de569
Enhance API documentation: Add summaries and response descriptions fo…
bencap Oct 28, 2025
d133e30
Clarify HTTP status codes for email and role authorization checks
bencap Oct 28, 2025
9f327f3
Enhance API metadata and tag consistency across routers for improved …
bencap Oct 30, 2025
3b90ab8
Enhance docs for mat view refresh methods and commit their transactions
bencap Oct 30, 2025
9193a2f
Add comprehensive instruction files for GitHub Copilot and project gu…
bencap Nov 3, 2025
b0a37e3
Remove platform specification from python-base image in Dockerfile
bencap Nov 3, 2025
dce5054
Alter markdown line length recommendation
bencap Nov 3, 2025
7f0688b
Return unenriched score sets when include_experiment_score_set_urns_a…
jstone-dev Nov 5, 2025
c6a2014
Merge pull request #525 from VariantEffect/jstone-dev/score-set-searc…
jstone-dev Nov 5, 2025
824bcac
Merge pull request #557 from VariantEffect/bugfix/bencap/556/mat-view…
bencap Nov 6, 2025
a0cec30
Merge pull request #559 from VariantEffect/feature/bencap/platform-bu…
bencap Nov 6, 2025
5784046
Bump deps with vulnerabilities to satisfy dependabot
bencap Nov 6, 2025
2f37d6b
Merge branch 'release-2025.5.0' into davereinhart/scoreset-column-met…
davereinhart Nov 6, 2025
b155ed6
Merge pull request #546 from VariantEffect/davereinhart/scoreset-colu…
davereinhart Nov 7, 2025
7d0fc6e
Add a boolean namespaced attribute and modify some related functions.
EstelleDa Nov 7, 2025
cb5717c
fixed: variant count endpoint was returning a count including non-dis…
bencap Nov 7, 2025
129d7da
feat: make calibration migration manual
bencap Nov 5, 2025
3da2891
feat: drop score range property from score set table
bencap Oct 8, 2025
7cafbac
feat: remove score_ranges column from ScoreSet model
bencap Oct 8, 2025
b5ba6d2
feat: add test constants for biorxiv publication, oddspaths, evidence…
bencap Oct 19, 2025
ef05b4c
feat: add ACMG classification models and associated tests
bencap Oct 19, 2025
bfb4b96
feat: add odds ratio classification function and corresponding tests
bencap Oct 19, 2025
646c5f7
feat: refactor score ranges into score calibrations.
bencap Oct 19, 2025
92a2a54
feat: remove 'name' field and add 'notes' field to score calibrations
bencap Oct 19, 2025
6197f86
feat: add score calibration model to /permissions router
bencap Oct 20, 2025
a1641ca
feat: filter score calibrations by user permissions in fetch_score_se…
bencap Oct 20, 2025
f3c8574
Adds various small patches to new score calibration model
bencap Oct 23, 2025
f406cf2
feat: allow 'not_specified' classifications to overlap in score calib…
bencap Nov 8, 2025
6f8d11b
fix: correct typo in StrengthOfEvidenceProvided enum value and update…
bencap Nov 9, 2025
a04ccb0
feat: add script to load calibration data from CSV into the database
bencap Nov 10, 2025
8c26bb1
feat: implement streaming endpoints for annotated variants with patho…
bencap Nov 10, 2025
9876327
Merge pull request #573 from VariantEffect/bugfix/bencap/511/statisti…
bencap Nov 10, 2025
5724fe3
Merge pull request #570 from VariantEffect/maintenance/bencap/bump-vu…
bencap Nov 10, 2025
1f3af16
Solved the merge conflicts and modified the related tests.
EstelleDa Nov 11, 2025
2ae432c
Merge pull request #541 from VariantEffect/enhancement/estelle/446/na…
EstelleDa Nov 11, 2025
313053d
feat: refactor pillar project calibration loader for use against simp…
bencap Nov 11, 2025
7eb4d02
feat: output VRS digest with post mapped HGVS strings
bencap Nov 11, 2025
3b616a3
Merge branch 'release-2025.5.0' of https://github.com/VariantEffect/m…
bencap Nov 11, 2025
675f134
fix: Update docstring to reflect correct score set property name
bencap Nov 11, 2025
9acd4a4
Merge pull request #545 from VariantEffect/feature/bencap/518/generic…
bencap Nov 11, 2025
d78189e
Merge pull request #577 from VariantEffect/feature/bencap/550/vrs-dig…
bencap Nov 11, 2025
29a1cbb
Merge branch 'release-2025.5.0' of https://github.com/VariantEffect/m…
bencap Nov 11, 2025
7b4a20c
Merge pull request #574 from VariantEffect/feature/bencap/560/optimiz…
bencap Nov 11, 2025
01e3db1
Merge branch 'release-2025.5.0' of https://github.com/VariantEffect/m…
bencap Nov 11, 2025
d117268
Merge pull request #555 from VariantEffect/maintenance/bencap/542/cle…
bencap Nov 11, 2025
47f58f7
Merge pull request #558 from VariantEffect/feature/bencap/copilot-onb…
bencap Nov 12, 2025
01321f1
Merge pull request #578 from VariantEffect/feature/bencap/refactored-…
bencap Nov 12, 2025
970ea81
Populate functional consequences for variants via script
sallybg Sep 18, 2025
2f86e1b
Batch initial requests to VEP
sallybg Sep 24, 2025
ee17f93
Batch requests to Variant Recoder
sallybg Sep 24, 2025
087035f
Run post-variant-recoder vep as a batch
sallybg Oct 27, 2025
4f8e0d8
Add VEP functional consequence to variants csv
sallybg Oct 27, 2025
83a5c67
Retrieve and store hgvs representations of mapped variants
sallybg Oct 21, 2025
fffde3d
Include post-mapped hgvs in variants data csv
sallybg Oct 21, 2025
ba7f99c
Fix get_hgvs_from_post_mapped imports
sallybg Oct 27, 2025
5a9c887
Assert type for mypy
sallybg Oct 27, 2025
00f4429
Update variants csv tests to reflect mapped hgvs changes
sallybg Oct 27, 2025
6bfaaca
Resolve alembic merge conflict
sallybg Oct 29, 2025
f88a72c
Use provided na_rep string to represent null values
sallybg Nov 6, 2025
e04d305
Use production clingen API
sallybg Nov 6, 2025
fc404d0
Update progress while populating mapped hgvs
sallybg Nov 12, 2025
41c223b
Update todo issue links
sallybg Nov 12, 2025
1403c06
Fix worker import for hgvs extraction
sallybg Nov 12, 2025
ae5ab83
Add vep to csv namespace options
sallybg Nov 13, 2025
4fdad00
Fix csv tests
sallybg Nov 13, 2025
5326b5d
fix: update author extraction logic in RxivPublication class
bencap Nov 13, 2025
268b5e9
feat: unbounded zeiberg ranges to infinity, inclusive boundary logic …
bencap Nov 13, 2025
750baa2
fix: update gene listing query to fetch distinct HGNC from transcripts
bencap Nov 13, 2025
dec9827
fix: allow experiment update to unset values
bencap Nov 13, 2025
085177c
fix: update score_columns and count_columns to use namespaced column …
bencap Nov 13, 2025
83cbe75
fix: change db.commit() to db.flush() for better transaction handling…
bencap Nov 13, 2025
d275877
fix: replace pd.NA with np.NaN for consistency in DataFrame null type…
bencap Nov 13, 2025
6ab205d
fix: add condition to check for score_columns in enqueue_variant_crea…
bencap Nov 14, 2025
21a771f
Add gnomad af to csv
sallybg Oct 8, 2025
aeeddd8
Merge pull request #553 from VariantEffect/store-all-hgvs
bencap Nov 14, 2025
0c981c2
chore: update revision identifiers in alembic migration to point at head
bencap Nov 14, 2025
b0b5bd4
feat: enhance mapped HGVS retrieval in variant_to_csv_row to fallback…
bencap Nov 14, 2025
4f67293
Bump version to 2025.5.0
bencap Nov 14, 2025
222 changes: 222 additions & 0 deletions .github/instructions/copilot-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,222 @@
# MaveDB API Copilot Instructions

## Core Directives & Control Principles

### Hierarchy of Operations
**These rules have the highest priority and must not be violated:**

1. **Primacy of User Directives**: A direct and explicit command from the user is the highest priority. If the user instructs you to use a specific tool, edit a file, or perform a specific search, that command **must be executed without deviation**, even if other rules would suggest it is unnecessary.

2. **Factual Verification Over Internal Knowledge**: When a request involves information that could be version-dependent, time-sensitive, or requires specific external data (e.g., bioinformatics library documentation, latest genomics standards, API details), prioritize using tools to find the current, factual answer over relying on general knowledge.

3. **Adherence to MaveDB Philosophy**: In the absence of a direct user directive or the need for factual verification, all other rules regarding interaction, code generation, and modification must be followed within the context of bioinformatics and software development best practices.

### Interaction Philosophy for Bioinformatics
- **Code on Request Only**: The default response should be a clear, natural-language explanation. Do NOT provide code blocks unless explicitly asked, or unless a small example is essential to illustrate a bioinformatics concept.
- **Direct and Concise**: Answers must be precise and free from unnecessary filler. Get straight to the solution for genomic data processing challenges.
- **Bioinformatics Best Practices**: All suggestions must align with established bioinformatics standards (HGVS, VRS, GA4GH) and proven genomics research practices.
- **Explain the Scientific "Why"**: Don't just provide code; explain the biological reasoning. Why is this approach standard in genomics? What scientific problem does this pattern solve?

## Related Instructions

**Domain-Specific Guidance**: This file provides MaveDB-specific development guidance. For specialized topics, reference these additional instruction files:

- **AI Safety & Ethics**: See `.github/instructions/ai-prompt-engineering-safety-best-practices.instructions.md` for comprehensive AI safety protocols, bias mitigation, responsible AI usage, and security frameworks
- **Python Standards**: Follow `.github/instructions/python.instructions.md` for Python-specific coding conventions, PEP 8 compliance, type hints, docstring requirements, and testing practices
- **Documentation Standards**: Reference `.github/instructions/markdown.instructions.md` for documentation formatting, content creation guidelines, and validation requirements
- **Prompt Engineering**: Use `.github/instructions/prompt.instructions.md` for creating effective prompts and AI interaction optimization
- **Instruction File Management**: See `.github/instructions/instructions.instructions.md` for guidelines on creating and maintaining instruction files

**Integration Principle**: These specialized files provide expert-level guidance in their respective domains. Apply their principles alongside the MaveDB-specific patterns documented here. When conflicts arise, prioritize the specialized file's guidance within its domain scope.

**Hierarchy for Conflicts**:
1. **User directives** (highest priority)
2. **MaveDB-specific bioinformatics patterns** (this file)
3. **Domain-specific specialized files** (python.instructions.md, etc.)
4. **General best practices** (lowest priority)

## Architecture Overview

MaveDB API is a bioinformatics database API for Multiplex Assays of Variant Effect (MAVE) datasets. The architecture follows these key patterns:

### Core Domain Model
- **Hierarchical URN system**: ExperimentSet (`urn:mavedb:00000001`) → Experiment (`urn:mavedb:00000001-a`) → ScoreSet (`urn:mavedb:00000001-a-1`) → Variant (`urn:mavedb:00000001-a-1#1`, i.e. the score set URN followed by `#` and the variant number)
- **Temporary URNs** during development: `tmp:uuid` format, converted to permanent URNs on publication
- **Resource lifecycle**: Draft → Published (with background worker processing)
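
The hierarchy above can be sketched as a small parser that derives every ancestor URN from a child URN. This is a minimal illustration only: the regex and the `parent_urns` helper are assumptions made for this example (they are not the patterns in `src/mavedb/lib/validation/urn_re.py`), and the sketch assumes every level carries the `urn:mavedb:` prefix and that experiment suffixes are lowercase letters.

```python
import re

# Assumed shape: urn:mavedb:<8 digits>[-<letters>[-<number>[#<number>]]]
# Illustrative only; the project's real patterns live in its validation module.
URN_PATTERN = re.compile(
    r"^urn:mavedb:(?P<set>\d{8})"
    r"(?:-(?P<experiment>[a-z]+)"
    r"(?:-(?P<score_set>\d+)"
    r"(?:#(?P<variant>\d+))?)?)?$"
)


def parent_urns(urn: str) -> list[str]:
    """Return the URNs of every ancestor of `urn`, outermost first."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a recognised MaveDB URN: {urn!r}")

    experiment_set = f"urn:mavedb:{match['set']}"
    chain = [experiment_set]
    if match["experiment"]:
        chain.append(f"{experiment_set}-{match['experiment']}")
    if match["score_set"]:
        chain.append(f"{chain[-1]}-{match['score_set']}")
    if match["variant"]:
        chain.append(f"{chain[-1]}#{match['variant']}")
    # The last element is the URN itself; everything before it is an ancestor.
    return chain[:-1]
```

Because each level is a strict prefix of the next, a variant URN alone is enough to locate its score set, experiment, and experiment set without a database lookup.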

### Service Architecture
- **FastAPI application** (`src/mavedb/server_main.py`) with router-based endpoint organization
- **Background worker** using ARQ/Redis for async processing (mapping, publication, annotation)
- **Multi-container setup**: API server, worker, PostgreSQL, Redis, external services (cdot-rest, dcd-mapping, seqrepo)
- **External bioinformatics services**: HGVS data providers, SeqRepo for sequence data, VRS mapping for variant representation

## Development Patterns

### Database & Models
- **SQLAlchemy 2.0** with declarative models in `src/mavedb/models/`
- **Alembic migrations** with manual migrations in `alembic/manual_migrations/`
- **Association tables** for many-to-many relationships (contributors, publications, keywords)
- **Enum classes** for controlled vocabularies (UserRole, ProcessingState, MappingState)

### Key Dependencies & Injections
```python
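# Signatures only; bodies are elided with `...`.
from typing import Any, AsyncGenerator, Generator

from arq import ArqRedis
from biocommons.seqrepo import SeqRepo
from cdot.hgvs.dataproviders import RESTDataProvider
from sqlalchemy.orm import Session

# Database session
def get_db() -> Generator[Session, Any, None]: ...

# Worker queue
async def get_worker() -> AsyncGenerator[ArqRedis, Any]: ...

# External data providers
def hgvs_data_provider() -> RESTDataProvider: ...
def get_seqrepo() -> SeqRepo: ...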
```

### Authentication & Authorization
- **ORCID JWT tokens** and **API keys** for authentication
- **Role-based permissions** with `Action` enum and `assert_permission()` helper
- **User data context** available via `UserData` dataclass
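
As a rough illustration of the role-based pattern above, the following sketch shows one way an `Action` enum and an asserting helper can fit together. Everything here is a simplified stand-in: the enum members, the `Resource` and `PermissionDenied` types, the fields on `UserData`, and the ownership rule are assumptions for the example, not the project's actual permission logic or signatures.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical members for this sketch; the real enum may differ.
    read = "read"
    update = "update"
    publish = "publish"


@dataclass
class UserData:
    # Simplified stand-in: the real dataclass carries more context.
    username: str
    roles: set = field(default_factory=set)


@dataclass
class Resource:
    # Hypothetical resource with just enough state to make a decision.
    owner: str
    private: bool = True


class PermissionDenied(Exception):
    """Hypothetical exception raised when an action is not allowed."""


def has_permission(user: Optional[UserData], item: Resource, action: Action) -> bool:
    if user is not None and ("admin" in user.roles or user.username == item.owner):
        return True
    # Everyone else, including anonymous callers, may only read public items.
    return action is Action.read and not item.private


def assert_permission(user: Optional[UserData], item: Resource, action: Action) -> None:
    """Raise instead of returning False so callers can fail fast."""
    if not has_permission(user, item, action):
        raise PermissionDenied(f"'{action.value}' is not permitted on this resource")
```

Splitting the boolean check from the asserting wrapper keeps the decision logic testable on its own while letting routers express the requirement in a single line.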

### Router Patterns
- Endpoints organized by resource type in `src/mavedb/routers/`
- **Dependency injection** for auth, DB sessions, and external services
- **Structured exception handling** with custom exception types
- **Background job enqueueing** for publish/update operations

## Development Commands

### Environment Setup
```bash
# Local development with Docker
docker-compose -f docker-compose-dev.yml up --build -d

# Direct Python execution (requires env vars)
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
uvicorn mavedb.server_main:app --reload
```

### Testing
```bash
# Core dependencies only
poetry install --without dev
poetry run pytest tests/

# Full test suite with optional dependencies
poetry install --with dev --extras server
poetry run pytest tests/ --cov=src
```

### Database Management
```bash
# Run migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "Description"

# Manual migration (for complex data changes)
# Place in alembic/manual_migrations/ and reference in version file
```

## Project Conventions

### Naming Conventions
- **Variables & functions**: `snake_case` (e.g., `score_set_id`, `create_variants_for_score_set`)
- **Classes**: `PascalCase` (e.g., `ScoreSet`, `UserData`, `ProcessingState`)
- **Constants**: `UPPER_SNAKE_CASE` (e.g., `MAPPING_QUEUE_NAME`, `DEFAULT_LDH_SUBMISSION_BATCH_SIZE`)
- **Enum values**: `snake_case` (e.g., `ProcessingState.success`, `MappingState.incomplete`)
- **Database tables**: `snake_case` with descriptive association table names (e.g., `scoreset_contributors`, `experiment_set_doi_identifiers`)
- **API endpoints**: kebab-case paths (e.g., `/score-sets`, `/experiment-sets`)

### Documentation Conventions
*For general Python documentation standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific additions:*

- **Algorithm explanations**: Include comments explaining complex logic, especially URN generation and bioinformatics operations
- **Design decisions**: Comment on why certain architectural choices were made
- **External dependencies**: Explain purpose of external bioinformatics libraries (HGVS, SeqRepo, etc.)
- **Bioinformatics context**: Document biological reasoning behind genomic data processing patterns

### Commenting Guidelines
**Core Principle: Write self-explanatory code. Comment only to explain WHY, not WHAT.**

**✅ WRITE Comments For:**
- **Complex bioinformatics algorithms**: Variant mapping algorithms, external service interactions
- **Business logic**: Why specific validation rules exist, regulatory requirements
- **External API constraints**: Rate limits, data format requirements
- **Non-obvious calculations**: Score normalization, statistical methods
- **Configuration values**: Why specific timeouts, batch sizes, or thresholds were chosen

**❌ AVOID Comments For:**
- **Obvious operations**: Variable assignments, simple loops, basic conditionals
- **Redundant descriptions**: Comments that repeat what the code clearly shows
- **Outdated information**: Comments that don't match current implementation

### Error Handling Conventions
- **Structured logging**: Always use `logger` with `extra=logging_context()` for correlation IDs
- **HTTP exceptions**: Use FastAPI `HTTPException` with appropriate status codes and descriptive messages
- **Custom exceptions**: Define domain-specific exceptions in `src/mavedb/lib/exceptions.py`
- **Worker job errors**: Send Slack notifications via `send_slack_error()` and log with full context
- **Validation errors**: Use Pydantic validators and raise `ValueError` with clear messages
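
The structured-logging convention can be sketched with the standard library alone. The version of `logging_context()` below is a stand-in written for this example (the real helper's fields and storage mechanism are not shown in this file); it assumes the correlation ID is kept in a `contextvars.ContextVar`, and `start_request` and `lookup` are hypothetical names.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the current request or job (assumed mechanism).
_correlation_id = contextvars.ContextVar("correlation_id", default="")

logger = logging.getLogger("mavedb.example")


def start_request() -> str:
    """Assign a fresh correlation ID to the current context and return it."""
    correlation_id = uuid.uuid4().hex
    _correlation_id.set(correlation_id)
    return correlation_id


def logging_context() -> dict:
    """Stand-in for the real helper: fields to attach to every log record."""
    return {"correlation_id": _correlation_id.get()}


def lookup(urn: str) -> None:
    # Keys passed via `extra` become attributes on the LogRecord, so a
    # formatter or log shipper can emit them alongside the message.
    logger.info("fetching score set %s", urn, extra=logging_context())
```

Passing the context through `extra` rather than interpolating it into the message keeps the ID machine-readable, which is what makes it usable for tracing a request across the API and the worker.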

### Code Style and Organization Conventions
*For general Python style conventions, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Async patterns**: Use `async def` for I/O operations, regular functions for CPU-bound work
- **Database operations**: Use SQLAlchemy 2.0 style with `session.scalars(select(...)).one()`
- **Pydantic models**: Separate request/response models with clear inheritance hierarchies
- **Bioinformatics data flow**: Structure code to clearly show genomic data transformations

### Testing Conventions
*For general Python testing standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Test function naming**: Use descriptive names that reflect bioinformatics operations (e.g., `test_cannot_publish_score_set_without_variants`)
- **Fixtures**: Use `conftest.py` for shared fixtures, especially database and worker setup
- **Mocking**: Use `unittest.mock.patch` for external bioinformatics services and worker jobs
- **Constants**: Define test data including genomic sequences and variants in `tests/helpers/constants.py`
- **Integration testing**: Test full bioinformatics workflows including external service interactions
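
A minimal sketch of the mocking convention follows. `MappingClient`, `mapped_digest`, the response shape, and the placeholder HGVS string are all hypothetical and exist only to show how `unittest.mock.patch` keeps a unit test away from the network.

```python
from unittest.mock import patch


class MappingClient:
    """Hypothetical wrapper around an external variant-mapping service."""

    def map_variant(self, hgvs: str) -> dict:
        raise RuntimeError("network access is not available in unit tests")


def mapped_digest(client: MappingClient, hgvs: str) -> str:
    """Return the digest reported by the mapping service for one variant."""
    return client.map_variant(hgvs)["digest"]


def test_mapped_digest_uses_service_response():
    fake_response = {"digest": "example-digest"}
    # Replace the network-bound method for the duration of the block only.
    with patch.object(MappingClient, "map_variant", return_value=fake_response) as mocked:
        assert mapped_digest(MappingClient(), "NM_000000.0:c.1A>G") == "example-digest"
    mocked.assert_called_once_with("NM_000000.0:c.1A>G")
```

Asserting on the call arguments as well as the return value checks that the code under test sends the service what it is expected to send, not just that it handles the canned reply.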

## Codebase Conventions

### URN Validation
- Use regex patterns from `src/mavedb/lib/validation/urn_re.py`
- Validate URNs in Pydantic models with `@field_validator`
- URN generation logic in `src/mavedb/lib/urns.py` and `temp_urns.py`

### Worker Jobs (ARQ/Redis)
- **Job definitions**: All background jobs in `src/mavedb/worker/jobs.py`
- **Settings**: Worker configuration in `src/mavedb/worker/settings.py` with function registry and cron jobs
- **Job patterns**:
  - Use `setup_job_state()` for logging context with correlation IDs
  - Implement exponential backoff with `enqueue_job_with_backoff()`
  - Handle database sessions within job context
  - Send Slack notifications on failures via `send_slack_error()`
- **Key job types**:
  - `create_variants_for_score_set` - Process uploaded CSV data
  - `map_variants_for_score_set` - External variant mapping via VRS
  - `submit_score_set_mappings_to_*` - Submit to external annotation services
- **Enqueueing**: Use `ArqRedis.enqueue_job()` from routers with correlation ID for request tracing
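
The exponential backoff mentioned above comes down to a small delay calculation. The sketch below shows only that arithmetic; `backoff_delay`, `should_retry`, and every default value are assumptions for illustration and do not describe how `enqueue_job_with_backoff()` is actually implemented.

```python
def backoff_delay(attempt: int, base_seconds: float = 5.0, cap_seconds: float = 300.0) -> float:
    """Delay before retry number `attempt` (1-based): base * 2**(attempt - 1), capped."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


def should_retry(attempt: int, max_attempts: int = 5) -> bool:
    """Stop re-enqueueing once the attempt budget is spent."""
    return attempt < max_attempts
```

With these assumed defaults the first retries wait 5, 10, 20, and 40 seconds. A delay computed this way could be handed to ARQ's `enqueue_job()` through its `_defer_by` argument; the cap keeps a long-failing external service from pushing retries arbitrarily far into the future.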

### View Models (Pydantic)
- **Base model** (`src/mavedb/view_models/base/base.py`) converts empty strings to None and uses camelCase aliases
- **Inheritance patterns**: `Base` → `Create` → `Modify` → `Saved` model hierarchy
- **Field validation**: Use `@field_validator` for single fields, `@model_validator(mode="after")` for cross-field validation
- **URN validation**: Validate URNs with regex patterns from `urn_re.py` in field validators
- **Transform functions**: Use functions in `validation/transform.py` for complex data transformations
- **Separate models**: Request (`Create`, `Modify`) vs response (`Saved`) models with different field requirements
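
The two base-model behaviours listed first (camelCase aliases and empty strings becoming `None`) can be illustrated without Pydantic. This is a standard-library sketch of the transformations only; `to_camel`, `empty_to_none`, and `normalize` are hypothetical helpers, and the real base model applies these rules through Pydantic configuration and validators rather than plain functions.

```python
def to_camel(name: str) -> str:
    """Convert a snake_case field name to its camelCase alias."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def empty_to_none(value):
    """Treat an empty string as a missing value; leave everything else alone."""
    if isinstance(value, str) and value == "":
        return None
    return value


def normalize(record: dict) -> dict:
    """Apply both rules to a flat mapping of field names to values."""
    return {to_camel(key): empty_to_none(value) for key, value in record.items()}
```

For example, `normalize({"short_description": "", "score_set_urn": "urn:mavedb:00000001-a-1"})` yields `shortDescription` set to `None` and `scoreSetUrn` unchanged, which is the shape an API client would see.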

### External Integrations
- **HGVS/SeqRepo** for genomic sequence operations
- **DCD Mapping** for variant mapping and VRS transformation
- **CDOT** for transcript/genomic coordinate conversion
- **GA4GH VRS** for variant representation standardization
- **ClinGen services** for allele registry and linked data hub submissions

## Key Files to Reference
- `src/mavedb/models/score_set.py` - Primary data model patterns
- `src/mavedb/routers/score_sets.py` - Complex router with worker integration
- `src/mavedb/worker/jobs.py` - Background processing patterns
- `src/mavedb/view_models/score_set.py` - Pydantic model hierarchy examples
- `src/mavedb/server_main.py` - Application setup and dependency injection
- `src/mavedb/data_providers/services.py` - External service integration patterns
- `src/mavedb/lib/authentication.py` - Authentication and authorization patterns
- `tests/conftest.py` - Test fixtures and database setup
- `docker-compose-dev.yml` - Service architecture and dependencies