Add VEP + LOFTEE (+ GERP + HGNC) to VAT [VS-1520] [VS-1765] [VS-1767]#9299
mcovarr wants to merge 114 commits into ah_var_store
Pull request overview
This PR integrates VEP (Variant Effect Predictor) and LOFTEE (Loss-Of-Function Transcript Effect Estimator) annotations into the Variant Annotations Table (VAT), adding support for GERP conservation scores and HGNC gene nomenclature data. The implementation creates a new Docker image for VEP+LOFTEE and establishes a pipeline to generate, load, and process these annotations through BigQuery.
Key changes:
- Adds VEP+LOFTEE annotation generation task with GERP and HGNC support
- Implements BigQuery pipeline for loading and transforming raw VEP+LOFTEE output
- Updates VAT schema with new fields: hgnc_symbol, hgnc_id, LoF annotations, and GERP scores
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/variantstore/wdl/GvsUtils.wdl | Adds vep_loftee_docker image reference and updates variants_docker version |
| scripts/variantstore/variant-annotations-table/GvsCreateVATfromVDS.wdl | Implements VEP+LOFTEE annotation workflow with three new tasks and integrates results into VAT |
| scripts/variantstore/scripts/variant_annotation_table/schema/vat_schema.json | Adds schema definitions for HGNC and LOFTEE annotation fields |
| scripts/variantstore/scripts/variant_annotation_table/schema/variant_transcript_schema.json | Adds schema definitions for HGNC and LOFTEE annotation fields |
| .dockstore.yml | Updates branch tracking configuration |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
GitHub Actions tests reported job failures from actions build 21251106925

GitHub Actions tests reported job failures from actions build 21251494218

    String? vep_loftee_data_table_raw
    String? vep_loftee_data_table_cooked

    String loftee_references_dir = "gs://gvs-internal/loftee/"

Can we access this from AoU?
Given this comment I'd say probably not, but having the references in a bucket is just a stopgap measure until we create a reference disk for these.

    output {
        File output_file = "vep_loftee_raw_output.txt"
        File monitoring_log = "monitoring.log"
        File? warnings = "warnings.txt"

Why is this an optional output? Maybe write a zero-length file when there are no warnings.
VEP apparently doesn't create the file unless there are warnings to be logged.
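One way to act on the suggestion above, as a minimal sketch rather than what the PR does: after VEP finishes, create an empty warnings file only if VEP did not write one. The helper name `ensure_file_exists` is hypothetical, and the snippet assumes it would run inside the task's `command` block after the VEP invocation.

```shell
# Hypothetical helper (not from the PR): create an empty file at the given
# path only if nothing exists there, so any warnings VEP wrote are preserved.
ensure_file_exists() {
    local path="$1"
    if [ ! -e "$path" ]; then
        : > "$path"
    fi
}
```

Calling `ensure_file_exists "warnings.txt"` after VEP would let the output be declared as a non-optional `File`, at the cost of downstream tasks having to treat an empty file as "no warnings".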

        }
    }

    task BigQueryCookVepAndLofteeRawAnnotations {

What does "cook" mean here? Maybe clarify.
Sure. It's meant to contrast with the "raw" data, but I can certainly add details.
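To illustrate the raw versus cooked contrast, here is a rough sketch of the kind of transformation involved. It is not the PR's implementation, which does this step in BigQuery. It assumes VEP's default tab-delimited output, where header lines start with `#`, column 1 is the variant, column 5 is the feature (transcript), and the last column (`Extra`) holds `;`-separated `key=value` pairs including LOFTEE's `LoF`. The function name `cook_vep_rows` is hypothetical.

```shell
# Hypothetical helper (not from the PR): turn raw VEP rows into
# "variant<TAB>feature<TAB>LoF" rows, leaving LoF empty when it is absent.
cook_vep_rows() {
    awk -F'\t' '
        /^#/ { next }                      # skip VEP header and comment lines
        {
            lof = ""
            n = split($NF, pairs, ";")     # Extra is assumed to be the last column
            for (i = 1; i <= n; i++) {
                eq = index(pairs[i], "=")
                if (eq > 0 && substr(pairs[i], 1, eq - 1) == "LoF") {
                    lof = substr(pairs[i], eq + 1)
                }
            }
            printf "%s\t%s\t%s\n", $1, $5, lof
        }
    ' "$1"
}
```

Running `cook_vep_rows vep_loftee_raw_output.txt` would then print one flattened row per annotated transcript, which is roughly the shape a "cooked" table would have under these assumptions.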
GitHub Actions tests reported job failures from actions build 21563034650
Spreadsheet with LoF prevalence and VAT vs VEP Ensembl ID distributions here.