Add VEP + LOFTEE (+ GERP + HGNC) to VAT [VS-1520] [VS-1765] [VS-1767]#9299

Open
mcovarr wants to merge 114 commits into ah_var_store from vs_1520_loftee

Collaborator

mcovarr commented Nov 24, 2025

  • Successful Quickstart run here.
  • Successful AnVIL 3k run here.

Spreadsheet with LoF prevalence and VAT vs VEP Ensembl ID distributions here.

mcovarr requested a review from Copilot on November 25, 2025 17:15
Copilot AI left a comment

Pull request overview

This PR integrates VEP (Variant Effect Predictor) and LOFTEE (Loss-Of-Function Transcript Effect Estimator) annotations into the Variant Annotations Table (VAT), adding support for GERP conservation scores and HGNC gene nomenclature data. The implementation creates a new Docker image for VEP+LOFTEE and establishes a pipeline to generate, load, and process these annotations through BigQuery.

Key changes:

  • Adds VEP+LOFTEE annotation generation task with GERP and HGNC support
  • Implements BigQuery pipeline for loading and transforming raw VEP+LOFTEE output
  • Updates VAT schema with new fields: hgnc_symbol, hgnc_id, LoF annotations, and GERP scores

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/variantstore/wdl/GvsUtils.wdl | Adds vep_loftee_docker image reference and updates variants_docker version |
| scripts/variantstore/variant-annotations-table/GvsCreateVATfromVDS.wdl | Implements VEP+LOFTEE annotation workflow with three new tasks and integrates results into VAT |
| scripts/variantstore/scripts/variant_annotation_table/schema/vat_schema.json | Adds schema definitions for HGNC and LOFTEE annotation fields |
| scripts/variantstore/scripts/variant_annotation_table/schema/variant_transcript_schema.json | Adds schema definitions for HGNC and LOFTEE annotation fields |
| .dockstore.yml | Updates branch tracking configuration |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Collaborator

@gbggrant gbggrant left a comment

Looks good to me.

@gatk-bot
Github actions tests reported job failures from actions build 21251106925
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| conda | 17.0.6+10 | 21251106925.3 | logs |

gatk-bot commented Jan 22, 2026

Github actions tests reported job failures from actions build 21251494218
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| conda | 17.0.6+10 | 21251494218.3 | logs |

Collaborator

@gbggrant gbggrant left a comment

LGTM

String? vep_loftee_data_table_raw
String? vep_loftee_data_table_cooked

String loftee_references_dir = "gs://gvs-internal/loftee/"
Collaborator

Can we access this from AoU?

Collaborator Author

Given this comment I'd say probably not, but having the references in a bucket is just a stopgap measure until we create a reference disk for these.

output {
File output_file = "vep_loftee_raw_output.txt"
File monitoring_log = "monitoring.log"
File? warnings = "warnings.txt"
Collaborator

Why is this an optional output? Maybe make it a zero-length file if it isn't being set.

Collaborator Author

VEP apparently doesn't create the file unless there are warnings to be logged.
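
One way the zero-length-file suggestion could be handled is sketched below as the tail of the task's command section. This is only an illustration under stated assumptions: `run_annotation` is a hypothetical placeholder for the actual VEP + LOFTEE invocation, which is not shown in this thread, and the file names are taken from the output block above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the sketch leaves nothing behind in the caller's directory.
cd "$(mktemp -d)"

# Hypothetical stand-in for the real VEP + LOFTEE invocation (not shown in this thread).
# It writes the main output and, like a warning-free VEP run, never creates warnings.txt.
run_annotation() {
    printf 'placeholder annotation output\n' > vep_loftee_raw_output.txt
}

run_annotation

# touch creates warnings.txt only if it is missing and leaves existing contents alone,
# so real warnings are preserved and a clean run yields a zero-length file.
touch warnings.txt
```

With something like this in place, the output could presumably be declared as `File warnings` instead of `File? warnings`. Keeping the optional output, as the PR currently does, is the alternative that needs no change to the command.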

}
}

task BigQueryCookVepAndLofteeRawAnnotations {
Collaborator

What does "cook" mean? Maybe clarify.

Collaborator Author

Sure. It's meant to contrast with the "raw" data, but I can certainly add details.

gatk-bot commented Feb 1, 2026

Github actions tests reported job failures from actions build 21563034650
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| conda | 17.0.6+10 | 21563034650.3 | logs |

4 participants