Fix VCF integration assertion, revert extract scatters [VS-1820] by mcovarr · Pull Request #9340 · broadinstitute/gatk

mcovarr · 2026-03-02T15:08:18Z

awk does not use -gt for numeric comparisons; this should be (and used to be) >. Unfortunately our version of awk doesn't seem to complain when we try to use -gt and instead just silently exits with rc 0. 😞

$ DIFF_FOUND=0.2
$ TOLERANCE=0.1
$ awk "BEGIN{ exit ($DIFF_FOUND -gt $TOLERANCE) }"
$ echo $?
0
$ awk "BEGIN{ exit ($DIFF_FOUND > $TOLERANCE) }"
$ echo $?
1
$ awk "BEGIN{ exit ($DIFF_FOUND -lt $TOLERANCE) }"
$ echo $?
0
$ awk "BEGIN{ exit ($DIFF_FOUND < $TOLERANCE) }"
$ echo $?
0
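
A possible reading of the transcript above, offered as an assumption rather than something the PR states: awk appears to parse `0.2 -gt 0.1` as the subtraction `0.2 - gt` (with `gt` an uninitialized variable) concatenated with `0.1`, so `exit` receives a string that converts to a value below 1 and the status truncates to 0. The sketch below shows one way to make the check harder to get wrong by passing the values with `-v` and forcing a numeric comparison. The helper name `exceeds_tolerance` is hypothetical and does not come from the WDL.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): returns 0 (success) when the
# observed difference is strictly greater than the tolerance, 1 otherwise.
exceeds_tolerance() {
  # -v passes the values in as awk variables instead of splicing them
  # into the program text; "+ 0" forces a numeric comparison.
  awk -v diff="$1" -v tol="$2" 'BEGIN { exit !((diff + 0) > (tol + 0)) }'
}

DIFF_FOUND=0.2
TOLERANCE=0.1

if exceeds_tolerance "$DIFF_FOUND" "$TOLERANCE"; then
  echo "FAIL: diff $DIFF_FOUND exceeds tolerance $TOLERANCE"
else
  echo "OK: diff $DIFF_FOUND is within tolerance $TOLERANCE"
fi
```

Inverting the result with `!` lets the function follow the usual shell convention (status 0 means the condition holds), so it can be used directly in an `if`. The original one-liner instead exits 1 when the tolerance is exceeded.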

Appropriately failing integration test here with just the comparison fixed.

After further research, I found that these large discrepancies between expected and actual values only seem to happen with 3-sample integration runs. For AnVIL 3K, using the 500-wide scatter actually significantly reduced Storage API bytes scanned during extract. Shards appear to have downloaded only the data they needed (more or less) and were far less subject to preemptions due to their much shorter runtimes.

The final changes here were to update the cost_observability.tsv file for Exome, BGE, VETS and VQSR, for both 20/X/Y and all chromosomes, to match the currently observed Storage API usage with a 500-wide extract scatter.

Copilot

Pull request overview

Fixes a broken numeric comparison in the VCF integration WDL assertion by using awk’s correct float comparison operator, ensuring the cost-diff tolerance check actually fails when it should.

Changes:

Replace invalid -gt usage in an awk numeric comparison with > so the assertion behaves correctly.
Update .dockstore.yml branch filters to include the PR branch for the integration workflow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
scripts/variantstore/wdl/test/GvsQuickstartVcfIntegration.wdl	Corrects the awk comparison operator so tolerance assertions don’t silently pass.
.dockstore.yml	Updates Dockstore branch filter to include the VS-1820 fix branch for `GvsQuickstartIntegration`.


koncheto-broad

To be clear, we're changing the behavior of extract for all users with fewer than 5000 samples in this case in order to fix this assertion? Because we only changed the scattering to 1 in order to satisfy the demand from some users for "one shard per chromosome" when possible. But that made extract longer and less reliable, so we changed it back to sharding to 500 for callsets below 5k. I'm not 100% sure that we should revert extract behavior to one shard per chromosome for the sake of making our integration tests succeed without at least pricing out the actual cost consequences of leaving it the way it is and just changing the assertion. The API we use to pull the data is cheap and a relatively small fraction of extract cost, so I'd love to know how much it is actually costing us to leave things the (inefficient) way they are for now and potentially disable the test until we solve the issue. Could you calculate the actual cost consequences in two runs with the different sharding behaviors?

gatk-bot · 2026-03-10T20:54:29Z

Github actions tests reported job failures from actions build 22922237195
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	22922237195.3	logs

mcovarr added 2 commits March 2, 2026 09:50

Fix VCF integration assertion [VS-1820]

412aecf

dockstore

7b3b613

mcovarr requested a review from Copilot March 2, 2026 15:08

Copilot started reviewing on behalf of mcovarr March 2, 2026 15:09 View session

Copilot AI reviewed Mar 2, 2026


sus

2bb4c5c

mcovarr marked this pull request as draft March 2, 2026 21:28

mcovarr force-pushed the vs_1820_fix_vcf_integration_assertion branch from f669666 to 2bb4c5c Compare March 3, 2026 14:05

dockstore

b41a2fa

mcovarr marked this pull request as ready for review March 3, 2026 19:55

revert dockstore

41e6e9f

mcovarr changed the title ~~Fix VCF integration assertion [VS-1820]~~ Fix VCF integration assertion, revert extract scatters [VS-1820] Mar 3, 2026

koncheto-broad reviewed Mar 4, 2026


gbggrant approved these changes Mar 4, 2026


mcovarr added 4 commits March 5, 2026 17:46

dockstore

51de693

more

088dde0

boop

ca46ec9

cleanup

24fe04a

mcovarr merged commit caaa54f into ah_var_store Mar 10, 2026
14 of 17 checks passed

mcovarr deleted the vs_1820_fix_vcf_integration_assertion branch March 10, 2026 20:17

mcovarr merged 9 commits into ah_var_store from vs_1820_fix_vcf_integration_assertion

5 participants
