Speedup and refactor remove variants cmd #6094
9095321 to
0e92415
Compare
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6094      +/-   ##
==========================================
- Coverage   83.93%   83.90%   -0.04%
==========================================
  Files         343      343
  Lines       21174    21207      +33
==========================================
+ Hits        17773    17793      +20
- Misses       3401     3414      +13
0408d18 to
5705c49
Compare
remove_n_variants = 0
remove_n_omics_variants = 0

if not dry_run:
Before, a dry run counted exactly how many variants would be deleted. I figured WHO CARES: what we are interested in is a list of cases with more than a given number of variants, and their specifics, to be processed afterwards.
Yes, though it isn't perhaps a full dry-run: maybe we should have another name for it.
It would perhaps be more appropriate, but I don't think it's worth it because it would be a major change.
Ah, yes. We could add it to that list perhaps! Or split into two: a true dry-run and a case variants count option. But let's not overdo it - it's a use-once-a-year kind of thing.
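For readers following the thread, the dry-run behaviour discussed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `review_cases`, the `n_variants` field, `variants_threshold` and the `delete_variants` callback are all hypothetical names. In dry-run mode the case is still listed with its variant count, but nothing is deleted and no removed-variants number is reported.

```python
from typing import Callable, Dict, Iterable, List


def review_cases(
    cases: Iterable[Dict],
    variants_threshold: int,
    dry_run: bool,
    delete_variants: Callable[[str], int],
) -> List[Dict]:
    """List cases above the threshold; delete their variants unless dry_run.

    All names here are hypothetical stand-ins for illustration only.
    """
    report = []
    for case in cases:
        # Cases at or below the threshold are not of interest
        if case["n_variants"] <= variants_threshold:
            continue
        removed = None  # stays None in a dry run: nothing is counted or deleted
        if not dry_run:
            removed = delete_variants(case["_id"])
        report.append(
            {"case_id": case["_id"], "n_variants": case["n_variants"], "removed": removed}
        )
    return report
```

A separate "case variants count" option, as suggested, would amount to always taking the dry-run branch of this loop.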
Nice! I never thought I'd see the words "speedup" and "add progressbar" in the same PR! 😁
dnil left a comment
Super! Good continued subdivision of the problems, nice with the explicit logging to file and very nice to start treating the research variants separately. I think the latter still needs a bit of work, if I read this correctly. While those "snv_research" categories are very nice on the CLI (same as for upload file categories, etc) they don't really exist on the db for searching.
I still feel it would be helpful to just save the cases from an open cursor to a list, while we are working. The individual case queries are so slow, they easily time out the operation, even if one makes manual batches.
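The suggestion to save the cases from the open cursor into a list could look roughly like this. It is only a sketch under assumptions: `snapshot_cases` and the chosen field names are hypothetical, and any iterable of dicts stands in for the database cursor here. The idea is to read the cursor once, keep only the few fields the cleanup needs, and then do the slow per-case work against the in-memory list so the cursor cannot time out in the meantime.

```python
from typing import Any, Dict, Iterable, List, Sequence


def snapshot_cases(
    cursor: Iterable[Dict[str, Any]],
    fields: Sequence[str] = ("_id", "owner", "display_name"),
) -> List[Dict[str, Any]]:
    """Exhaust the cursor right away, keeping only the listed fields.

    `cursor` is any iterable of case documents (a stand-in for a db cursor);
    the default field names are assumptions made for this illustration.
    """
    return [{field: doc.get(field) for field in fields} for doc in cursor]


# Usage with a generator standing in for an open cursor
fake_cursor = (
    {"_id": f"case_{i}", "owner": "cust000", "display_name": f"c{i}", "extra": i}
    for i in range(3)
)
cases = snapshot_cases(fake_cursor)
```

The trade-off is memory: the list holds one small dict per case, which is usually acceptable when only identifiers and owners are kept.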
return
delete_stats["case_counter"] += 1
case_evaluated, _ = store.evaluated_variants(
    case_id=case["_id"], institute_id=case["owner"], limit_dismissed=5
Just to speed it up
Absolutely, I do think it is correct: just wondering if you saw it somehow safer than say 2, or less so than 15?
No, I haven't tested. We can have 15 if you prefer, doesn't matter to me
I'm also fine with 0 as long as we understand what is happening?
],
}
if remove_ctg:
    query["category"] = {"$in": remove_ctg}
Hm, we never split the CTGs right? So these would still have those whatever_research on them? But it would never propagate to a { category: 'whatever', variant_type: "research" }
Ah right, now that you tell me I remember. I'll fix, or perhaps I'll just leave it as it was
Reverted now! Very good point!
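For context on the point raised in this thread: a CLI-style name such as "snv_research" would have to be split before it could match documents stored as { category: 'snv', variant_type: "research" }. A possible sketch of such a split is below. It is hypothetical and not part of this PR (the change was reverted): `category_clauses` is an invented helper, and leaving plain names unconstrained on `variant_type` is an assumption.

```python
from typing import Dict, List

RESEARCH_SUFFIX = "_research"


def category_clauses(remove_ctg: List[str]) -> List[Dict[str, str]]:
    """Turn CLI-style category names into per-category query clauses.

    Hypothetical helper: names ending in "_research" are split into a
    category plus variant_type "research"; other names are used as-is
    with no variant_type constraint (an assumption).
    """
    clauses = []
    for ctg in remove_ctg:
        if ctg.endswith(RESEARCH_SUFFIX):
            base = ctg[: -len(RESEARCH_SUFFIX)]
            clauses.append({"category": base, "variant_type": "research"})
        else:
            clauses.append({"category": ctg})
    return clauses


# The clauses could then be combined with an "$or" in the query document
category_filter = {"$or": category_clauses(["snv_research", "sv"])}
```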
Co-authored-by: Daniel Nilsson <daniel.k.nilsson@gmail.com>
When you have time I'm ready for another round on this one @dnil. Thanks!




This PR adds a functionality or fixes a bug.
- scout delete variants cmd -> closes Speed up variant cleanup #6092
- --out-file where it prints a detailed report of the deletion process - real or simulated (no number of removed vars in this case)

Testing on cg-vm1 server (Clinical Genomics Stockholm)
Prepare for testing
- [...] scout-stage and the server is cg-vm1.
- ssh <USER.NAME>@cg-vm1.scilifelab.se
- sudo -iu hiseq.clinical
- ssh localhost
- podman ps
- systemctl --user stop scout@<name_of_currently_deployed_branch>
- systemctl --user start scout@<this_branch>
- systemctl --user status scout.target
- [...] scout-stage) to be used for testing by other users.

Testing on hasta server (Clinical Genomics Stockholm)
Prepare for testing
- ssh <USER.NAME>@hasta.scilifelab.se
- us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
- conda activate S_scout; pip freeze | grep scout-browser
- bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
- us; scout --version
- [...] paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:
Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.
Review: