Labels: proposal (Proposals that warrant further discussion)
Description
Updated after dev-chat discussion 2025-05-29/30. Previously based on these meeting notes, 2024-11-21/22.
This is an overview issue covering the tasks, topics and ideas related to external collaborators/users running workflows separately from the pathogen repo itself.
- External directory support
  - The CLI now supports `nextstrain {setup,run}` etc. There's a big list of nice-to-haves, but this part is essentially done.
    - Measles supports this interface fully. Avian-flu works for `nextstrain run`, but doesn't conform to the standard file structure, so `nextstrain setup` shows errors.
    - Jover is planning on pushing this out to zika etc.
  - [implementation] We are using the shared repo to vendor common code
- Merging private data
  - Running list of pathogens supporting this in various capacities: Provide a generic pattern for including additional user data alongside curated data #72 (comment)
  - I think we're all happy with the interface used in avian-flu being used more widely
  - Implementation may be improved by using shared code for remote resources or Snakemake v8's support
  - We're not going to contemplate curation of user data at this stage, but if we do, the proposal was to enable a config hook so that private data (in the analysis directory) is passed through a user-defined program (etc.) before the merge
  - PGCoE aim for 25-26 is to link up subsampling and private data
  - Potentially our default inputs should include an (optional) default location for private data, i.e. if these files exist in the analysis directory then they'll be used without needing to write a config overlay.
- Generalized subsampling
  - PGCoE aim for 24-25 is to have a general `augur subsample` command. No proximity needed at this stage.
  - Specifics to come, but see "augur subsample command" (augur#635) for prior art
  - The hard part is the config interface!
  - We added weighted sampling already, and we need to work out which use cases can be achieved by this alone (i.e. clarify where we actually need a subsampling command)
- Consistent config syntax
  - Big-picture config syntax stuff is probably a long-term thing, and not blocking here
  - First step: try out the globbing syntax, including on a non-wildcard repo
  - We need a way to encode a null value, and it'd probably be good to standardise this
  - Ultimately each repo will need to have its own docs…
- Workflow versioning, docs etc.
  - Immediately, we should have a changelog and a description of how to run (markdown's fine) in each repo we expect to run via external analysis directories
  - Longer term (medium term?), docs.nextstrain.org will have repo-specific (pathogen-specific) docs sub-projects, linked into reference docs for shared functionality.
    - Aim: each mature pathogen repo has such a docs project
    - The easiest way to roll this out may be a skeleton in the repo guide, but we should avoid the situation where placeholder text makes it into the pathogen repos themselves
  - We’ve played with JSON schemas (for the config) and auto-generating HTML docs from them. That effort wasn’t successful enough to merge, but I think this is where we'll eventually end up.
  - Also consider implementing one-off checks within code (e.g.) when making changes to the configs
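To make the "default location for private data" idea under merging private data more concrete, here is a minimal Python sketch. It is not how any pathogen repo currently works: the filenames in `DEFAULT_PRIVATE_FILES`, the shape of the input dicts, and the function names are all hypothetical, chosen only to illustrate "if the files exist in the analysis directory, use them without a config overlay".

```python
from pathlib import Path

# Hypothetical default filenames; nothing in this issue fixes these names.
DEFAULT_PRIVATE_FILES = {
    "metadata": "private_metadata.tsv",
    "sequences": "private_sequences.fasta",
}


def discover_private_inputs(analysis_dir):
    """Return {kind: path} for each default private-data file that exists."""
    base = Path(analysis_dir)
    found = {}
    for kind, filename in DEFAULT_PRIVATE_FILES.items():
        candidate = base / filename
        if candidate.is_file():
            found[kind] = candidate
    return found


def resolve_inputs(curated_inputs, analysis_dir):
    """Curated inputs first, plus one 'private' input if any file was found."""
    inputs = list(curated_inputs)
    private = discover_private_inputs(analysis_dir)
    if private:
        entry = {"name": "private"}
        entry.update({kind: str(path) for kind, path in private.items()})
        inputs.append(entry)
    return inputs
```

With an empty analysis directory the curated inputs come back unchanged, so the behaviour stays opt-in: dropping a file into the directory is the only action a user would need to take.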
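On the need to encode a null value in configs: one possible convention, sketched below in Python, is that a `null` in a config overlay (which YAML loaders expose as `None`) removes the corresponding default. This is only an assumption about what a standardised null could mean; the function name `apply_overlay` and the example keys are hypothetical and not taken from any existing workflow.

```python
import copy


def apply_overlay(base, overlay):
    """Recursively merge `overlay` into a copy of `base`.

    Assumed convention (not an agreed standard): an overlay value of None
    deletes the key from the merged config instead of storing None.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if value is None:
            # Null in the overlay means "unset the default".
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = apply_overlay(merged[key], value)
        else:
            # Scalars and lists replace the default wholesale.
            merged[key] = copy.deepcopy(value)
    return merged
```

Whatever convention is chosen, the useful property is that it is the same in every repo, so that an overlay written for one pathogen reads the same way in another.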
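As an illustration of the "one-off checks within code" idea, a lightweight check could run when the config is loaded and report problems before any rule executes. The Python sketch below uses only the standard library; the function name `check_config` and the example key names are hypothetical, and a real implementation might instead validate against a JSON schema.

```python
def check_config(config, required, known):
    """Return a list of human-readable problems; an empty list means OK.

    `required` and `known` are sets of top-level key names; `required`
    is expected to be a subset of `known`.
    """
    problems = []
    for key in sorted(required):
        if key not in config:
            problems.append(f"missing required config key: {key!r}")
    for key in sorted(config):
        if key not in known:
            # Typically a typo, or a key renamed in a newer workflow version.
            problems.append(f"unknown config key: {key!r}")
    return problems
```

For example, `check_config({"inputs": [], "subsmaple": {}}, {"inputs", "subsample"}, {"inputs", "subsample", "refine"})` reports both the missing `subsample` key and the misspelled `subsmaple` key, which is the kind of error that is otherwise easy to miss when configs change between versions.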