Fix sample discovery when tumor-only run without pairs file; minor issue with pairs.tsv in test dataset
#174
Changes
XAVIER could fail at Snakefile parse time with:
Fatal: Either a valid pairs file or sample names must be provided.
Sample names provided: set()
This happened even when valid *.fastq.gz inputs were provided and tumor-only mode should have proceeded.
Root cause
Sample discovery depends on name_symlinks, which is normally created by sym_safe() symlinking discovered inputs into:
input_files/fastq/ (FASTQ mode) or
input_files/bam/ (BAM mode)
However, the Snakefile logic previously did this:
If input_files/fastq existed, it only globbed input_files/fastq/*.fastq.gz and did not run sym_safe() again.
If the directory existed but was empty (e.g., from a partial init/failed run/manual mkdir), then name_symlinks=[] → samples=set() → read_pairsfile() raised the fatal error before any rules executed.
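The failure mode described above can be sketched roughly as follows. This is an illustration of the described behavior, not the actual Snakefile code: `discover_old` and the `link_inputs` callback (a stand-in for `sym_safe()`, whose real signature is not shown here) are hypothetical names.

```python
import glob
import os


def discover_old(input_fqdir, link_inputs):
    """Sketch of the pre-fix logic: an existing directory is trusted as-is."""
    if os.path.exists(input_fqdir):
        # Directory exists: only glob it. Inputs are never re-linked,
        # so an empty directory yields an empty list.
        return sorted(glob.glob(os.path.join(input_fqdir, "*.fastq.gz")))
    # Directory missing: create it and link the inputs.
    # link_inputs is a hypothetical callback standing in for sym_safe().
    os.makedirs(input_fqdir)
    return link_inputs(input_fqdir)
```

With an empty, pre-existing `input_files/fastq`, this returns an empty list without ever calling the linking step, which is how `name_symlinks` ended up empty.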
Issues
Harden sample discovery to repopulate symlinks when the input directory exists but contains no files:
Always os.makedirs(input_fqdir, exist_ok=True) / os.makedirs(input_bamdir, exist_ok=True)
Prefer existing symlinks when present
If globbing the directory returns empty, call sym_safe(...) to (re)populate it
Add an early, actionable error message if samples still cannot be inferred
This makes tumor-only runs robust to stale/empty input_files/* directories and prevents parse-time failure.
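A minimal sketch of the hardened flow listed above, under stated assumptions: `link_inputs` is a hypothetical stand-in for `sym_safe()` (the real function's signature may differ), `discover_fastqs` is an illustrative name, and raising `ValueError` is only one way to surface the early error.

```python
import glob
import os


def link_inputs(sources, target_dir):
    """Hypothetical stand-in for sym_safe(): symlink each source into target_dir."""
    linked = []
    for src in sources:
        dest = os.path.join(target_dir, os.path.basename(src))
        if not os.path.lexists(dest):
            os.symlink(os.path.abspath(src), dest)
        linked.append(dest)
    return sorted(linked)


def discover_fastqs(sources, input_fqdir):
    """Return FASTQ symlinks, repopulating the directory if it is empty."""
    # Always make sure the directory exists.
    os.makedirs(input_fqdir, exist_ok=True)
    # Prefer symlinks that are already present.
    found = sorted(glob.glob(os.path.join(input_fqdir, "*.fastq.gz")))
    if not found:
        # Stale or empty directory: (re)populate it from the original inputs.
        found = link_inputs(sources, input_fqdir)
    if not found:
        # Fail early with an actionable message instead of a later generic error.
        raise ValueError(
            f"No *.fastq.gz files found in {input_fqdir} and none could be "
            "linked; check the input paths or provide a pairs file."
        )
    return found
```

The same pattern would apply to `input_bamdir` in BAM mode, with the glob pattern changed accordingly.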
Fixes #172
Fixes #173
PR Checklist
(Strikethrough any points that are not applicable.)
[ ] Update docs if there are any API changes.
Update CHANGELOG.md with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/