-
Notifications
You must be signed in to change notification settings - Fork 25
Closed
Description
Hello,
I’m running bambu (v3.9.0) on 566 Nanopore cDNA BAMs aligned to GRCh38.
Quantification-only (discovery = FALSE) completes fine, but enabling discovery = TRUE keeps failing with a BiocParallel error, usually at different samples (e.g. sample 103 or 108). I've previously run bambu successfully for ~400 samples so I'm not sure what has changed on my end.
Environment:
- R version: R 4.5.1
- bambu version: 3.9.0 (also tested latest GitHub)
- Bioconductor: 3.21
- OS: Linux (HPC SLURM cluster)
- Threads: tested both ncore=16 and ncore=1
Example script
suppressPackageStartupMessages({
library(bambu)
library(Rsamtools)
library(GenomeInfoDb)
})
fa <- "references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa"
gtf <- "references/gencode.v49.annotation.gtf"
bams <- sort(Sys.glob("minimap2/*.sorted.bam"))
ann <- prepareAnnotations(gtf)
seqlevelsStyle(ann) <- "UCSC"
outdir <- "bambu"
rcdir <- file.path(outdir, "rcOutDir")
dir.create(rcdir, recursive = TRUE, showWarnings = FALSE)
res <- bambu(
reads = BamFileList(bams, asMates = FALSE, yieldSize = 1e7),
annotations = ann,
genome = fa,
rcOutDir = rcdir,
ncore = 1,
yieldSize = 1e7,
discovery = TRUE,
verbose = TRUE
)
Error output
Running Bambu-v3.9.0
--- Start generating read class files ---
sample_1.sorted.bam
Number of alignments/reads: 14469864
prediction accuracy (CV) (higher for splice donor than splice acceptor)
Error: BiocParallel errors
1 remote errors, element index: 1
565 unevaluated and other errors
first remote error:
Error in fisher.test(table(predictions > 0.5, labels.train.cv.test)):
'x' must have at least 2 rows and columns
Occasionally the error message is:
Error in bambu.processReadsByFile(...):
not all chromosomes present in reference annotations, annotations might be incomplete.
But I’ve verified that BAMs, FASTA, and GTF share the expected UCSC-style seqlevels and pruning with keepSeqlevels does not resolve the fisher.test error.
Notes:
- The same BAMs and reference/annotation work fine in quantification-only mode.
- The crash occurs at different BAMs on different runs, so it doesn’t seem to be a single corrupted file.
- Increasing min.readCount and min.sampleNumber in opt.discovery did not prevent the error.
- Running with ncore=1 produces the same error, so it doesn’t appear to be a threading issue.
Questions:
- Is this a known issue with the discovery step’s splice motif model (fisher.test)?
- Do you have recommendations on specific discovery parameters, or a workaround, to avoid this error? I can't see any issues reporting this, so I would appreciate your help!
R and bash scripts, and standard output using stable and updated releases:
Metadata
Metadata
Assignees
Labels
No labels