Commit e2e300a

update documentation
1 parent b2be433 commit e2e300a

2 files changed: +19 -22 lines changed

R/bambu.R

Lines changed: 2 additions & 8 deletions
@@ -142,7 +142,7 @@ bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
     fusionMode = FALSE, verbose = FALSE, demultiplexed = FALSE, spatial = NULL, quantData = NULL,
     sampleNames = NULL, cleanReads = FALSE, dedupUMI = FALSE, barcodesToFilter = NULL, clusters = NULL,
     processByChromosome = FALSE, processByBam = TRUE) {
-    message(paste0("Running Bambu-v", "3.3.0"))
+    message(paste0("Running Bambu-v", "3.9.0"))
     if(!is.null(mode)){
         if(mode == "bulk"){
             processByChromosome <- FALSE
@@ -249,15 +249,9 @@ bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
         }
     }
 
-
-
     if (quant) {
         message("--- Start isoform EM quantification ---")
-        # the step below is a bit confusing but it seems to be the only way
-        # if discovery == TRUE, extendAnnotations happen already
-        # if users want discovery at this step, assign a desired value for NDR with discovery being FALSE
-        # here also reads need to be not file or bam file or rc file
-        if(!is.null(NDR) & !discovery)
+        if(!is.null(NDR) & !discovery)# this step is used when reset NDR is needed
             annotations <- setNDR(annotations, NDR,
                 prefix = isoreParameters$prefix,
                 baselineFDR = isoreParameters[["baselineFDR"]],

README.md

Lines changed: 17 additions & 14 deletions
@@ -365,12 +365,14 @@ rowData(se[[1]])
 |chr.rc|The chromosome name the read class is found on|
 |strand.rc|The strand of the read class|
 |startSD|The standard deviation of the aligned genomic start positions of all reads assigned to the read class|
+|endSD|The standard deviation of the aligned genomic end positions of all reads assigned to the read class|
 |readCount.posStrand|The number of reads assigned to this read class that aligned to the positive strand|
 |intronStarts|A comma separated character vector of intron start coordinates|
 |intronEnds|A comma separated character vector of intron end coordinates|
 |confidenceType|Category of confidence: <br/> **highConfidenceJunctionReads** - the read class contains no low confidence junctions <br/> **lowConfidenceJunctionReads** - the read class contains low confidence junctions <br/> **unsplicedWithin** - single exon read class that is within the exon boundaries of an annotation <br/> **unsplicedNew** - single exon read class that does not fully overlap with annotated exons|
 |readCount|The number of reads assigned to this read class|
-|readId *only present when trackReads = TRUE|An integer list of bambu internal read ids that belong to the read class. (See the metadata of the object for full read names)|
+|readIds|An integer list of bambu internal read ids that belong to the read class. (See the metadata of the object for full read names)|
+|sampleIds|An integer list of bambu internal sample ids based on barcodes.|
 |GENEID|The gene ID the transcript is associated with|
 |novelGene|A logical that is true if the read class belongs to a novel gene (does not overlap with an annotated gene locus)|
 |numExons|The number of exons the read class has|
@@ -382,8 +384,8 @@ rowData(se[[1]])
 |numAend|An integer counting the number of A nucleotides found within a 20bp window centered on the read class genomic end position|
 |numTstart|An integer counting the number of T nucleotides found within a 20bp window centered on the read class genomic start position|
 |numTend|An integer counting the number of T nucleotides found within a 20bp window centered on the read class genomic end position|
-|txScore|This is the TPS generated by the sample trained model|
 |txScore.noFit|This is the TPS generated by the pretrained model|
+|txScore|This is the TPS generated by the sample trained model|
 
 
 ### Tracking read-to-transcript assignment
@@ -476,30 +478,30 @@ If you want to run Bambu-Clump for single-cell or spatial analysis stand alone a
 
 #### Read Class Construction:
 
-**reads**: provided bam files must have barcodes in the read name or in the BC tag. Alternatively a csv file can be provided to demultiplexed mapping the read names to barcodes. For exact requirements see https://github.com/GoekeLab/bambu-singlecell-spatial.<br/>
+**reads**: provided bam files should have barcodes in the read name or in the BC tag (and the UG tag for UMI identifiers). When both the tags and the read names contain barcode information, the tags take priority. Otherwise, a headerless delimited file containing the demultiplexing information for each read should be passed to the demultiplexed argument below. For exact requirements see https://github.com/GoekeLab/bambu-singlecell-spatial.<br/>
 
-**demultiplexed**: must be set to TRUE (or be a barcode map). This will cause bambu to look for barcodes and seperate reads by barcode rather than sample. <br/>
+**demultiplexed**: should be set either to TRUE or to the path to a barcode mapping file. Otherwise, bambu will not look for barcodes or separate reads by barcode rather than by sample. <br/>
 
 Optional:
 
 **cleanReads**: A logical TRUE/FALSE. Chimeric reads in samples can cause issues with barcode assignments. Setting this to TRUE will ensure only the first alignment per barcode is used (We recommend using this). <br/>
 
 **sampleNames**: A vector of characters assigning names to each sample in the reads argument. By default the sample names are taken from the file names and appended to the barcodes in order to differentiate them. If your sample names are the same across multiple files, but matching barcodes between the samples should be counted separately, provide them with different sample names using this argument. Similarly, if your samples have different names, but overlapping barcodes should be counted together, give them the same sample name with this argument. <br/>
 
-**dedupUMI**: A logical TRUE/FALSE. <br/>
+**dedupUMI**: A logical TRUE/FALSE. <br/>
 
 **barcodesToFilter**: A string vector indicating barcodes to be filtered out. <br/>
 
 ```rscript
-readClassFile = bambu(reads = samples, annotations = annotations, genome = "$genome", ncore = $params.ncore, discovery = FALSE, quant = FALSE, demultiplexed = barcode_maps, verbose = TRUE, assignDist = FALSE, lowMemory = as.logical("$params.lowMemory"), yieldSize = 10000000, sampleNames = ids, cleanReads = as.logical($cleanReads), dedupUMI = as.logical($deduplicateUMIs))
+readClassFile = bambu(reads = samples, annotations = annotations, genome = fa.file, ncore = 1, discovery = FALSE, quant = FALSE, demultiplexed = barcode_maps, verbose = TRUE, assignDist = FALSE, lowMemory = as.logical("$params.lowMemory"), yieldSize = 10000000, sampleNames = ids, cleanReads = as.logical($cleanReads), dedupUMI = as.logical($deduplicateUMIs))
 ```
 
 #### Transcript Discovery:
 
 Transcript discovery can be run as usual, since bulk-level discovery is typically suitable. However, cluster-level transcript discovery can be performed using the clusters argument, and it can be redone after clustering.
 
 ```rscript
-extendedAnno = bambu(reads = readClassFile, annotations = annotations, genome = "$genome", ncore = $params.ncore, discovery = TRUE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, assignDist = FALSE)
+extendedAnno = bambu(reads = readClassFile, annotations = annotations, genome = fa.file, ncore = 1, discovery = TRUE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, assignDist = FALSE)
 ```
 
 #### Read Class Assignment:
@@ -509,7 +511,7 @@ This step was previously performed together with the quantification, but can be
 **spatial**: This should be a path to your barcode whitelist that also contains the x and y coordinates as extra columns.
 
 ```rscript
-quantData = bambu(reads = readClassFile, annotations = extendedAnno, genome = "$genome", ncore = $params.ncore, discovery = FALSE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, opt.em = list(degradationBias = FALSE), assignDist = TRUE, spatial = spatial)
+quantData = bambu(reads = readClassFile, annotations = extendedAnno, genome = fa.file, ncore = 1, discovery = FALSE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, opt.em = list(degradationBias = FALSE), assignDist = TRUE, spatial = spatial)
 ```
 
 #### EM quantification:
@@ -641,14 +643,15 @@ rowData(se)
 |---|---|
 |TXNAME|The transcript name for the transcript. Will use either the transcript name from the provided annotations or tx.X if it is a novel transcript where X is a unique integer.|
 |GENEID|The gene name for the transcript. Will use either the gene name from the provided annotations or gene.X if it is a novel transcript where X is a unique integer.|
-|eqClass|A character vector with the transcript names of all the equivalent transcripts (those which have this transcripts contiguous exon junctions)|
-|txId|A bambu specific transcript id used for indexing purposes
-|eqClassById|A integer list with the transcript ids of all equivalent transcripts
+|NDR|The NDR score calculated for the transcript|
+|novelGene|A logical variable that is true if the transcript model is from a novel gene (does not overlap with an annotated gene locus)|
+|novelTranscript|A logical variable that is true if the transcript model is novel (passing the NDR threshold)|
 |txClassDescription|A concatenated string containing the classes the transcript falls under: <br/> **annotation** - Transcript matches an annotation transcript <br/> **allNew** - All the intron-junctions are novel <br/> **newFirstJunction** - the first junction is novel and at least one other junction matches an annotated transcript <br/> **newLastJunction** - the last junction is novel and at least one other junction matches an annotated transcript <br/> **newJunction** - an internal junction is novel and at least one other internal junction matches an annotated transcript <br/> **newWithin** - A novel transcript with matching junctions but is not a subset of an annotation <br/> **unsplicedNew** - A single exon transcript that doesn’t completely overlap with annotations <br/> **compatible** - Is a subset of an annotated transcript <br/> **newFirstExon** - The first exon is novel <br/> **newLastExon** - The last exon is novel|
 |readCount|The number of full length reads associated with this transcript (filtered by min.readCount)|
-|NDR|The NDR score calculated for the transcript|
 |relReadCount|The proportion of reads this transcript has relative to all reads assigned to its gene|
 |relSubsetCount|The proportion of reads this transcript has relative to all reads that either fully or partially match this transcript|
+|txId|A bambu specific transcript id used for indexing purposes|
+|eqClassById|An integer list with the transcript ids of all equivalent transcripts|
 |maxTxScore|The maximum model score across samples from the sample-trained model. Used internally by Bambu to calculate NDR scores|
 |maxTxScore.noFit|The maximum model score across samples from the pretrained model. Used internally by Bambu to recommend NDR thresholds|
 
@@ -676,9 +679,9 @@ metadata(rowRanges(se))$warnings
 
 ### Release History
 
-**bambu v3.3.0**
+**bambu v3.9.0**
 
-Release date: 2024-October-28
+Release date: 2025-xxx-xx
 
 - Subset transcripts and those above the NDR threshold are placed into the metadata of the annotations in $subsetTranscripts and $lowConfidenceTranscripts respectively (when filtered out by default).
 - adds the setNDR function
