update documentation

cying111 · cying111 · commit e2e300a5f6a5 · 2025-01-22T15:20:16.000+08:00
diff --git a/R/bambu.R b/R/bambu.R
@@ -142,7 +142,7 @@ bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
     fusionMode = FALSE, verbose = FALSE, demultiplexed = FALSE, spatial = NULL, quantData = NULL,
     sampleNames = NULL, cleanReads = FALSE, dedupUMI = FALSE, barcodesToFilter = NULL, clusters = NULL,
     processByChromosome = FALSE, processByBam = TRUE) {
-    message(paste0("Running Bambu-v", "3.3.0"))
+    message(paste0("Running Bambu-v", "3.9.0"))
     if(!is.null(mode)){
         if(mode == "bulk"){
             processByChromosome <- FALSE
@@ -249,15 +249,9 @@ bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
         }
     }
     
-   
-
     if (quant) {
         message("--- Start isoform EM quantification ---")
-        # the step below is a bit confusing but it seems to be the only way 
-        # if discovery == TRUE, extendAnnotations happen already
-        # if users want discovery at this step, assign a desired value for NDR with discovery being FALSE
-        # here also reads need to be not file or bam file or rc file
-        if(!is.null(NDR) & !discovery)
+        if(!is.null(NDR) & !discovery) # reset the NDR threshold on the annotations when NDR is supplied without discovery
             annotations <- setNDR(annotations, NDR, 
                                   prefix = isoreParameters$prefix, 
                 baselineFDR = isoreParameters[["baselineFDR"]], 
diff --git a/README.md b/README.md
@@ -365,12 +365,14 @@ rowData(se[[1]])
 |chr.rc|The chromosome name the read class is found on|
 |strand.rc|The strand of the read class|
 |startSD|The standard deviation of the aligned genomic start positions of all reads assigned to the read class|
+|endSD|The standard deviation of the aligned genomic end positions of all reads assigned to the read class|
 |readCount.posStrand|The number of reads assigned to this read class that aligned to the positive strand|
 |intronStarts|A comma separated character vector of intron start coordinates|
 |intronEnds|A comma separated character vector of intron end coordinates|
 |confidenceType|Category of confidence: <br/> **highConfidenceJunctionReads** - the read class contain no low confidence junctions <br/> **lowConfidenceJunctionReads** - the read class contains low confidence junctions <br/> **unsplicedWithin** - single exon read class that is within the exon boundaries of an annotation <br/> **unsplicedNew** - single exon read class that does not fully overlap with annotated exons|
 |readCount|The number of reads assigned to this read class|
-|readId *only present when trackReads = TRUE|An integer list of bambu internal read ids that belong to the read class. (See the metadata of the object for full read names)|
+|readIds|An integer list of bambu internal read ids that belong to the read class. (See the metadata of the object for full read names)|
+|sampleIds|An integer list of bambu internal sample ids based on barcodes.|
 |GENEID|The gene ID the transcript is associated with|
 |novelGene|A logical that is true if the read class belongs to a novel gene (does not overlap with an annotated gene loci)|
 |numExons|The number of exons the read class has|
@@ -382,8 +384,8 @@ rowData(se[[1]])
 |numAend|An integer counting the number of A nucleotides found within a 20bp window centered on the read class genomic end position|
 |numTstart|An integer counting the number of T nucleotides found within a 20bp window centered on the read class genomic start position|
 |numTend|An integer counting the number of T nucleotides found within a 20bp window centered on the read class genomic end position|
-|txScore|This is the TPS generated by the sample trained model|
 |txScore.noFit|This is the TPS generated by the pretrained model|
+|txScore|This is the TPS generated by the sample trained model|
 
 
 ### Tracking read-to-transcript assignment
@@ -476,30 +478,30 @@ If you want to run Bambu-Clump for single-cell or spatial analysis stand alone a
 
 #### Read Class Construction:
 
-**reads**: provided bam files must have barcodes in the read name or in the BC tag. Alternatively a csv file can be provided to demultiplexed mapping the read names to barcodes. For exact requirements see https://github.com/GoekeLab/bambu-singlecell-spatial.<br/>
+**reads**: provided bam files should have barcodes in the read name or in the BC tag (and the UG tag for UMI identifiers). If both the tags and the read names contain barcode information, the tags take priority. If neither contains barcodes, a headerless delimited file that contains the demultiplexing information for each read should be provided to the demultiplexed argument below. For exact requirements see https://github.com/GoekeLab/bambu-singlecell-spatial.<br/>
 
-**demultiplexed**: must be set to TRUE (or be a barcode map). This will cause bambu to look for barcodes and seperate reads by barcode rather than sample. <br/>
+**demultiplexed**: should be set either to TRUE or to the path to a barcode mapping file. Otherwise, bambu will not look for barcodes and will separate reads by sample rather than by barcode. <br/>
 
 Optional:
 
 **cleanReads**: A logical TRUE/FALSE. Chimeric reads in samples can cause issues with barcode assignments. Setting this to TRUE will ensure only the first alignment per barcode is used (We recommend using this). <br/>
 
 **sampleNames**: A vector of characters assigning names to each sample in the reads argument. By default the sample names are taken from the file names and appended to the barcodes in order to differentiate them. If your sample names are the same across multiple files, but matching barcodes between the samples should be counted seperately, provide them with different sample names using this argument. Similiarly if your samples have different names, but overlapping barcodes should be counted together, give them the same sample name with this argument.  <br/>
 
-**dedupUMI**: A logical TRUE/FALSE.   <br/>
+**dedupUMI**: A logical TRUE/FALSE.  <br/>
 
 **barcodesToFilter**: A string vector indicating barcodes to be filtered out.  <br/> 
 
 ```rscript
-readClassFile = bambu(reads = samples, annotations = annotations, genome = "$genome", ncore = $params.ncore, discovery = FALSE, quant = FALSE, demultiplexed = barcode_maps, verbose = TRUE, assignDist = FALSE, lowMemory = as.logical("$params.lowMemory"), yieldSize = 10000000, sampleNames = ids, cleanReads = as.logical($cleanReads), dedupUMI = as.logical($deduplicateUMIs))
+readClassFile = bambu(reads = samples, annotations = annotations, genome = fa.file, ncore = 1, discovery = FALSE, quant = FALSE, demultiplexed = barcode_maps, verbose = TRUE, assignDist = FALSE, lowMemory = as.logical("$params.lowMemory"), yieldSize = 10000000, sampleNames = ids, cleanReads = as.logical($cleanReads), dedupUMI = as.logical($deduplicateUMIs))
 ```
 
 #### Transcript Discovery:
 
 Transript discovery can be run as usual as typically bulk-level discovery is suitable. However cluster-level transcript discovery can be preformed using the clusters argument which can be redone done after clustering. 
 
 ```rscript
-extendedAnno = bambu(reads = readClassFile, annotations = annotations, genome = "$genome", ncore = $params.ncore, discovery = TRUE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, assignDist = FALSE)
+extendedAnno = bambu(reads = readClassFile, annotations = annotations, genome = fa.file, ncore = 1, discovery = TRUE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, assignDist = FALSE)
 ```
 
 #### Read Class Assignment:
@@ -509,7 +511,7 @@ This step was previously performed together with the quantification, but can be
 **spatial**: This should be a path to your barcode whitelist that also contians the x and y coordinates as extra columns. 
 
 ```rscript
-quantData = bambu(reads = readClassFile, annotations = extendedAnno, genome = "$genome", ncore = $params.ncore, discovery = FALSE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, opt.em = list(degradationBias = FALSE), assignDist = TRUE, spatial = spatial)
+quantData = bambu(reads = readClassFile, annotations = extendedAnno, genome = fa.file, ncore = 1, discovery = FALSE, quant = FALSE, demultiplexed = TRUE, verbose = FALSE, opt.em = list(degradationBias = FALSE), assignDist = TRUE, spatial = spatial)
 ```
 
 #### EM quantification:
@@ -641,14 +643,15 @@ rowData(se)
 |---|---|
 |TXNAME|The transcript name for the transcript. Will use either the transcript name from the provided annotations or tx.X if it is a novel transcript where X is a unique integer.|
 |GENEID|The gene name for the transcript. Will use either the gene name from the provided annotations or gene.X if it is a novel transcript where X is a unique integer.| 
-|eqClass|A character vector with the transcript names of all the equivalent transcripts (those which have this transcripts contiguous exon junctions)|
-|txId|A bambu specific transcript id used for indexing purposes
-|eqClassById|A integer list with the transcript ids of all equivalent transcripts
+|NDR|The NDR score calculated for the transcript|
+|novelGene|A logical variable that is true if the transcript model is from a novel gene (does not overlap with an annotated gene locus)|
+|novelTranscript|A logical variable that is true if the transcript model is novel (passing the NDR threshold)|
 |txClassDescription|A concatenated string containing the classes the transcript falls under: <br/> **annotation** - Transcript matches an annotation transcript <br/> **allNew** - All the intron-junctions are novel <br/> **newFirstJunction** - the first junction is novel and at least one other junction matches an annotated transcript <br/> **newLastJunction** - the last junction is novel and at least one other junction matches an annotated transcript <br/> **newJunction** - an internal junction is novel and at least one other internal junction matches an annotated transcript <br/> **newWithin** -  A novel transcript with matching junctions but is not a subset of an annotation <br/> **unsplicedNew** - A single exon transcript that doesn’t completely overlap with annotations <br/> **compatible** - Is a subset of an annotated transcript <br/> **newFirstExon** - The first exon is novel <br/> **newLastExon** - The last exon is novel|
 |readCount|The number of full length reads associated with this transcript (filtered by min.readCount)|
-|NDR|The NDR score calculated for the transcript|
 |relReadCount|The proportion of reads this transcript has relative to all reads assigned to its gene|
 |relSubsetCount|The proportion of reads this transcript has relative to all reads that either fully or partially match this transcript|
+|txId|A bambu specific transcript id used for indexing purposes|
+|eqClassById|An integer list with the transcript ids of all equivalent transcripts|
 |maxTxScore|The maximum model score across samples from the sample-trained model. Used internally by Bambu to calculate NDR scores|
 |maxTxScore.noFit|The maximum model score across samples from the pretrained model. Used internally by Bambu to recommend NDR thresholds|
 
@@ -676,9 +679,9 @@ metadata(rowRanges(se))$warnings
 
 ### Release History
 
-**bambu v3.3.0**
+**bambu v3.9.0**
 
-Release date: 2024-October-28
+Release date: 2025-xxx-xx
 
 - Subset transcripts and those above the NDR threshold are placed into the metadata of the annotations in $subsetTranscripts and $lowConfidenceTranscripts respectively (when filtered out by default).
 - adds the setNDR function