|
| 1 | +# Common Bioconductor Methods and Classes {#reusebioc} |
| 2 | + |
| 3 | +## Motivation {#bioc-common-motivation} |
| 4 | + |
| 5 | +Bioconductor is a large and diverse project with many packages that |
| 6 | +provide functionality for a wide range of biological data types and statistical |
| 7 | +methods. It is, therefore, important to reuse existing data classes and methods |
| 8 | +to ensure that packages are interoperable with the rest of the _Bioconductor_ |
| 9 | +ecosystem. Central data representations allow users to readily integrate |
| 10 | +analysis workflows across multiple Bioconductor packages, providing a more |
| 11 | +seamless user experience. |
| 12 | + |
| 13 | +Of course, there are data that have no established representation in |
| 14 | +Bioconductor, and in such cases, new classes can be developed, ideally with open |
| 15 | +discussion and input from the Bioconductor community. |
| 16 | + |
| 17 | +### Use Case: Importing data {#commonimport} |
| 18 | + |
| 19 | +For developers who import data into their package, it is important to know which |
| 20 | +packages and methods are available for reuse. The following list provides |
| 21 | +commonly used packages and their methods to import various data types: |
| 22 | + |
| 23 | ++ GTF, GFF, BED, BigWig, etc., -- `r BiocStyle::Biocpkg("rtracklayer")` `::import()` |
| 24 | ++ VCF -- `r BiocStyle::Biocpkg("VariantAnnotation")` `::readVcf()` |
| 25 | ++ SAM / BAM -- `r BiocStyle::Biocpkg("Rsamtools")` `::scanBam()`, |
| 26 | + `r BiocStyle::Biocpkg("GenomicAlignments")` `::readGAlignment*()` |
| 27 | ++ FASTA -- `r BiocStyle::Biocpkg("Biostrings")` `::readDNAStringSet()` |
| 28 | ++ FASTQ -- `r BiocStyle::Biocpkg("ShortRead")` `::readFastq()` |
| 29 | ++ MS data (XML-based and mgf formats) -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`, |
| 30 | + `r BiocStyle::Biocpkg("Spectra")` `::Spectra(source = MsBackendMgf::MsBackendMgf())` |
| 31 | + |
| 32 | +This list is not exhaustive, and developers are encouraged to initiate dialogue |
| 33 | +with other community members to identify additional packages and methods that |
| 34 | +may be useful for their specific use case. We acknowledge that class and method |
| 35 | +discoverability can be a challenge, and we are working to improve this aspect of |
| 36 | +the Bioconductor project. |
| 37 | + |
| 38 | +### Common Classes {#commonclass} |
| 39 | + |
| 40 | +The following list, though not exhaustive, provides select classes and |
| 41 | +constructor functions to represent genomic data: |
| 42 | + |
| 43 | ++ Rectangular feature x sample data -- |
| 44 | + `r BiocStyle::Biocpkg("SummarizedExperiment")` `::SummarizedExperiment()` |
| 45 | + (RNA-seq count matrix, microarray, ...) |
| 46 | ++ Genomic coordinates -- `r BiocStyle::Biocpkg("GenomicRanges")` `::GRanges()` |
| 47 | + (1-based, closed interval) |
| 48 | ++ Genomic coordinates from multiple samples -- |
| 49 | + `r BiocStyle::Biocpkg("GenomicRanges")` `::GRangesList()` |
| 50 | ++ Ragged genomic coordinates -- `r BiocStyle::Biocpkg("RaggedExperiment")` |
| 51 | + `::RaggedExperiment()` |
| 52 | ++ DNA / RNA / AA sequences -- `r BiocStyle::Biocpkg("Biostrings")` |
| 53 | + `::*StringSet()` |
| 54 | ++ Gene sets -- `r BiocStyle::Biocpkg("BiocSet")` `::BiocSet()`, |
| 55 | + `r BiocStyle::Biocpkg("GSEABase")` `::GeneSet()`, |
| 56 | + `r BiocStyle::Biocpkg("GSEABase")` `::GeneSetCollection()` |
| 57 | ++ Multi-omics data -- |
| 58 | + `r BiocStyle::Biocpkg("MultiAssayExperiment")` `::MultiAssayExperiment()` |
| 59 | ++ Single-cell data -- |
| 60 | + `r BiocStyle::Biocpkg("SingleCellExperiment")` `::SingleCellExperiment()` |
| 61 | ++ Mass spectrometry data -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()` |
| 62 | ++ File formats -- `r BiocStyle::Biocpkg("BiocIO")` `` ::`BiocFile-class` `` |
| 63 | + |
| 64 | +Search [biocViews][] for other classes and methods that may be useful for your |
| 65 | +package. |
| 66 | + |
| 67 | +## Package Submission Considerations |
| 68 | + |
| 69 | +Bioconductor strives for interoperability across packages. To ensure this, we |
| 70 | +strongly encourage package submissions to reuse existing Bioconductor classes |
| 71 | +and methods. Packages that do not follow this guideline may be asked to revise |
| 72 | +their code to use existing classes and methods. |
| 73 | + |
| 74 | +When the data do not conform to an existing data class, |
| 75 | +we recommend discussing the design of a new class with the Bioconductor |
| 76 | +community. This open discussion can take place on the main Bioconductor communication |
| 77 | +channels, such as the [bioc-devel][bioc-devel-mail] mailing list or the |
| 78 | +Bioconductor Slack. |
| 79 | + |
| 80 | +## Package Implementations |
| 81 | + |
| 82 | +The following packages are examples that reuse Bioconductor |
| 83 | +classes and methods: |
| 84 | + |
| 85 | ++ `r BiocStyle::Biocpkg("DESeq2")` -- |
| 86 | + Uses `r BiocStyle::Biocpkg("SummarizedExperiment")` and |
| 87 | + `r BiocStyle::Biocpkg("GenomicRanges")` |
| 88 | ++ `r BiocStyle::Biocpkg("GenomicAlignments")` -- |
| 89 | + Uses `r BiocStyle::Biocpkg("GenomicRanges")` and |
| 90 | + `r BiocStyle::Biocpkg("Rsamtools")` |
| 91 | ++ `r BiocStyle::Biocpkg("VariantAnnotation")` -- |
| 92 | + Uses `r BiocStyle::Biocpkg("GenomicRanges")`, |
| 93 | + `r BiocStyle::Biocpkg("SummarizedExperiment")`, and |
| 94 | + `r BiocStyle::Biocpkg("Rsamtools")` classes. |
| 95 | + |
| 96 | + |
| 97 | + |