| 1 | +CuratedAtlasQueryR |
| 2 | +================ |
| 3 | + |
| 6 | +<img src="../inst/logo.png" width="120px" height="139px" /> |
| 7 | + |
| 8 | +## Load the package |
| 9 | + |
| 10 | +``` r |
| 11 | +library(CuratedAtlasQueryR) |
| 12 | +library(dplyr) |
| 13 | +library(stringr) |
| 14 | +``` |
| 15 | + |
| 16 | +## Load and explore the metadata |
| 17 | + |
| 18 | +### Load the metadata |
| 19 | + |
| 20 | +``` r |
| 21 | +get_metadata() |
| 22 | +#> # Source: table<metadata> [?? x 56] |
| 23 | +#> # Database: sqlite 3.40.0 [/stornext/Home/data/allstaff/m/mangiola.s/.cache/R/CuratedAtlasQueryR/metadata.sqlite] |
| 24 | +#> .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷ devel…⁸ disease disea…⁹ ethni…˟ ethni…˟ file_id |
| 25 | +#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> |
| 26 | +#> 1 AAACCTGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 27 | +#> 2 AAACCTGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 28 | +#> 3 AAACCTGC… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 29 | +#> 4 AAACCTGC… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 30 | +#> 5 AAACCTGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 31 | +#> 6 AAACCTGT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 32 | +#> 7 AAACCTGT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 33 | +#> 8 AAACGGGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 34 | +#> 9 AAACGGGA… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 35 | +#> 10 AAACGGGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… |
| 36 | +#> # … with more rows, 40 more variables: is_primary_data.x <chr>, organism <chr>, organism_ontology_term_id <chr>, |
| 37 | +#> # sample_placeholder <chr>, sex <chr>, sex_ontology_term_id <chr>, tissue <chr>, tissue_ontology_term_id <chr>, |
| 38 | +#> # tissue_harmonised <chr>, age_days <dbl>, dataset_id <chr>, collection_id <chr>, cell_count <int>, dataset_deployments <chr>, |
| 39 | +#> # is_primary_data.y <chr>, is_valid <int>, linked_genesets <int>, mean_genes_per_cell <dbl>, name <chr>, published <int>, |
| 40 | +#> # revision <int>, schema_version <chr>, tombstone <int>, x_normalization <chr>, created_at.x <dbl>, published_at <dbl>, |
| 41 | +#> # revised_at <dbl>, updated_at.x <dbl>, filename <chr>, filetype <chr>, s3_uri <chr>, user_submitted <int>, created_at.y <dbl>, |
| 42 | +#> # updated_at.y <dbl>, cell_type_harmonised <chr>, confidence_class <dbl>, cell_annotation_azimuth_l2 <chr>, … |
| 43 | +``` |
| 44 | + |
| 45 | +### Explore the tissue |
| 46 | + |
| 47 | +``` r |
| 48 | +get_metadata() |> |
| 49 | + dplyr::distinct(tissue, file_id) |> dplyr::count(tissue) |> dplyr::arrange(dplyr::desc(n)) |
| 50 | +``` |
| 51 | + |
| 52 | +``` r |
| 53 | +#> # Source: SQL [?? x 2] |
| 54 | +#> # Database: sqlite 3.40.0 [[email protected]:5432/metadata] |
| 55 | +#> # Ordered by: desc(n) |
| 56 | +#> tissue n |
| 57 | +#> <chr> <int64> |
| 58 | +#> 1 blood 47 |
| 59 | +#> 2 heart left ventricle 46 |
| 60 | +#> 3 cortex of kidney 31 |
| 61 | +#> 4 renal medulla 29 |
| 62 | +#> 5 lung 27 |
| 63 | +#> 6 liver 24 |
| 64 | +#> 7 middle temporal gyrus 24 |
| 65 | +#> 8 kidney 19 |
| 66 | +#> 9 intestine 18 |
| 67 | +#> 10 thymus 17 |
| 68 | +#> # … with more rows |
| 69 | +``` |
| 70 | + |
| 71 | +## Download single-cell RNA sequencing counts |
| 72 | + |
| 73 | +### Query raw counts |
| 74 | + |
| 75 | +``` r |
| 76 | + |
| 77 | +single_cell_counts = |
| 78 | + get_metadata() |> |
| 79 | + dplyr::filter( |
| 80 | + ethnicity == "African" & |
| 81 | + stringr::str_like(assay, "%10x%") & |
| 82 | + tissue == "lung parenchyma" & |
| 83 | + stringr::str_like(cell_type, "%CD4%") |
| 84 | + ) |> |
| 85 | + get_SingleCellExperiment() |
| 86 | +#> ℹ Realising metadata. |
| 87 | +#> ℹ Synchronising files |
| 88 | +#> ℹ Reading files. |
| 89 | +#> ℹ Compiling Single Cell Experiment. |
| 90 | + |
| 91 | +single_cell_counts |
| 92 | +#> class: SingleCellExperiment |
| 93 | +#> dim: 60661 1571 |
| 94 | +#> metadata(0): |
| 95 | +#> assays(2): counts cpm |
| 96 | +#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P |
| 97 | +#> rowData names(0): |
| 98 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1 |
| 99 | +#> CATTCGCTCAATACCG_F02526_1 |
| 100 | +#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id |
| 101 | +#> reducedDimNames(0): |
| 102 | +#> mainExpName: NULL |
| 103 | +#> altExpNames(0): |
| 104 | +``` |
| 105 | + |
| 106 | +### Query counts scaled per million |
| 107 | + |
| 108 | +This is helpful if only a few genes are of interest, as they can be |
| 109 | +compared across samples. |
| 110 | + |
| 111 | +``` r |
| 112 | +single_cell_counts = |
| 113 | + get_metadata() |> |
| 114 | + dplyr::filter( |
| 115 | + ethnicity == "African" & |
| 116 | + stringr::str_like(assay, "%10x%") & |
| 117 | + tissue == "lung parenchyma" & |
| 118 | + stringr::str_like(cell_type, "%CD4%") |
| 119 | + ) |> |
| 120 | + get_SingleCellExperiment(assays = "cpm") |
| 121 | +#> ℹ Realising metadata. |
| 122 | +#> ℹ Synchronising files |
| 123 | +#> ℹ Reading files. |
| 124 | +#> ℹ Compiling Single Cell Experiment. |
| 125 | + |
| 126 | +single_cell_counts |
| 127 | +#> class: SingleCellExperiment |
| 128 | +#> dim: 60661 1571 |
| 129 | +#> metadata(0): |
| 130 | +#> assays(1): cpm |
| 131 | +#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P |
| 132 | +#> rowData names(0): |
| 133 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1 |
| 134 | +#> CATTCGCTCAATACCG_F02526_1 |
| 135 | +#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id |
| 136 | +#> reducedDimNames(0): |
| 137 | +#> mainExpName: NULL |
| 138 | +#> altExpNames(0): |
| 139 | +``` |
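As a rough illustration of why scaled counts are comparable across samples, the sketch below applies the conventional counts-per-million definition to a toy matrix. The matrix values and the names `counts`, `library_size` and `cpm` are made up for this example, and the exact scaling used to build the package's `cpm` assay is not stated here, so treat this as an assumption rather than the package's implementation.

```r
# Toy raw-count matrix: genes in rows, cells in columns (made-up numbers).
# matrix() fills column by column, so cell1 = (10, 0, 90), cell2 = (200, 100, 700).
counts <- matrix(
  c(10, 0, 90,
    200, 100, 700),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Library size is the total count per cell (column sum).
library_size <- colSums(counts)

# Divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, library_size, "/") * 1e6

# geneA has 20 times more raw counts in cell2, but only twice the scaled abundance.
cpm["geneA", ]
```

After scaling, every column sums to one million, so a gene's value no longer depends on how deeply each cell was sequenced.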
| 140 | + |
| 141 | +### Extract only a subset of genes |
| 142 | + |
| 143 | +``` r |
| 144 | +single_cell_counts = |
| 145 | + get_metadata() |> |
| 146 | + dplyr::filter( |
| 147 | + ethnicity == "African" & |
| 148 | + stringr::str_like(assay, "%10x%") & |
| 149 | + tissue == "lung parenchyma" & |
| 150 | + stringr::str_like(cell_type, "%CD4%") |
| 151 | + ) |> |
| 152 | + get_SingleCellExperiment(assays = "cpm", features = "PUM1") |
| 153 | +#> ℹ Realising metadata. |
| 154 | +#> ℹ Synchronising files |
| 155 | +#> ℹ Reading files. |
| 156 | +#> ℹ Compiling Single Cell Experiment. |
| 157 | + |
| 158 | +single_cell_counts |
| 159 | +#> class: SingleCellExperiment |
| 160 | +#> dim: 1 1571 |
| 161 | +#> metadata(0): |
| 162 | +#> assays(1): cpm |
| 163 | +#> rownames(1): PUM1 |
| 164 | +#> rowData names(0): |
| 165 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1 |
| 166 | +#> CATTCGCTCAATACCG_F02526_1 |
| 167 | +#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id |
| 168 | +#> reducedDimNames(0): |
| 169 | +#> mainExpName: NULL |
| 170 | +#> altExpNames(0): |
| 171 | +``` |
| 172 | + |
| 173 | +### Extract the counts as a Seurat object |
| 174 | + |
| 175 | +This converts the H5 SingleCellExperiment to Seurat, so it might take a long |
| 176 | +time and occupy a lot of memory, depending on how many cells you are |
| 177 | +requesting. |
| 178 | + |
| 179 | +``` r |
| 180 | +single_cell_counts = |
| 181 | + get_metadata() |> |
| 182 | + dplyr::filter( |
| 183 | + ethnicity == "African" & |
| 184 | + stringr::str_like(assay, "%10x%") & |
| 185 | + tissue == "lung parenchyma" & |
| 186 | + stringr::str_like(cell_type, "%CD4%") |
| 187 | + ) |> |
| 188 | + get_seurat() |
| 189 | +#> ℹ Realising metadata. |
| 190 | +#> ℹ Synchronising files |
| 191 | +#> ℹ Reading files. |
| 192 | +#> ℹ Compiling Single Cell Experiment. |
| 193 | +#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-') |
| 194 | + |
| 195 | +single_cell_counts |
| 196 | +#> An object of class Seurat |
| 197 | +#> 60661 features across 1571 samples within 1 assay |
| 198 | +#> Active assay: originalexp (60661 features, 0 variable features) |
| 199 | +``` |
| 200 | + |
| 201 | +## Visualise gene transcription |
| 202 | + |
| 203 | +We can gather all natural killer cells and plot the distribution of CD56 |
| 204 | +(NCAM1) across all tissues. |
| 205 | + |
| 206 | +``` r |
| 207 | +library(tidySingleCellExperiment) |
| 208 | +library(ggplot2) |
| 209 | + |
| 210 | +get_metadata() |> |
| 211 | + |
| 212 | + # Filter and subset |
| 213 | + filter(cell_type_harmonised=="nk") |> |
| 214 | + select(.cell, file_id_db, disease, file_id, tissue_harmonised) |> |
| 215 | + |
| 216 | + # Get counts per million for NCAM1 gene |
| 217 | + get_SingleCellExperiment(assays = "cpm", features = "NCAM1") |> |
| 218 | + |
| 219 | + # Get transcriptional abundance for plotting with `tidySingleCellExperiment` |
| 220 | + join_features("NCAM1", shape = "wide") |> |
| 221 | + |
| 222 | + # Plot |
| 223 | + ggplot(aes(tissue_harmonised, NCAM1, color = file_id)) + |
| 224 | + geom_jitter(shape = ".") + |
| 225 | + |
| 226 | + # Style |
| 227 | + guides(color="none") + |
| 228 | + scale_y_log10() + |
| 229 | + theme_bw() + |
| 230 | + theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1)) |
| 231 | +``` |
| 232 | + |
| 233 | +<img src="../inst/NCAM1_figure.png" width="629" /> |
| 234 | + |
| 235 | +``` r |
| 236 | +sessionInfo() |
| 237 | +#> R version 4.2.0 (2022-04-22) |
| 238 | +#> Platform: x86_64-pc-linux-gnu (64-bit) |
| 239 | +#> Running under: CentOS Linux 7 (Core) |
| 240 | +#> |
| 241 | +#> Matrix products: default |
| 242 | +#> BLAS: /stornext/System/data/apps/R/R-4.2.0/lib64/R/lib/libRblas.so |
| 243 | +#> LAPACK: /stornext/System/data/apps/R/R-4.2.0/lib64/R/lib/libRlapack.so |
| 244 | +#> |
| 245 | +#> locale: |
| 246 | +#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 |
| 247 | +#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C |
| 248 | +#> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C |
| 249 | +#> |
| 250 | +#> attached base packages: |
| 251 | +#> [1] stats graphics grDevices utils datasets methods base |
| 252 | +#> |
| 253 | +#> other attached packages: |
| 254 | +#> [1] stringr_1.5.0 dplyr_1.1.0 CuratedAtlasQueryR_0.1.0 dbplyr_2.3.0 here_1.0.1 |
| 255 | +#> |
| 256 | +#> loaded via a namespace (and not attached): |
| 257 | +#> [1] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2 sp_1.5-1 |
| 258 | +#> [5] splines_4.2.0 listenv_0.9.0 scattermore_0.8 GenomeInfoDb_1.34.7 |
| 259 | +#> [9] ggplot2_3.4.0 inline_0.3.19 digest_0.6.31 htmltools_0.5.4 |
| 260 | +#> [13] fansi_1.0.4 magrittr_2.0.3 memoise_2.0.1 tensor_1.5 |
| 261 | +#> [17] cluster_2.1.4 ROCR_1.0-11 globals_0.16.2 RcppParallel_5.1.6 |
| 262 | +#> [21] matrixStats_0.63.0 spatstat.sparse_3.0-0 prettyunits_1.1.1 colorspace_2.1-0 |
| 263 | +#> [25] blob_1.2.3 ggrepel_0.9.2 xfun_0.36 callr_3.7.3 |
| 264 | +#> [29] crayon_1.5.2 RCurl_1.98-1.9 jsonlite_1.8.4 progressr_0.13.0 |
| 265 | +#> [33] spatstat.data_3.0-0 survival_3.5-0 zoo_1.8-11 glue_1.6.2 |
| 266 | +#> [37] polyclip_1.10-4 gtable_0.3.1 zlibbioc_1.44.0 XVector_0.38.0 |
| 267 | +#> [41] leiden_0.4.3 DelayedArray_0.24.0 V8_4.2.2 pkgbuild_1.4.0 |
| 268 | +#> [45] Rhdf5lib_1.20.0 rstan_2.26.6 SingleCellExperiment_1.20.0 future.apply_1.10.0 |
| 269 | +#> [49] BiocGenerics_0.44.0 HDF5Array_1.26.0 abind_1.4-5 scales_1.2.1 |
| 270 | +#> [53] DBI_1.1.3 spatstat.random_3.0-1 miniUI_0.1.1.1 Rcpp_1.0.10 |
| 271 | +#> [57] viridisLite_0.4.1 xtable_1.8-4 reticulate_1.27 bit_4.0.5 |
| 272 | +#> [61] stats4_4.2.0 StanHeaders_2.26.6 htmlwidgets_1.6.1 httr_1.4.4 |
| 273 | +#> [65] RColorBrewer_1.1-3 ellipsis_0.3.2 Seurat_4.3.0 ica_1.0-3 |
| 274 | +#> [69] pkgconfig_2.0.3 loo_2.5.1 uwot_0.1.14 deldir_1.0-6 |
| 275 | +#> [73] utf8_1.2.2 tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4 |
| 276 | +#> [77] later_1.3.0 munsell_0.5.0 tools_4.2.0 cachem_1.0.6 |
| 277 | +#> [81] cli_3.6.0 generics_0.1.3 RSQLite_2.2.20 ggridges_0.5.4 |
| 278 | +#> [85] evaluate_0.20 fastmap_1.1.0 goftest_1.2-3 yaml_2.3.7 |
| 279 | +#> [89] processx_3.8.0 knitr_1.42 bit64_4.0.5 fitdistrplus_1.1-8 |
| 280 | +#> [93] purrr_1.0.1 RANN_2.6.1 nlme_3.1-161 pbapply_1.7-0 |
| 281 | +#> [97] future_1.30.0 mime_0.12 compiler_4.2.0 rstudioapi_0.14 |
| 282 | +#> [101] plotly_4.10.1 curl_5.0.0 png_0.1-8 spatstat.utils_3.0-1 |
| 283 | +#> [105] tibble_3.1.8 stringi_1.7.12 highr_0.10 ps_1.7.2 |
| 284 | +#> [109] lattice_0.20-45 Matrix_1.5-3 vctrs_0.5.2 pillar_1.8.1 |
| 285 | +#> [113] lifecycle_1.0.3 rhdf5filters_1.10.0 spatstat.geom_3.0-3 lmtest_0.9-40 |
| 286 | +#> [117] RcppAnnoy_0.0.20 data.table_1.14.6 cowplot_1.1.1 bitops_1.0-7 |
| 287 | +#> [121] irlba_2.3.5.1 httpuv_1.6.8 patchwork_1.1.2 GenomicRanges_1.50.2 |
| 288 | +#> [125] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3 |
| 289 | +#> [129] IRanges_2.32.0 parallelly_1.34.0 codetools_0.2-18 MASS_7.3-58.1 |
| 290 | +#> [133] assertthat_0.2.1 rhdf5_2.42.0 SummarizedExperiment_1.28.0 rprojroot_2.0.3 |
| 291 | +#> [137] withr_2.5.0 SeuratObject_4.1.3 sctransform_0.3.5 S4Vectors_0.36.1 |
| 292 | +#> [141] GenomeInfoDbData_1.2.9 parallel_4.2.0 grid_4.2.0 tidyr_1.3.0 |
| 293 | +#> [145] rmarkdown_2.20 MatrixGenerics_1.10.0 Rtsne_0.16 spatstat.explore_3.0-5 |
| 294 | +#> [149] Biobase_2.58.0 shiny_1.7.4 |
| 295 | +``` |