This repository was archived by the owner on Oct 14, 2025. It is now read-only.

CuratedAtlasQueryR
================

<img src="../inst/logo.png" width="120px" height="139px" />

## Load the package

``` r
library(CuratedAtlasQueryR)
library(dplyr)
library(stringr)
```

## Load and explore the metadata

### Load the metadata

``` r
get_metadata()
#> # Source: table<metadata> [?? x 56]
#> # Database: sqlite 3.40.0 [/stornext/Home/data/allstaff/m/mangiola.s/.cache/R/CuratedAtlasQueryR/metadata.sqlite]
#> .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷ devel…⁸ disease disea…⁹ ethni…˟ ethni…˟ file_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 AAACCTGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 2 AAACCTGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 3 AAACCTGC… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 4 AAACCTGC… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 5 AAACCTGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 6 AAACCTGT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 7 AAACCTGT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 8 AAACGGGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 9 AAACGGGA… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> 10 AAACGGGA… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626…
#> # … with more rows, 40 more variables: is_primary_data.x <chr>, organism <chr>, organism_ontology_term_id <chr>,
#> # sample_placeholder <chr>, sex <chr>, sex_ontology_term_id <chr>, tissue <chr>, tissue_ontology_term_id <chr>,
#> # tissue_harmonised <chr>, age_days <dbl>, dataset_id <chr>, collection_id <chr>, cell_count <int>, dataset_deployments <chr>,
#> # is_primary_data.y <chr>, is_valid <int>, linked_genesets <int>, mean_genes_per_cell <dbl>, name <chr>, published <int>,
#> # revision <int>, schema_version <chr>, tombstone <int>, x_normalization <chr>, created_at.x <dbl>, published_at <dbl>,
#> # revised_at <dbl>, updated_at.x <dbl>, filename <chr>, filetype <chr>, s3_uri <chr>, user_submitted <int>, created_at.y <dbl>,
#> # updated_at.y <dbl>, cell_type_harmonised <chr>, confidence_class <dbl>, cell_annotation_azimuth_l2 <chr>, …
```

### Explore the tissue

``` r
get_metadata() |>
  dplyr::distinct(tissue, file_id) |>
  dplyr::count(tissue) |>
  dplyr::arrange(desc(n))
```

``` r
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.40.0 [[email protected]:5432/metadata]
#> # Ordered by: desc(n)
#> tissue n
#> <chr> <int64>
#> 1 blood 47
#> 2 heart left ventricle 46
#> 3 cortex of kidney 31
#> 4 renal medulla 29
#> 5 lung 27
#> 6 liver 24
#> 7 middle temporal gyrus 24
#> 8 kidney 19
#> 9 intestine 18
#> 10 thymus 17
#> # … with more rows
```
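Because the metadata has one row per cell, counting distinct `tissue`/`file_id` pairs gives the number of files per tissue, not the number of cells. As a minimal local sketch of that distinct-then-count logic, here is a base-R version on a small data frame; the tissue and file values are made up for illustration and do not come from the atlas.

```r
# Made-up metadata: one row per cell, several cells per file
metadata <- data.frame(
  tissue  = c("blood", "blood", "blood", "lung", "lung", "liver"),
  file_id = c("f1",    "f1",    "f2",    "f3",   "f4",   "f5")
)

# Step 1: one row per (tissue, file_id) pair, like dplyr::distinct()
tissue_files <- unique(metadata[, c("tissue", "file_id")])

# Step 2: number of files per tissue, like dplyr::count(tissue)
n_files <- table(tissue_files$tissue)

# Step 3: largest counts first, like dplyr::arrange(desc(n))
n_files <- sort(n_files, decreasing = TRUE)
n_files
```

Here `blood` has three cells but only two files (`f1`, `f2`), so it is counted twice, not three times.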

## Download single-cell RNA sequencing counts

### Query raw counts

``` r
single_cell_counts =
  get_metadata() |>
  dplyr::filter(
    ethnicity == "African" &
      stringr::str_like(assay, "%10x%") &
      tissue == "lung parenchyma" &
      stringr::str_like(cell_type, "%CD4%")
  ) |>
  get_SingleCellExperiment()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> class: SingleCellExperiment
#> dim: 60661 1571
#> metadata(0):
#> assays(2): counts cpm
#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
#> CATTCGCTCAATACCG_F02526_1
#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
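`stringr::str_like()` uses SQL `LIKE` syntax: `%` matches any run of characters and `_` matches exactly one character. On the lazy metadata table, dbplyr can translate it into a `LIKE` clause that runs in the database. To try a pattern locally before launching a large query, here is a small sketch built around a hypothetical helper, `like_to_regex()` (not part of CuratedAtlasQueryR), that converts a `LIKE` pattern into an anchored regular expression. It matches case-insensitively, as SQLite's `LIKE` does for ASCII letters; the example strings are made up.

```r
# Hypothetical helper (not part of CuratedAtlasQueryR): convert a SQL LIKE
# pattern into an anchored PCRE regular expression.
like_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"                      # % matches any run of characters
    } else if (ch == "_") {
      "."                       # _ matches exactly one character
    } else if (grepl("^[[:alnum:] ]$", ch)) {
      ch                        # letters, digits and spaces are literal
    } else {
      paste0("\\", ch)          # escape punctuation so PCRE treats it literally
    }
  }, character(1))
  # LIKE must match the whole string, so anchor both ends
  paste0("^", paste(parts, collapse = ""), "$")
}

# Hypothetical wrapper: case-insensitive match, like SQLite LIKE on ASCII
str_like_local <- function(string, pattern) {
  grepl(like_to_regex(pattern), string, ignore.case = TRUE, perl = TRUE)
}

# Made-up example values
assays     <- c("10x 3' v2", "10x 5' v1", "Smart-seq2")
cell_types <- c("CD4-positive, alpha-beta T cell", "CD8-positive, alpha-beta T cell")

str_like_local(assays, "%10x%")          # TRUE TRUE FALSE
str_like_local(cell_types, "%CD4%")      # TRUE FALSE
str_like_local(c("CD4", "CD45"), "CD_")  # TRUE FALSE
```

Note that `"%CD4%"` matches any cell type whose label merely contains `CD4`, so it is worth checking the distinct `cell_type` values a pattern selects before downloading counts.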

### Query counts scaled per million

Counts scaled per million (CPM) are helpful if only a few genes are of
interest, because they are normalised for sequencing depth and can be
compared across samples.

``` r
single_cell_counts =
  get_metadata() |>
  dplyr::filter(
    ethnicity == "African" &
      stringr::str_like(assay, "%10x%") &
      tissue == "lung parenchyma" &
      stringr::str_like(cell_type, "%CD4%")
  ) |>
  get_SingleCellExperiment(assays = "cpm")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> class: SingleCellExperiment
#> dim: 60661 1571
#> metadata(0):
#> assays(1): cpm
#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
#> CATTCGCTCAATACCG_F02526_1
#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
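Counts per million rescale each cell's counts by that cell's total, so cells sequenced to different depths become comparable. Below is a minimal base-R sketch of the standard definition, CPM = count / (total counts in the cell) × 10^6, on an invented 3 × 2 matrix. It assumes the package's `cpm` assay follows this standard per-cell definition, which the README does not spell out.

```r
# Invented raw counts: 3 genes (rows) x 2 cells (columns).
# cell_2 was sequenced twice as deeply as cell_1.
counts <- matrix(
  c(10, 30,  60,    # cell_1, total 100
    20, 20, 160),   # cell_2, total 200
  nrow = 3,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("cell_1", "cell_2"))
)

# Library size: total counts per cell (column sums)
library_size <- colSums(counts)

# Divide every column by its library size, then scale to one million
cpm <- sweep(counts, 2, library_size, "/") * 1e6

# GENE_A has raw counts 10 vs 20, but 100000 CPM in both cells
cpm["GENE_A", ]
```

Every column of `cpm` sums to one million, which is what makes a single gene's value comparable between cells.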

### Extract only a subset of genes

``` r
single_cell_counts =
  get_metadata() |>
  dplyr::filter(
    ethnicity == "African" &
      stringr::str_like(assay, "%10x%") &
      tissue == "lung parenchyma" &
      stringr::str_like(cell_type, "%CD4%")
  ) |>
  get_SingleCellExperiment(assays = "cpm", features = "PUM1")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> class: SingleCellExperiment
#> dim: 1 1571
#> metadata(0):
#> assays(1): cpm
#> rownames(1): PUM1
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
#> CATTCGCTCAATACCG_F02526_1
#> colData names(56): sample_id_db .sample ... n_tissue_in_cell_type original_cell_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
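Selecting a gene after scaling is not the same as scaling after selecting it: if counts were rescaled using only the requested features, every cell would end up with 10^6 CPM for a single gene and the comparison across cells would be lost. The sketch below uses invented numbers (the `OTHER` row stands in for the rest of the transcriptome) and a hypothetical helper, `to_cpm()`. It assumes the `cpm` values returned for a feature subset were computed on the full transcriptome before subsetting.

```r
# Invented counts: PUM1 plus one row summarising all other genes
counts <- matrix(
  c( 5,  95,    # cell_1, total 100
    20, 180),   # cell_2, total 200
  nrow = 2,
  dimnames = list(c("PUM1", "OTHER"), c("cell_1", "cell_2"))
)

# Hypothetical helper: scale each column (cell) to counts per million
to_cpm <- function(m) sweep(m, 2, colSums(m), "/") * 1e6

# Scale on the full matrix, then keep PUM1: 50000 vs 100000 CPM
pum1_scaled_first <- to_cpm(counts)["PUM1", ]

# Keep PUM1, then scale: every cell becomes 1e6, which is uninformative
pum1_subset_first <- to_cpm(counts["PUM1", , drop = FALSE])["PUM1", ]
```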

### Extract the counts as a Seurat object

This converts the HDF5-backed SingleCellExperiment into a Seurat object,
so it may take a long time and use a lot of memory, depending on how
many cells you request.

``` r
single_cell_counts =
  get_metadata() |>
  dplyr::filter(
    ethnicity == "African" &
      stringr::str_like(assay, "%10x%") &
      tissue == "lung parenchyma" &
      stringr::str_like(cell_type, "%CD4%")
  ) |>
  get_seurat()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')

single_cell_counts
#> An object of class Seurat
#> 60661 features across 1571 samples within 1 assay
#> Active assay: originalexp (60661 features, 0 variable features)
```
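To get a feel for the memory involved, the dimensions printed above (60,661 features × 1,571 cells) can be turned into a rough size estimate. This is back-of-the-envelope arithmetic that assumes one 8-byte double per entry in a fully dense copy; Seurat usually stores counts as a sparse matrix, so the real footprint depends on how many entries are non-zero. The last line mirrors the renaming reported in the warning above, using a made-up feature name.

```r
# Dimensions taken from the Seurat object printed above
n_features <- 60661
n_cells    <- 1571

# Entries in a dense features x cells matrix
n_values <- n_features * n_cells          # 95,298,431 entries

# Dense double-precision storage: 8 bytes per entry
dense_bytes <- n_values * 8               # 762,387,448 bytes
dense_gib   <- dense_bytes / 1024^3       # roughly 0.71 GiB

# Seurat's warning: underscores in feature names become dashes.
# "MY_GENE" is a made-up name used only to illustrate the renaming.
gsub("_", "-", c("TSPAN6", "MY_GENE"), fixed = TRUE)   # "TSPAN6" "MY-GENE"
```

The estimate scales linearly with the number of cells, which is why broad queries can quickly become expensive to convert.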

## Visualise gene transcription

We can gather all natural killer cells and plot the distribution of CD56
(NCAM1) across all tissues.

``` r
library(tidySingleCellExperiment)
library(ggplot2)

get_metadata() |>

  # Filter and subset
  filter(cell_type_harmonised == "nk") |>
  select(.cell, file_id_db, disease, file_id, tissue_harmonised) |>

  # Get counts per million for the NCAM1 gene
  get_SingleCellExperiment(assays = "cpm", features = "NCAM1") |>

  # Get transcriptional abundance for plotting with `tidySingleCellExperiment`
  join_features("NCAM1", shape = "wide") |>

  # Plot
  ggplot(aes(tissue_harmonised, NCAM1, color = file_id)) +
  geom_jitter(shape = ".") +

  # Style
  guides(color = "none") +
  scale_y_log10() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
```

<img src="../inst/NCAM1_figure.png" width="629" />
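Because the y axis is on a log10 scale, cells with zero NCAM1 CPM have no finite position on it (log10(0) is -Inf), so the figure mostly reflects cells that express the gene. One way to keep that in view is to report the fraction of zero-valued cells per tissue next to the plot. The sketch below does this in base R on invented values; in practice the `tissue_harmonised` and `NCAM1` columns from the object built above would take their place.

```r
# Invented NCAM1 CPM values for a few NK cells in two tissues
ncam1 <- data.frame(
  tissue = c("blood", "blood", "blood", "lung", "lung"),
  cpm    = c(0, 120, 340, 0, 0)
)

# log10(0) is -Inf, so these cells cannot be placed on a log10 axis
is_zero <- ncam1$cpm == 0

# Fraction of cells per tissue with zero expression (mean of a logical)
zero_fraction <- tapply(is_zero, ncam1$tissue, mean)

# Median CPM among expressing cells only (NA if a tissue has none)
median_nonzero <- tapply(ncam1$cpm, ncam1$tissue, function(x) {
  x <- x[x > 0]
  if (length(x) == 0) NA_real_ else median(x)
})
```

Here all lung cells are zero, so lung would contribute nothing visible on a log scale even though it has cells in the query.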

``` r
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS: /stornext/System/data/apps/R/R-4.2.0/lib64/R/lib/libRblas.so
#> LAPACK: /stornext/System/data/apps/R/R-4.2.0/lib64/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] stringr_1.5.0 dplyr_1.1.0 CuratedAtlasQueryR_0.1.0 dbplyr_2.3.0 here_1.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2 sp_1.5-1
#> [5] splines_4.2.0 listenv_0.9.0 scattermore_0.8 GenomeInfoDb_1.34.7
#> [9] ggplot2_3.4.0 inline_0.3.19 digest_0.6.31 htmltools_0.5.4
#> [13] fansi_1.0.4 magrittr_2.0.3 memoise_2.0.1 tensor_1.5
#> [17] cluster_2.1.4 ROCR_1.0-11 globals_0.16.2 RcppParallel_5.1.6
#> [21] matrixStats_0.63.0 spatstat.sparse_3.0-0 prettyunits_1.1.1 colorspace_2.1-0
#> [25] blob_1.2.3 ggrepel_0.9.2 xfun_0.36 callr_3.7.3
#> [29] crayon_1.5.2 RCurl_1.98-1.9 jsonlite_1.8.4 progressr_0.13.0
#> [33] spatstat.data_3.0-0 survival_3.5-0 zoo_1.8-11 glue_1.6.2
#> [37] polyclip_1.10-4 gtable_0.3.1 zlibbioc_1.44.0 XVector_0.38.0
#> [41] leiden_0.4.3 DelayedArray_0.24.0 V8_4.2.2 pkgbuild_1.4.0
#> [45] Rhdf5lib_1.20.0 rstan_2.26.6 SingleCellExperiment_1.20.0 future.apply_1.10.0
#> [49] BiocGenerics_0.44.0 HDF5Array_1.26.0 abind_1.4-5 scales_1.2.1
#> [53] DBI_1.1.3 spatstat.random_3.0-1 miniUI_0.1.1.1 Rcpp_1.0.10
#> [57] viridisLite_0.4.1 xtable_1.8-4 reticulate_1.27 bit_4.0.5
#> [61] stats4_4.2.0 StanHeaders_2.26.6 htmlwidgets_1.6.1 httr_1.4.4
#> [65] RColorBrewer_1.1-3 ellipsis_0.3.2 Seurat_4.3.0 ica_1.0-3
#> [69] pkgconfig_2.0.3 loo_2.5.1 uwot_0.1.14 deldir_1.0-6
#> [73] utf8_1.2.2 tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4
#> [77] later_1.3.0 munsell_0.5.0 tools_4.2.0 cachem_1.0.6
#> [81] cli_3.6.0 generics_0.1.3 RSQLite_2.2.20 ggridges_0.5.4
#> [85] evaluate_0.20 fastmap_1.1.0 goftest_1.2-3 yaml_2.3.7
#> [89] processx_3.8.0 knitr_1.42 bit64_4.0.5 fitdistrplus_1.1-8
#> [93] purrr_1.0.1 RANN_2.6.1 nlme_3.1-161 pbapply_1.7-0
#> [97] future_1.30.0 mime_0.12 compiler_4.2.0 rstudioapi_0.14
#> [101] plotly_4.10.1 curl_5.0.0 png_0.1-8 spatstat.utils_3.0-1
#> [105] tibble_3.1.8 stringi_1.7.12 highr_0.10 ps_1.7.2
#> [109] lattice_0.20-45 Matrix_1.5-3 vctrs_0.5.2 pillar_1.8.1
#> [113] lifecycle_1.0.3 rhdf5filters_1.10.0 spatstat.geom_3.0-3 lmtest_0.9-40
#> [117] RcppAnnoy_0.0.20 data.table_1.14.6 cowplot_1.1.1 bitops_1.0-7
#> [121] irlba_2.3.5.1 httpuv_1.6.8 patchwork_1.1.2 GenomicRanges_1.50.2
#> [125] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3
#> [129] IRanges_2.32.0 parallelly_1.34.0 codetools_0.2-18 MASS_7.3-58.1
#> [133] assertthat_0.2.1 rhdf5_2.42.0 SummarizedExperiment_1.28.0 rprojroot_2.0.3
#> [137] withr_2.5.0 SeuratObject_4.1.3 sctransform_0.3.5 S4Vectors_0.36.1
#> [141] GenomeInfoDbData_1.2.9 parallel_4.2.0 grid_4.2.0 tidyr_1.3.0
#> [145] rmarkdown_2.20 MatrixGenerics_1.10.0 Rtsne_0.16 spatstat.explore_3.0-5
#> [149] Biobase_2.58.0 shiny_1.7.4
```