11CuratedAtlasQueryR
22================
33
4+ ``` r
5+ find_figure <- function(names){
6+     rprojroot::find_package_root_file() |>
7+         file.path("man", "figures", names)
8+ }
9+ ```
10+
411<!-- badges: start -->
512
613[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
@@ -11,11 +18,9 @@ exploration and retrieval of the harmonised, curated and reannotated
1118CELLxGENE single-cell human cell atlas. Data can be retrieved at cell,
1219sample, or dataset levels based on filtering criteria.
1320
14- <img src="man/figures/logo.png" width="120x" height="139px" />
15-
16- <img src="man/figures/svcf_logo.jpeg" width="155x" height="58px" /><img src="man/figures/czi_logo.png" width="129px" height="58px" /><img src="man/figures/bioconductor_logo.jpg" width="202px" height="58px" /><img src="man/figures/vca_logo.png" width="219px" height="58px" /><img src="man/figures/nectar_logo.png" width="180px" height="58px" />
21+ <img src="../man/figures/logo.png" width="120x" height="139px" />
1722
18- [website](https://stemangiola.github.io/CuratedAtlasQueryR)
23+ <img src="../man/figures/svcf_logo.jpeg" width="155x" height="58px" /><img src="../man/figures/czi_logo.png" width="129px" height="58px" /><img src="../man/figures/bioconductor_logo.jpg" width="202px" height="58px" /><img src="../man/figures/vca_logo.png" width="219px" height="58px" /><img src="../man/figures/nectar_logo.png" width="180px" height="58px" />
1924
2025# Query interface
2126
@@ -36,52 +41,34 @@ library(CuratedAtlasQueryR)
3641### Load the metadata
3742
3843``` r
39- metadata = get_metadata()
40-
41- metadata
42- # > # Source: table</vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
43- # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
44- # > cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
45- # > <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
46- # > 1 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
47- # > 2 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
48- # > 3 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
49- # > 4 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
50- # > 5 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
51- # > 6 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
52- # > 7 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
53- # > 8 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
54- # > 9 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
55- # > 10 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
56- # > # … with more rows, 46 more variables: assay <chr>,
57- # > # assay_ontology_term_id <chr>, file_id_db <chr>,
58- # > # cell_type_ontology_term_id <chr>, development_stage <chr>,
59- # > # development_stage_ontology_term_id <chr>, disease <chr>,
60- # > # disease_ontology_term_id <chr>, ethnicity <chr>,
61- # > # ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
62- # > # is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
44+ metadata <- get_metadata()
6345```
6446
65- ### Explore the number of datasets per tissue
47+ The `metadata` variable can then be re-used for all subsequent queries.
48+
49+ ### Explore the tissue
6650
6751``` r
6852metadata |>
69-     dplyr::distinct(tissue, dataset_id) |>
70-     dplyr::count(tissue)
71- # > # Source: SQL [?? x 2]
72- # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
73- # > tissue n
74- # > <chr> <dbl>
75- # > 1 cerebellum 3
76- # > 2 telencephalon 2
77- # > 3 heart 3
78- # > 4 intestine 18
79- # > 5 kidney 19
80- # > 6 liver 24
81- # > 7 lung 27
82- # > 8 muscle organ 3
83- # > 9 pancreas 5
84- # > 10 placenta 3
53+     dplyr::distinct(tissue, file_id)
54+ ```
55+
56+ ``` r
57+ # > # Source: SQL [?? x 2]
58+ # > # Database: sqlite 3.40.0 [[email protected] :5432/metadata]
59+ # > # Ordered by: desc(n)
60+ # > tissue n
61+ # > <chr> <int64>
62+ # > 1 blood 47
63+ # > 2 heart left ventricle 46
64+ # > 3 cortex of kidney 31
65+ # > 4 renal medulla 29
66+ # > 5 lung 27
67+ # > 6 liver 24
68+ # > 7 middle temporal gyrus 24
69+ # > 8 kidney 19
70+ # > 9 intestine 18
71+ # > 10 thymus 17
8572# > # … with more rows
8673```
8774
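As a reading aid for the per-tissue counts in the hunk above, here is a minimal base-R sketch of the same idea: deduplicate `(tissue, file_id)` pairs, then count files per tissue in decreasing order. The `toy_metadata` data frame and its values are invented for illustration; the real `metadata` object is a database-backed table and is queried with `dplyr` verbs rather than base R.

```r
# Toy stand-in for the metadata table; every row here is invented.
toy_metadata <- data.frame(
  tissue  = c("blood", "blood", "blood", "lung", "lung", "liver"),
  file_id = c("f1", "f1", "f2", "f3", "f4", "f5"),
  stringsAsFactors = FALSE
)

# Keep one row per (tissue, file_id) pair, as dplyr::distinct() would.
pairs <- unique(toy_metadata[, c("tissue", "file_id")])

# Count the files per tissue and sort the counts in decreasing order.
counts <- sort(table(pairs$tissue), decreasing = TRUE)
tissue_counts <- data.frame(
  tissue = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)

print(tissue_counts)
```

Deduplicating before counting matters because the metadata has one row per cell, so a plain row count per tissue would report cells rather than files.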
@@ -90,7 +77,6 @@ metadata |>
9077### Query raw counts
9178
9279``` r
93-
9480single_cell_counts =
9581    metadata |>
9682    dplyr::filter(
@@ -100,8 +86,10 @@ single_cell_counts =
10086        stringr::str_like(cell_type, "%CD4%")
10187    ) |>
10288    get_SingleCellExperiment()
89+ # > ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
10390# > ℹ Realising metadata.
10491# > ℹ Synchronising files
92+ # > ℹ Downloading 0 files, totalling 0 GB
10593# > ℹ Reading files.
10694# > ℹ Compiling Single Cell Experiment.
10795
@@ -112,8 +100,8 @@ single_cell_counts
112100# > assays(1): counts
113101# > rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
114102# > rowData names(0):
115- # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
116- # > TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
103+ # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
104+ # > CATTCGCTCAATACCG_F02526_1
117105# > colData names(56): sample_ cell_type ... updated_at_y original_cell_id
118106# > reducedDimNames(0):
119107# > mainExpName: NULL
@@ -135,8 +123,10 @@ single_cell_counts =
135123        stringr::str_like(cell_type, "%CD4%")
136124    ) |>
137125    get_SingleCellExperiment(assays = "cpm")
126+ # > ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
138127# > ℹ Realising metadata.
139128# > ℹ Synchronising files
129+ # > ℹ Downloading 0 files, totalling 0 GB
140130# > ℹ Reading files.
141131# > ℹ Compiling Single Cell Experiment.
142132
@@ -147,8 +137,8 @@ single_cell_counts
147137# > assays(1): cpm
148138# > rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
149139# > rowData names(0):
150- # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
151- # > TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
140+ # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
141+ # > CATTCGCTCAATACCG_F02526_1
152142# > colData names(56): sample_ cell_type ... updated_at_y original_cell_id
153143# > reducedDimNames(0):
154144# > mainExpName: NULL
@@ -167,8 +157,10 @@ single_cell_counts =
167157        stringr::str_like(cell_type, "%CD4%")
168158    ) |>
169159    get_SingleCellExperiment(assays = "cpm", features = "PUM1")
160+ # > ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
170161# > ℹ Realising metadata.
171162# > ℹ Synchronising files
163+ # > ℹ Downloading 0 files, totalling 0 GB
172164# > ℹ Reading files.
173165# > ℹ Compiling Single Cell Experiment.
174166
@@ -179,8 +171,8 @@ single_cell_counts
179171# > assays(1): cpm
180172# > rownames(1): PUM1
181173# > rowData names(0):
182- # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
183- # > TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
174+ # > colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
175+ # > CATTCGCTCAATACCG_F02526_1
184176# > colData names(56): sample_ cell_type ... updated_at_y original_cell_id
185177# > reducedDimNames(0):
186178# > mainExpName: NULL
@@ -205,6 +197,7 @@ single_cell_counts =
205197 get_seurat()
206198# > ℹ Realising metadata.
207199# > ℹ Synchronising files
200+ # > ℹ Downloading 0 files, totalling 0 GB
208201# > ℹ Reading files.
209202# > ℹ Compiling Single Cell Experiment.
210203
@@ -270,7 +263,7 @@ metadata |>
270263    geom_jitter(shape = ".")
271264```
272265
273- <img src="man/figures/HLA_A_disease_plot.png" width="525" />
266+ <img src="../man/figures/HLA_A_disease_plot.png" width="525" />
274267
275268``` r
276269
@@ -288,7 +281,7 @@ metadata |>
288281    geom_jitter(shape = ".")
289282```
290283
291- <img src="man/figures/HLA_A_tissue_plot.png" width="525" />
284+ <img src="../man/figures/HLA_A_tissue_plot.png" width="525" />
292285
293286## Obtain Unharmonised Metadata
294287
@@ -303,59 +296,15 @@ data frame.
303296harmonised <- get_metadata() |> dplyr::filter(tissue == "kidney blood vessel")
304297unharmonised <- get_unharmonised_metadata(harmonised)
305298unharmonised
306- # > # A tibble: 4 × 2
307- # > file_id unharmonised
308- # > <chr> <list>
309- # > 1 63523aa3-0d04-4fc6-ac59-5cadd3e73a14 <tbl_dck_[,17]>
310- # > 2 8fee7b82-178b-4c04-bf23-04689415690d <tbl_dck_[,12]>
311- # > 3 dc9d8cdd-29ee-4c44-830c-6559cb3d0af6 <tbl_dck_[,14]>
312- # > 4 f7e94dbb-8638-4616-aaf9-16e2212c369f <tbl_dck_[,14]>
313299```
314300
315301Notice that the columns differ between each dataset’s data frame:
316302
317303``` r
318304dplyr::pull(unharmonised, unharmonised) |> head(2)
319305# > [[1]]
320- # > # Source: SQL [?? x 17]
321- # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
322- # > cell_ file_id donor…¹ donor…² libra…³ mappe…⁴ sampl…⁵ suspe…⁶ suspe…⁷ autho…⁸
323- # > <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
324- # > 1 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
325- # > 2 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
326- # > 3 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
327- # > 4 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
328- # > 5 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
329- # > 6 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
330- # > 7 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
331- # > 8 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
332- # > 9 4602… 63523a… 19 mon… 463181… 671785… GENCOD… 125234… cell c7485e… CD4 T …
333- # > 10 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
334- # > # … with more rows, 7 more variables: cell_state <chr>,
335- # > # reported_diseases <chr>, Short_Sample <chr>, Project <chr>,
336- # > # Experiment <chr>, compartment <chr>, broad_celltype <chr>, and abbreviated
337- # > # variable names ¹donor_age, ²donor_uuid, ³library_uuid,
338- # > # ⁴mapped_reference_annotation, ⁵sample_uuid, ⁶suspension_type,
339- # > # ⁷suspension_uuid, ⁸author_cell_type
340306# >
341307# > [[2]]
342- # > # Source: SQL [?? x 12]
343- # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
344- # > cell_ file_id orig.…¹ nCoun…² nFeat…³ seura…⁴ Project donor…⁵ compa…⁶ broad…⁷
345- # > <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
346- # > 1 1069 8fee7b… 4602ST… 16082 3997 25 Experi… Wilms3 non_PT Pelvic…
347- # > 2 1214 8fee7b… 4602ST… 1037 606 25 Experi… Wilms3 non_PT Pelvic…
348- # > 3 2583 8fee7b… 4602ST… 3028 1361 25 Experi… Wilms3 non_PT Pelvic…
349- # > 4 2655 8fee7b… 4602ST… 1605 859 25 Experi… Wilms3 non_PT Pelvic…
350- # > 5 3609 8fee7b… 4602ST… 1144 682 25 Experi… Wilms3 non_PT Pelvic…
351- # > 6 3624 8fee7b… 4602ST… 1874 963 25 Experi… Wilms3 non_PT Pelvic…
352- # > 7 3946 8fee7b… 4602ST… 1296 755 25 Experi… Wilms3 non_PT Pelvic…
353- # > 8 5163 8fee7b… 4602ST… 11417 3255 25 Experi… Wilms3 non_PT Pelvic…
354- # > 9 5446 8fee7b… 4602ST… 1769 946 19 Experi… Wilms2 lympho… CD4 T …
355- # > 10 6275 8fee7b… 4602ST… 3750 1559 25 Experi… Wilms3 non_PT Pelvic…
356- # > # … with more rows, 2 more variables: author_cell_type <chr>, Sample <chr>, and
357- # > # abbreviated variable names ¹orig.ident, ²nCount_RNA, ³nFeature_RNA,
358- # > # ⁴seurat_clusters, ⁵donor_id, ⁶compartment, ⁷broad_celltype
359308```
360309
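To make the point about differing columns concrete, the following base-R sketch compares the column sets of two toy data frames. The column names echo those in the removed output above, but the values are invented, and on the real list-column entries (which the old output suggests are database-backed tables) the equivalent call may differ.

```r
# Toy stand-ins for two entries of the `unharmonised` list column.
unharmonised_list <- list(
  data.frame(cell_ = "c1", file_id = "a", donor_age = "27 months"),
  data.frame(cell_ = "c2", file_id = "b", orig.ident = "s1", nCount_RNA = 1037)
)

# Collect the column names of each dataset.
column_sets <- lapply(unharmonised_list, colnames)

# Columns present in every dataset versus columns specific to each one.
shared_columns <- Reduce(intersect, column_sets)
dataset_specific <- lapply(column_sets, setdiff, shared_columns)

print(shared_columns)
print(dataset_specific)
```

Only the shared columns can be relied on across datasets; anything in `dataset_specific` has to be handled per dataset.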
361310# Cell metadata
@@ -407,7 +356,7 @@ present in the original CELLxGENE metadata
407356- `sample_id_db`: Sample subdivision for internal use
408357- `file_id_db`: File subdivision for internal use
409358- `sample_`: Sample ID
410- - `sample_name`: How samples were defined
359+ - `.sample_name`: How samples were defined
411360
412361# RNA abundance
413362
@@ -417,43 +366,3 @@ CELLxGENE include a mix of scales and transformations specified in the
417366`x_normalization` column.
418367
419368The `cpm` assay includes counts per million.
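For readers unfamiliar with the scale, here is a minimal sketch of the usual counts-per-million calculation: each cell's counts are divided by that cell's total and multiplied by one million. The toy matrix and gene names are invented, and this illustrates the standard definition only; it is not taken from how the package computes its `cpm` assay.

```r
# Toy count matrix: genes in rows, cells in columns (values invented).
counts <- matrix(
  c(10, 0, 90,
    5, 5, 40),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Total counts per cell (library size).
library_sizes <- colSums(counts)

# Divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, library_sizes, "/") * 1e6

print(cpm)
```

After this scaling every column sums to one million, which makes cells with different sequencing depth comparable.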
420-
421- # Installation and getting-started problems
422-
423- **Problem:** Default R cache path including non-standard characters
424- (e.g. dash)
425-
426- ``` r
427- get_metadata()
428-
429- # Error in `db_query_fields.DBIConnection()`:
430- # ! Can't query fields.
431- # Caused by error:
432- # ! Parser Error: syntax error at or near "/"
433- # LINE 2: FROM /Users/bob/Library/Cach...
434- ```
435-
436- **Solution:** Setup custom cache path (e.g. user home directory)
437-
438- ``` r
439- get_metadata(cache_directory = path.expand('~'))
440- ```
441-
442- **Problem:** namespace ‘dbplyr’ 2.2.1 is being loaded, but \>= 2.3.0 is
443- required
444-
445- **Solution:** Install new dbplyr
446-
447- ``` r
448- install.packages("dbplyr")
449- ```
450-
451- ------------------------------------------------------------------------
452-
453- This project has been funded by
454-
455- - *Silicon Valley Foundation* CZF2019-002443
456- - *Bioconductor core funding* NIH NHGRI 5U24HG004059-18
457- - *Victoria Cancer Agency* ECRF21036
458- - *Australian National Health and Medical Research Council* 1116955
459- - *The Lorenzo and Pamela Galli Medical Research Trust*