This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Commit 9ee53a1

Rebuild readme
1 parent 019590d commit 9ee53a1

2 files changed: +54 -140 lines changed

README.md

Lines changed: 48 additions & 139 deletions
@@ -1,6 +1,13 @@
 CuratedAtlasQueryR
 ================
 
+``` r
+find_figure <- function(names){
+    rprojroot::find_package_root_file() |>
+        file.path("man", "figures", names)
+}
+```
+
 <!-- badges: start -->
 
 [![Lifecycle:maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
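
As a rough illustration of what the `find_figure()` helper added in this hunk evaluates to, the sketch below swaps the `rprojroot::find_package_root_file()` call for an explicit `root` argument so that it runs outside a package. The name `find_figure_from` and the `/tmp/pkg` root are hypothetical and not part of the commit.

```r
# Hypothetical stand-in for find_figure(): the package root is passed in
# explicitly instead of being discovered with rprojroot.
find_figure_from <- function(root, names) {
  root |>
    file.path("man", "figures", names)
}

# file.path() is vectorised, so several figure names can be resolved at once.
paths <- find_figure_from("/tmp/pkg", c("logo.png", "czi_logo.png"))
print(paths)
```

Because `file.path()` is vectorised over `names`, one call returns one path per figure.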
@@ -11,11 +18,9 @@ exploration and retrieval of the harmonised, curated and reannotated
 CELLxGENE single-cell human cell atlas. Data can be retrieved at cell,
 sample, or dataset levels based on filtering criteria.
 
-<img src="man/figures/logo.png" width="120x" height="139px" />
-
-<img src="man/figures/svcf_logo.jpeg" width="155x" height="58px" /><img src="man/figures/czi_logo.png" width="129px" height="58px" /><img src="man/figures/bioconductor_logo.jpg" width="202px" height="58px" /><img src="man/figures/vca_logo.png" width="219px" height="58px" /><img src="man/figures/nectar_logo.png" width="180px" height="58px" />
+<img src="../man/figures/logo.png" width="120x" height="139px" />
 
-[website](https://stemangiola.github.io/CuratedAtlasQueryR)
+<img src="../man/figures/svcf_logo.jpeg" width="155x" height="58px" /><img src="../man/figures/czi_logo.png" width="129px" height="58px" /><img src="../man/figures/bioconductor_logo.jpg" width="202px" height="58px" /><img src="../man/figures/vca_logo.png" width="219px" height="58px" /><img src="../man/figures/nectar_logo.png" width="180px" height="58px" />
 
 # Query interface
 
@@ -36,52 +41,34 @@ library(CuratedAtlasQueryR)
 ### Load the metadata
 
 ``` r
-metadata = get_metadata()
-
-metadata
-#> # Source: table</vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
-#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
-#> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
-#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
-#> 1 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 2 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 3 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
-#> 4 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
-#> 5 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 6 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 7 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 8 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> 9 AAAC… 689e2f… lumina… lumina… 1 <NA> <NA> <NA> 930938… D17PrP…
-#> 10 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
-#> # … with more rows, 46 more variables: assay <chr>,
-#> # assay_ontology_term_id <chr>, file_id_db <chr>,
-#> # cell_type_ontology_term_id <chr>, development_stage <chr>,
-#> # development_stage_ontology_term_id <chr>, disease <chr>,
-#> # disease_ontology_term_id <chr>, ethnicity <chr>,
-#> # ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
-#> # is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
+metadata <- get_metadata()
 ```
 
-### Explore the number of datasets per tissue
+The `metadata` variable can then be re-used for all subsequent queries.
+
+### Explore the tissue
 
 ``` r
 metadata |>
-    dplyr::distinct(tissue, dataset_id) |>
-    dplyr::count(tissue)
-#> # Source: SQL [?? x 2]
-#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
-#> tissue n
-#> <chr> <dbl>
-#> 1 cerebellum 3
-#> 2 telencephalon 2
-#> 3 heart 3
-#> 4 intestine 18
-#> 5 kidney 19
-#> 6 liver 24
-#> 7 lung 27
-#> 8 muscle organ 3
-#> 9 pancreas 5
-#> 10 placenta 3
+    dplyr::distinct(tissue, file_id)
+```
+
+``` r
+#> # Source: SQL [?? x 2]
+#> # Database: sqlite 3.40.0 [[email protected]:5432/metadata]
+#> # Ordered by: desc(n)
+#> tissue n
+#> <chr> <int64>
+#> 1 blood 47
+#> 2 heart left ventricle 46
+#> 3 cortex of kidney 31
+#> 4 renal medulla 29
+#> 5 lung 27
+#> 6 liver 24
+#> 7 middle temporal gyrus 24
+#> 8 kidney 19
+#> 9 intestine 18
+#> 10 thymus 17
 #> # … with more rows
 ```
 
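
The new README chunk in this hunk ends at `dplyr::distinct(tissue, file_id)`, while the printed output has a `tissue` and an `n` column ordered by `desc(n)`. Below is a minimal sketch, on made-up toy data, of one pipeline that yields a table of that shape. The `dplyr::count(..., sort = TRUE)` step is an assumption and is not something the commit shows.

```r
# Toy stand-in for the metadata table; the values are invented for illustration.
toy_metadata <- tibble::tibble(
  tissue  = c("blood", "blood", "blood", "blood", "lung", "lung", "liver"),
  file_id = c("f1",    "f1",    "f2",    "f3",    "f4",   "f5",   "f6")
)

tissue_counts <- toy_metadata |>
  # Keep one row per (tissue, file) pair so repeated cells are not double counted
  dplyr::distinct(tissue, file_id) |>
  # Count files per tissue, largest first (assumed step, not in the commit)
  dplyr::count(tissue, sort = TRUE)

print(tissue_counts)
```

On the real metadata the same verbs would be translated to SQL by dbplyr, which is where an `Ordered by: desc(n)` header such as the one above comes from.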
@@ -90,7 +77,6 @@ metadata |>
 ### Query raw counts
 
 ``` r
-
 single_cell_counts =
     metadata |>
     dplyr::filter(
@@ -100,8 +86,10 @@ single_cell_counts =
         stringr::str_like(cell_type, "%CD4%")
     ) |>
     get_SingleCellExperiment()
+#> ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
 #> ℹ Realising metadata.
 #> ℹ Synchronising files
+#> ℹ Downloading 0 files, totalling 0 GB
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
 
@@ -112,8 +100,8 @@ single_cell_counts
 #> assays(1): counts
 #> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
-#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
-#> TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
+#> CATTCGCTCAATACCG_F02526_1
 #> colData names(56): sample_ cell_type ... updated_at_y original_cell_id
 #> reducedDimNames(0):
 #> mainExpName: NULL
@@ -135,8 +123,10 @@ single_cell_counts =
         stringr::str_like(cell_type, "%CD4%")
     ) |>
     get_SingleCellExperiment(assays = "cpm")
+#> ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
 #> ℹ Realising metadata.
 #> ℹ Synchronising files
+#> ℹ Downloading 0 files, totalling 0 GB
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
 
@@ -147,8 +137,8 @@ single_cell_counts
 #> assays(1): cpm
 #> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
-#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
-#> TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
+#> CATTCGCTCAATACCG_F02526_1
 #> colData names(56): sample_ cell_type ... updated_at_y original_cell_id
 #> reducedDimNames(0):
 #> mainExpName: NULL
@@ -167,8 +157,10 @@ single_cell_counts =
         stringr::str_like(cell_type, "%CD4%")
     ) |>
     get_SingleCellExperiment(assays = "cpm", features = "PUM1")
+#> ! This function name is deprecated. Please use `get_single_cell_experiment()` instead
 #> ℹ Realising metadata.
 #> ℹ Synchronising files
+#> ℹ Downloading 0 files, totalling 0 GB
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
 
@@ -179,8 +171,8 @@ single_cell_counts
 #> assays(1): cpm
 #> rownames(1): PUM1
 #> rowData names(0):
-#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
-#> TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ... TACAACGTCAGCATTG_SC84_1
+#> CATTCGCTCAATACCG_F02526_1
 #> colData names(56): sample_ cell_type ... updated_at_y original_cell_id
 #> reducedDimNames(0):
 #> mainExpName: NULL
@@ -205,6 +197,7 @@ single_cell_counts =
     get_seurat()
 #> ℹ Realising metadata.
 #> ℹ Synchronising files
+#> ℹ Downloading 0 files, totalling 0 GB
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
 
@@ -270,7 +263,7 @@ metadata |>
     geom_jitter(shape=".")
 ```
 
-<img src="man/figures/HLA_A_disease_plot.png" width="525" />
+<img src="../man/figures/HLA_A_disease_plot.png" width="525" />
 
 ``` r
 
@@ -288,7 +281,7 @@ metadata |>
     geom_jitter(shape=".")
 ```
 
-<img src="man/figures/HLA_A_tissue_plot.png" width="525" />
+<img src="../man/figures/HLA_A_tissue_plot.png" width="525" />
 
 ## Obtain Unharmonised Metadata
 
@@ -303,59 +296,15 @@ data frame.
 harmonised <- get_metadata() |> dplyr::filter(tissue == "kidney blood vessel")
 unharmonised <- get_unharmonised_metadata(harmonised)
 unharmonised
-#> # A tibble: 4 × 2
-#> file_id unharmonised
-#> <chr> <list>
-#> 1 63523aa3-0d04-4fc6-ac59-5cadd3e73a14 <tbl_dck_[,17]>
-#> 2 8fee7b82-178b-4c04-bf23-04689415690d <tbl_dck_[,12]>
-#> 3 dc9d8cdd-29ee-4c44-830c-6559cb3d0af6 <tbl_dck_[,14]>
-#> 4 f7e94dbb-8638-4616-aaf9-16e2212c369f <tbl_dck_[,14]>
 ```
 
 Notice that the columns differ between each dataset’s data frame:
 
 ``` r
 dplyr::pull(unharmonised, unharmonised) |> head(2)
 #> [[1]]
-#> # Source: SQL [?? x 17]
-#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
-#> cell_ file_id donor…¹ donor…² libra…³ mappe…⁴ sampl…⁵ suspe…⁶ suspe…⁷ autho…⁸
-#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
-#> 1 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 2 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 3 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 4 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 5 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 6 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 7 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 8 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> 9 4602… 63523a… 19 mon… 463181… 671785… GENCOD… 125234… cell c7485e… CD4 T …
-#> 10 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
-#> # … with more rows, 7 more variables: cell_state <chr>,
-#> # reported_diseases <chr>, Short_Sample <chr>, Project <chr>,
-#> # Experiment <chr>, compartment <chr>, broad_celltype <chr>, and abbreviated
-#> # variable names ¹donor_age, ²donor_uuid, ³library_uuid,
-#> # ⁴mapped_reference_annotation, ⁵sample_uuid, ⁶suspension_type,
-#> # ⁷suspension_uuid, ⁸author_cell_type
 #>
 #> [[2]]
-#> # Source: SQL [?? x 12]
-#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
-#> cell_ file_id orig.…¹ nCoun…² nFeat…³ seura…⁴ Project donor…⁵ compa…⁶ broad…⁷
-#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
-#> 1 1069 8fee7b… 4602ST… 16082 3997 25 Experi… Wilms3 non_PT Pelvic…
-#> 2 1214 8fee7b… 4602ST… 1037 606 25 Experi… Wilms3 non_PT Pelvic…
-#> 3 2583 8fee7b… 4602ST… 3028 1361 25 Experi… Wilms3 non_PT Pelvic…
-#> 4 2655 8fee7b… 4602ST… 1605 859 25 Experi… Wilms3 non_PT Pelvic…
-#> 5 3609 8fee7b… 4602ST… 1144 682 25 Experi… Wilms3 non_PT Pelvic…
-#> 6 3624 8fee7b… 4602ST… 1874 963 25 Experi… Wilms3 non_PT Pelvic…
-#> 7 3946 8fee7b… 4602ST… 1296 755 25 Experi… Wilms3 non_PT Pelvic…
-#> 8 5163 8fee7b… 4602ST… 11417 3255 25 Experi… Wilms3 non_PT Pelvic…
-#> 9 5446 8fee7b… 4602ST… 1769 946 19 Experi… Wilms2 lympho… CD4 T …
-#> 10 6275 8fee7b… 4602ST… 3750 1559 25 Experi… Wilms3 non_PT Pelvic…
-#> # … with more rows, 2 more variables: author_cell_type <chr>, Sample <chr>, and
-#> # abbreviated variable names ¹orig.ident, ²nCount_RNA, ³nFeature_RNA,
-#> # ⁴seurat_clusters, ⁵donor_id, ⁶compartment, ⁷broad_celltype
 ```
 
 # Cell metadata
@@ -407,7 +356,7 @@ present in the original CELLxGENE metadata
 - `sample_id_db`: Sample subdivision for internal use
 - `file_id_db`: File subdivision for internal use
 - `sample_`: Sample ID
-- `sample_name`: How samples were defined
+- `.sample_name`: How samples were defined
 
 # RNA abundance
 
@@ -417,43 +366,3 @@ CELLxGENE include a mix of scales and transformations specified in the
 `x_normalization` column.
 
 The `cpm` assay includes counts per million.
-
-# Installation and getting-started problems
-
-**Problem:** Default R cache path including non-standard characters
-(e.g. dash)
-
-``` r
-get_metadata()
-
-# Error in `db_query_fields.DBIConnection()`:
-# ! Can't query fields.
-# Caused by error:
-# ! Parser Error: syntax error at or near "/"
-# LINE 2: FROM /Users/bob/Library/Cach...
-```
-
-**Solution:** Setup custom cache path (e.g. user home directory)
-
-``` r
-get_metadata(cache_directory = path.expand('~'))
-```
-
-**Problem:** namespace ‘dbplyr’ 2.2.1 is being loaded, but \>= 2.3.0 is
-required
-
-**Solution:** Install new dbplyr
-
-``` r
-install.packages("dbplyr")
-```
-
-------------------------------------------------------------------------
-
-This project has been funded by
-
-- *Silicon Valley Foundation* CZF2019-002443
-- *Bioconductor core funding* NIH NHGRI 5U24HG004059-18
-- *Victoria Cancer Agency* ECRF21036
-- *Australian National Health and Medical Research Council* 1116955
-- *The Lorenzo and Pamela Galli Medical Research Trust*

vignettes/Introduction.Rmd

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,12 @@ vignette: >
 
 ```{r, eval=FALSE, echo=FALSE}
 # Note: knit this to the repo readme file using:
-rmarkdown::render("Introduction.Rmd", output_format = "github_document", output_dir = getwd() |> dirname())
+rmarkdown::render(
+    "Introduction.Rmd",
+    output_file = "README.md",
+    output_format = "github_document",
+    output_dir = getwd() |> dirname()
+)
 ```
 
 ```{r}
