This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Commit 98748c9

Merge pull request #98 from stemangiola/fix-95
Update readme
2 parents 941d3f6 + a8864a2 commit 98748c9

1 file changed

+87
-24
lines changed

README.md

Lines changed: 87 additions & 24 deletions
@@ -39,8 +39,8 @@ library(CuratedAtlasQueryR)
 metadata = get_metadata()
 
 metadata
-#> # Source: table</stornext/Home/data/allstaff/m/mangiola.s/.cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
-#> # Database: DuckDB 0.7.0 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.0/:memory:]
+#> # Source: table</vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
+#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
 #> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
 #> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
 #> 1 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
@@ -69,19 +69,19 @@ metadata |>
 dplyr::distinct(tissue, dataset_id) |>
 dplyr::count(tissue)
 #> # Source: SQL [?? x 2]
-#> # Database: DuckDB 0.7.0 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.0/:memory:]
-#> tissue n
-#> <chr> <dbl>
-#> 1 peripheral zone of prostate 10
-#> 2 transition zone of prostate 10
-#> 3 blood 47
-#> 4 intestine 18
-#> 5 middle temporal gyrus 24
-#> 6 heart left ventricle 46
-#> 7 apex of heart 16
-#> 8 heart right ventricle 16
-#> 9 left cardiac atrium 7
-#> 10 interventricular septum 16
+#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
+#> tissue n
+#> <chr> <dbl>
+#> 1 blood 47
+#> 2 respiratory airway 16
+#> 3 mammary gland epithelial cell (cell culture) 1
+#> 4 colon 3
+#> 5 intestine 18
+#> 6 pleural effusion 11
+#> 7 lymph node 15
+#> 8 lung 27
+#> 9 liver 24
+#> 10 axilla 10
 #> # … with more rows
 ```
 
@@ -107,10 +107,10 @@ single_cell_counts =
 
 single_cell_counts
 #> class: SingleCellExperiment
-#> dim: 35615 1571
+#> dim: 36229 1571
 #> metadata(0):
-#> assays(2): counts cpm
-#> rownames(35615): TSPAN6 TNMD ... LNCDAT HRURF
+#> assays(1): counts
+#> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
 #> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
 #> TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
@@ -142,10 +142,10 @@ single_cell_counts =
 
 single_cell_counts
 #> class: SingleCellExperiment
-#> dim: 35615 1571
+#> dim: 36229 1571
 #> metadata(0):
 #> assays(1): cpm
-#> rownames(35615): TSPAN6 TNMD ... LNCDAT HRURF
+#> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
 #> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
 #> TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
@@ -207,13 +207,11 @@ single_cell_counts =
 #> ℹ Synchronising files
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
-#> Warning: Non-unique features (rownames) present in the input matrix, making
-#> unique
 
 single_cell_counts
 #> An object of class Seurat
-#> 35615 features across 1571 samples within 1 assay
-#> Active assay: originalexp (35615 features, 0 variable features)
+#> 36229 features across 1571 samples within 1 assay
+#> Active assay: originalexp (36229 features, 0 variable features)
 ```
 
 ## Save your `SingleCellExperiment`
@@ -292,6 +290,71 @@ metadata |>
 
 <img src="man/figures/HLA_A_tissue_plot.png" width="525" />
 
+## Obtain Unharmonised Metadata
+
+Various metadata fields are *not* common between datasets, so it does
+not make sense for these to live in the main metadata table. However, we
+can obtain it using the `get_unharmonised_metadata()` function.
+
+Note how this table has additional columns that are not in the normal
+metadata:
+
+``` r
+dataset = "838ea006-2369-4e2c-b426-b2a744a2b02b"
+unharmonised_meta = get_unharmonised_metadata(dataset)
+unharmonised_tbl = dplyr::collect(unharmonised_meta[[dataset]])
+unharmonised_tbl
+#> # A tibble: 168,860 × 23
+#> cell_ file_id Neuro…¹ Class Subcl…² Super…³ Age.a…⁴ Years…⁵ Cogni…⁶ ADNC
+#> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
+#> 1 GGACGAAG… 838ea0… FALSE Neur… L4 IT L4 IT_2 90+ ye… 16 to … Dement… High
+#> 2 TCACGGGA… 838ea0… FALSE Neur… L4 IT L4 IT_1 90+ ye… 12 to … Dement… Inte…
+#> 3 TCAGTTTT… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 16 to … No dem… Low
+#> 4 TCAGTCCT… 838ea0… FALSE Neur… L4 IT L4 IT_4 78 to … 16 to … Dement… Inte…
+#> 5 AGCCACGC… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 19 to … No dem… Inte…
+#> 6 CCTCAACC… 838ea0… TRUE Neur… L4 IT L4 IT_2 Less t… Refere… Refere… Refe…
+#> 7 CTCGACAA… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 12 to … No dem… Inte…
+#> 8 AGCTACAG… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
+#> 9 CTCGAGGG… 838ea0… FALSE Neur… L4 IT L4 IT_2 65 to … 16 to … Dement… High
+#> 10 AGTGCCGT… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
+#> # … with 168,850 more rows, 13 more variables: Braak.stage <chr>,
+#> # Thal.phase <chr>, CERAD.score <chr>, APOE4.status <chr>,
+#> # Lewy.body.disease.pathology <chr>, LATE.NC.stage <chr>,
+#> # Microinfarct.pathology <chr>, Specimen.ID <chr>, Donor.ID <chr>, PMI <chr>,
+#> # Number.of.UMIs <dbl>, Genes.detected <dbl>,
+#> # Fraction.mitochrondrial.UMIs <dbl>, and abbreviated variable names
+#> # ¹Neurotypical.reference, ²Subclass, ³Supertype, ⁴Age.at.death, …
+```
+
+If we have metadata from the normal metadata table that is from a single
+dataset, we can even join this additional metadata into one big data
+frame:
+
+``` r
+harmonised_meta = get_metadata() |> dplyr::filter(file_id == dataset) |> dplyr::collect()
+dplyr::left_join(harmonised_meta, unharmonised_tbl, by=c("file_id", "cell_"))
+#> # A tibble: 168,860 × 77
+#> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
+#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
+#> 1 GGAC… f63cb4… L2/3-6… neuron 1 <NA> <NA> <NA> 168593… H21.33…
+#> 2 TCAC… 0d4d1f… L2/3-6… neuron 1 <NA> <NA> <NA> f7d747… H21.33…
+#> 3 TCAG… 3e5a3b… L2/3-6… neuron 1 <NA> <NA> <NA> 3417a9… H20.33…
+#> 4 TCAG… 7010a3… L2/3-6… neuron 1 <NA> <NA> <NA> 246a59… H20.33…
+#> 5 AGCC… 82bb9a… L2/3-6… neuron 1 <NA> <NA> <NA> 7a8f35… H21.33…
+#> 6 CCTC… a233eb… L2/3-6… neuron 1 <NA> <NA> <NA> 188243… H18.30…
+#> 7 CTCG… 27f104… L2/3-6… neuron 1 <NA> <NA> <NA> a62943… H20.33…
+#> 8 AGCT… 0190a2… L2/3-6… neuron 1 <NA> <NA> <NA> c508a8… H20.33…
+#> 9 CTCG… 95d846… L2/3-6… neuron 1 <NA> <NA> <NA> 29285d… H21.33…
+#> 10 AGTG… b0e1c5… L2/3-6… neuron 1 <NA> <NA> <NA> cd7823… H21.33…
+#> # … with 168,850 more rows, 67 more variables: assay <chr>,
+#> # assay_ontology_term_id <chr>, file_id_db <chr>,
+#> # cell_type_ontology_term_id <chr>, development_stage <chr>,
+#> # development_stage_ontology_term_id <chr>, disease <chr>,
+#> # disease_ontology_term_id <chr>, ethnicity <chr>,
+#> # ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
+#> # is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
+```
+
 # Cell metadata
 
 Dataset-specific columns (definitions available at