@@ -39,8 +39,8 @@ library(CuratedAtlasQueryR)
 metadata = get_metadata()
 
 metadata
-#> # Source: table</stornext/Home/data/allstaff/m/mangiola.s/.cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
-#> # Database: DuckDB 0.7.0 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.0/:memory:]
+#> # Source: table</vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
+#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
 #> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
 #> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
 #> 1 AAAC… 689e2f… basal … basal_… 1 <NA> <NA> <NA> f297c7… D17PrP…
@@ -69,19 +69,19 @@ metadata |>
     dplyr::distinct(tissue, dataset_id) |>
     dplyr::count(tissue)
 #> # Source: SQL [?? x 2]
-#> # Database: DuckDB 0.7.0 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.0/:memory:]
-#>    tissue                          n
-#>    <chr>                       <dbl>
-#>  1 peripheral zone of prostate    10
-#>  2 transition zone of prostate    10
-#>  3 blood                          47
-#>  4 intestine                      18
-#>  5 middle temporal gyrus          24
-#>  6 heart left ventricle           46
-#>  7 apex of heart                  16
-#>  8 heart right ventricle          16
-#>  9 left cardiac atrium             7
-#> 10 interventricular septum        16
+#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
+#>    tissue                                           n
+#>    <chr>                                        <dbl>
+#>  1 blood                                           47
+#>  2 respiratory airway                              16
+#>  3 mammary gland epithelial cell (cell culture)     1
+#>  4 colon                                            3
+#>  5 intestine                                       18
+#>  6 pleural effusion                                11
+#>  7 lymph node                                      15
+#>  8 lung                                            27
+#>  9 liver                                           24
+#> 10 axilla                                          10
 #> # … with more rows
 ```
 
@@ -107,10 +107,10 @@ single_cell_counts =
 
 single_cell_counts
 #> class: SingleCellExperiment
-#> dim: 35615 1571
+#> dim: 36229 1571
 #> metadata(0):
-#> assays(2): counts cpm
-#> rownames(35615): TSPAN6 TNMD ... LNCDAT HRURF
+#> assays(1): counts
+#> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
 #> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
 #>   TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
@@ -142,10 +142,10 @@ single_cell_counts =
 
 single_cell_counts
 #> class: SingleCellExperiment
-#> dim: 35615 1571
+#> dim: 36229 1571
 #> metadata(0):
 #> assays(1): cpm
-#> rownames(35615): TSPAN6 TNMD ... LNCDAT HRURF
+#> rownames(36229): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
 #> rowData names(0):
 #> colnames(1571): ACAGCCGGTCCGTTAA_F02526_1 GGGAATGAGCCCAGCT_F02526_1 ...
 #>   TACAACGTCAGCATTG_SC84_1 CATTCGCTCAATACCG_F02526_1
@@ -207,13 +207,11 @@ single_cell_counts =
 #> ℹ Synchronising files
 #> ℹ Reading files.
 #> ℹ Compiling Single Cell Experiment.
-#> Warning: Non-unique features (rownames) present in the input matrix, making
-#> unique
 
 single_cell_counts
 #> An object of class Seurat
-#> 35615 features across 1571 samples within 1 assay
-#> Active assay: originalexp (35615 features, 0 variable features)
+#> 36229 features across 1571 samples within 1 assay
+#> Active assay: originalexp (36229 features, 0 variable features)
 ```
 
 ## Save your `SingleCellExperiment`
@@ -292,6 +290,71 @@ metadata |>
 
 <img src="man/figures/HLA_A_tissue_plot.png" width="525" />
 
+## Obtain Unharmonised Metadata
+
+Some metadata fields are *not* shared between datasets, so it does
+not make sense for them to live in the main metadata table. However, we
+can obtain them using the `get_unharmonised_metadata()` function.
+
+Note that this table has additional columns that are not in the normal
+metadata:
302+ ``` r
303+ dataset = " 838ea006-2369-4e2c-b426-b2a744a2b02b"
304+ unharmonised_meta = get_unharmonised_metadata(dataset )
305+ unharmonised_tbl = dplyr :: collect(unharmonised_meta [[dataset ]])
306+ unharmonised_tbl
307+ # > # A tibble: 168,860 × 23
308+ # > cell_ file_id Neuro…¹ Class Subcl…² Super…³ Age.a…⁴ Years…⁵ Cogni…⁶ ADNC
309+ # > <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
310+ # > 1 GGACGAAG… 838ea0… FALSE Neur… L4 IT L4 IT_2 90+ ye… 16 to … Dement… High
311+ # > 2 TCACGGGA… 838ea0… FALSE Neur… L4 IT L4 IT_1 90+ ye… 12 to … Dement… Inte…
312+ # > 3 TCAGTTTT… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 16 to … No dem… Low
313+ # > 4 TCAGTCCT… 838ea0… FALSE Neur… L4 IT L4 IT_4 78 to … 16 to … Dement… Inte…
314+ # > 5 AGCCACGC… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 19 to … No dem… Inte…
315+ # > 6 CCTCAACC… 838ea0… TRUE Neur… L4 IT L4 IT_2 Less t… Refere… Refere… Refe…
316+ # > 7 CTCGACAA… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 12 to … No dem… Inte…
317+ # > 8 AGCTACAG… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
318+ # > 9 CTCGAGGG… 838ea0… FALSE Neur… L4 IT L4 IT_2 65 to … 16 to … Dement… High
319+ # > 10 AGTGCCGT… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
320+ # > # … with 168,850 more rows, 13 more variables: Braak.stage <chr>,
321+ # > # Thal.phase <chr>, CERAD.score <chr>, APOE4.status <chr>,
322+ # > # Lewy.body.disease.pathology <chr>, LATE.NC.stage <chr>,
323+ # > # Microinfarct.pathology <chr>, Specimen.ID <chr>, Donor.ID <chr>, PMI <chr>,
324+ # > # Number.of.UMIs <dbl>, Genes.detected <dbl>,
325+ # > # Fraction.mitochrondrial.UMIs <dbl>, and abbreviated variable names
326+ # > # ¹Neurotypical.reference, ²Subclass, ³Supertype, ⁴Age.at.death, …
327+ ```
+
+If we have rows from the normal metadata table that all come from a single
+dataset, we can join this additional metadata to them to form one big data
+frame:
+
+``` r
+harmonised_meta = get_metadata() |> dplyr::filter(file_id == dataset) |> dplyr::collect()
+dplyr::left_join(harmonised_meta, unharmonised_tbl, by = c("file_id", "cell_"))
+#> # A tibble: 168,860 × 77
+#> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
+#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
+#> 1 GGAC… f63cb4… L2/3-6… neuron 1 <NA> <NA> <NA> 168593… H21.33…
+#> 2 TCAC… 0d4d1f… L2/3-6… neuron 1 <NA> <NA> <NA> f7d747… H21.33…
+#> 3 TCAG… 3e5a3b… L2/3-6… neuron 1 <NA> <NA> <NA> 3417a9… H20.33…
+#> 4 TCAG… 7010a3… L2/3-6… neuron 1 <NA> <NA> <NA> 246a59… H20.33…
+#> 5 AGCC… 82bb9a… L2/3-6… neuron 1 <NA> <NA> <NA> 7a8f35… H21.33…
+#> 6 CCTC… a233eb… L2/3-6… neuron 1 <NA> <NA> <NA> 188243… H18.30…
+#> 7 CTCG… 27f104… L2/3-6… neuron 1 <NA> <NA> <NA> a62943… H20.33…
+#> 8 AGCT… 0190a2… L2/3-6… neuron 1 <NA> <NA> <NA> c508a8… H20.33…
+#> 9 CTCG… 95d846… L2/3-6… neuron 1 <NA> <NA> <NA> 29285d… H21.33…
+#> 10 AGTG… b0e1c5… L2/3-6… neuron 1 <NA> <NA> <NA> cd7823… H21.33…
+#> # … with 168,850 more rows, 67 more variables: assay <chr>,
+#> #   assay_ontology_term_id <chr>, file_id_db <chr>,
+#> #   cell_type_ontology_term_id <chr>, development_stage <chr>,
+#> #   development_stage_ontology_term_id <chr>, disease <chr>,
+#> #   disease_ontology_term_id <chr>, ethnicity <chr>,
+#> #   ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
+#> #   is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
+```
+
 # Cell metadata
 
 Dataset-specific columns (definitions available at
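
A note on the join added in the "Obtain Unharmonised Metadata" section: the column count of the joined table follows from the two inputs. The harmonised table has 56 columns and the unharmonised one has 23; the two key columns (`file_id`, `cell_`) are shared, so the result has 56 + 23 - 2 = 77 columns, which matches the printed `168,860 × 77`. Below is a minimal sketch of the same pattern on toy data. The table names (`harmonised_toy`, `unharmonised_toy`) and all values are hypothetical and do not come from the atlas; only `dplyr::left_join()` with a two-column `by` mirrors the README.

``` r
library(dplyr)

# Hypothetical stand-in for the harmonised metadata of one dataset
harmonised_toy <- tibble::tibble(
  cell_   = c("AAAC", "GGAC", "TCAC"),
  file_id = c("f1", "f1", "f1"),
  tissue  = c("blood", "blood", "lung")
)

# Hypothetical stand-in for the dataset-specific (unharmonised) table;
# note it has no row for cell "TCAC"
unharmonised_toy <- tibble::tibble(
  cell_       = c("AAAC", "GGAC"),
  file_id     = c("f1", "f1"),
  Braak.stage = c("Braak IV", "Braak II")
)

# Columns that exist only in the dataset-specific table
extra_cols <- setdiff(colnames(unharmonised_toy), colnames(harmonised_toy))

# Keep every harmonised row; attach the extra columns where the keys match
joined_toy <- left_join(
  harmonised_toy,
  unharmonised_toy,
  by = c("file_id", "cell_")
)
```

Because this is a left join, cells with no match in the dataset-specific table are kept and their extra columns are filled with `NA`, so the row count of the harmonised table is preserved as long as the key pair is unique in the right-hand table.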