@@ -70,18 +70,18 @@ metadata |>
7070 dplyr::count(tissue)
7171# > # Source: SQL [?? x 2]
7272# > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
73- # > tissue n
74- # > <chr> <dbl>
75- # > 1 blood 47
76- # > 2 respiratory airway 16
77- # > 3 mammary gland epithelial cell (cell culture) 1
78- # > 4 colon 3
79- # > 5 intestine 18
80- # > 6 pleural effusion 11
81- # > 7 lymph node 15
82- # > 8 lung 27
83- # > 9 liver 24
84- # > 10 axilla 10
73+ # > tissue n
74+ # > <chr> <dbl>
75+ # > 1 cerebellum 3
76+ # > 2 telencephalon 2
77+ # > 3 heart 3
78+ # > 4 intestine 18
79+ # > 5 kidney 19
80+ # > 6 liver 24
81+ # > 7 lung 27
82+ # > 8 muscle organ 3
83+ # > 9 pancreas 5
84+ # > 10 placenta 3
8585# > # … with more rows
8686```
8787
@@ -294,65 +294,68 @@ metadata |>
294294
295295Various metadata fields are *not* common between datasets, so it does
296296not make sense for these to live in the main metadata table. However, we
297- can obtain it using the `get_unharmonised_metadata()` function.
298-
299- Note how this table has additional columns that are not in the normal
300- metadata:
297+ can obtain them using the `get_unharmonised_metadata()` function. This
298+ function returns a data frame with one row per dataset, including the
299+ `unharmonised` column, which contains the unharmonised metadata as a nested
300+ data frame.
301301
302302``` r
303- dataset = "838ea006-2369-4e2c-b426-b2a744a2b02b"
304- unharmonised_meta = get_unharmonised_metadata(dataset)
305- unharmonised_tbl = dplyr::collect(unharmonised_meta[[dataset]])
306- unharmonised_tbl
307- # > # A tibble: 168,860 × 23
308- # > cell_ file_id Neuro…¹ Class Subcl…² Super…³ Age.a…⁴ Years…⁵ Cogni…⁶ ADNC
309- # > <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
310- # > 1 GGACGAAG… 838ea0… FALSE Neur… L4 IT L4 IT_2 90+ ye… 16 to … Dement… High
311- # > 2 TCACGGGA… 838ea0… FALSE Neur… L4 IT L4 IT_1 90+ ye… 12 to … Dement… Inte…
312- # > 3 TCAGTTTT… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 16 to … No dem… Low
313- # > 4 TCAGTCCT… 838ea0… FALSE Neur… L4 IT L4 IT_4 78 to … 16 to … Dement… Inte…
314- # > 5 AGCCACGC… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 19 to … No dem… Inte…
315- # > 6 CCTCAACC… 838ea0… TRUE Neur… L4 IT L4 IT_2 Less t… Refere… Refere… Refe…
316- # > 7 CTCGACAA… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 12 to … No dem… Inte…
317- # > 8 AGCTACAG… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
318- # > 9 CTCGAGGG… 838ea0… FALSE Neur… L4 IT L4 IT_2 65 to … 16 to … Dement… High
319- # > 10 AGTGCCGT… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
320- # > # … with 168,850 more rows, 13 more variables: Braak.stage <chr>,
321- # > # Thal.phase <chr>, CERAD.score <chr>, APOE4.status <chr>,
322- # > # Lewy.body.disease.pathology <chr>, LATE.NC.stage <chr>,
323- # > # Microinfarct.pathology <chr>, Specimen.ID <chr>, Donor.ID <chr>, PMI <chr>,
324- # > # Number.of.UMIs <dbl>, Genes.detected <dbl>,
325- # > # Fraction.mitochrondrial.UMIs <dbl>, and abbreviated variable names
326- # > # ¹Neurotypical.reference, ²Subclass, ³Supertype, ⁴Age.at.death, …
303+ harmonised <- get_metadata() |> dplyr::filter(tissue == "kidney blood vessel")
304+ unharmonised <- get_unharmonised_metadata(harmonised)
305+ unharmonised
306+ # > # A tibble: 4 × 2
307+ # > file_id unharmonised
308+ # > <chr> <list>
309+ # > 1 63523aa3-0d04-4fc6-ac59-5cadd3e73a14 <tbl_dck_[,17]>
310+ # > 2 8fee7b82-178b-4c04-bf23-04689415690d <tbl_dck_[,12]>
311+ # > 3 dc9d8cdd-29ee-4c44-830c-6559cb3d0af6 <tbl_dck_[,14]>
312+ # > 4 f7e94dbb-8638-4616-aaf9-16e2212c369f <tbl_dck_[,14]>
327313```
328314
329- If we have metadata from the normal metadata table that is from a single
330- dataset, we can even join this additional metadata into one big data
331- frame:
315+ Notice that each dataset’s data frame has a different set of columns:
332316
333317``` r
334- harmonised_meta = get_metadata() |> dplyr::filter(file_id == dataset) |> dplyr::collect()
335- dplyr::left_join(harmonised_meta, unharmonised_tbl, by = c("file_id", "cell_"))
336- # > # A tibble: 168,860 × 77
337- # > cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
338- # > <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
339- # > 1 GGAC… f63cb4… L2/3-6… neuron 1 <NA> <NA> <NA> 168593… H21.33…
340- # > 2 TCAC… 0d4d1f… L2/3-6… neuron 1 <NA> <NA> <NA> f7d747… H21.33…
341- # > 3 TCAG… 3e5a3b… L2/3-6… neuron 1 <NA> <NA> <NA> 3417a9… H20.33…
342- # > 4 TCAG… 7010a3… L2/3-6… neuron 1 <NA> <NA> <NA> 246a59… H20.33…
343- # > 5 AGCC… 82bb9a… L2/3-6… neuron 1 <NA> <NA> <NA> 7a8f35… H21.33…
344- # > 6 CCTC… a233eb… L2/3-6… neuron 1 <NA> <NA> <NA> 188243… H18.30…
345- # > 7 CTCG… 27f104… L2/3-6… neuron 1 <NA> <NA> <NA> a62943… H20.33…
346- # > 8 AGCT… 0190a2… L2/3-6… neuron 1 <NA> <NA> <NA> c508a8… H20.33…
347- # > 9 CTCG… 95d846… L2/3-6… neuron 1 <NA> <NA> <NA> 29285d… H21.33…
348- # > 10 AGTG… b0e1c5… L2/3-6… neuron 1 <NA> <NA> <NA> cd7823… H21.33…
349- # > # … with 168,850 more rows, 67 more variables: assay <chr>,
350- # > # assay_ontology_term_id <chr>, file_id_db <chr>,
351- # > # cell_type_ontology_term_id <chr>, development_stage <chr>,
352- # > # development_stage_ontology_term_id <chr>, disease <chr>,
353- # > # disease_ontology_term_id <chr>, ethnicity <chr>,
354- # > # ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
355- # > # is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
318+ dplyr::pull(unharmonised, unharmonised) |> head(2)
319+ # > [[1]]
320+ # > # Source: SQL [?? x 17]
321+ # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
322+ # > cell_ file_id donor…¹ donor…² libra…³ mappe…⁴ sampl…⁵ suspe…⁶ suspe…⁷ autho…⁸
323+ # > <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
324+ # > 1 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
325+ # > 2 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
326+ # > 3 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
327+ # > 4 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
328+ # > 5 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
329+ # > 6 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
330+ # > 7 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
331+ # > 8 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
332+ # > 9 4602… 63523a… 19 mon… 463181… 671785… GENCOD… 125234… cell c7485e… CD4 T …
333+ # > 10 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
334+ # > # … with more rows, 7 more variables: cell_state <chr>,
335+ # > # reported_diseases <chr>, Short_Sample <chr>, Project <chr>,
336+ # > # Experiment <chr>, compartment <chr>, broad_celltype <chr>, and abbreviated
337+ # > # variable names ¹donor_age, ²donor_uuid, ³library_uuid,
338+ # > # ⁴mapped_reference_annotation, ⁵sample_uuid, ⁶suspension_type,
339+ # > # ⁷suspension_uuid, ⁸author_cell_type
340+ # >
341+ # > [[2]]
342+ # > # Source: SQL [?? x 12]
343+ # > # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
344+ # > cell_ file_id orig.…¹ nCoun…² nFeat…³ seura…⁴ Project donor…⁵ compa…⁶ broad…⁷
345+ # > <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
346+ # > 1 1069 8fee7b… 4602ST… 16082 3997 25 Experi… Wilms3 non_PT Pelvic…
347+ # > 2 1214 8fee7b… 4602ST… 1037 606 25 Experi… Wilms3 non_PT Pelvic…
348+ # > 3 2583 8fee7b… 4602ST… 3028 1361 25 Experi… Wilms3 non_PT Pelvic…
349+ # > 4 2655 8fee7b… 4602ST… 1605 859 25 Experi… Wilms3 non_PT Pelvic…
350+ # > 5 3609 8fee7b… 4602ST… 1144 682 25 Experi… Wilms3 non_PT Pelvic…
351+ # > 6 3624 8fee7b… 4602ST… 1874 963 25 Experi… Wilms3 non_PT Pelvic…
352+ # > 7 3946 8fee7b… 4602ST… 1296 755 25 Experi… Wilms3 non_PT Pelvic…
353+ # > 8 5163 8fee7b… 4602ST… 11417 3255 25 Experi… Wilms3 non_PT Pelvic…
354+ # > 9 5446 8fee7b… 4602ST… 1769 946 19 Experi… Wilms2 lympho… CD4 T …
355+ # > 10 6275 8fee7b… 4602ST… 3750 1559 25 Experi… Wilms3 non_PT Pelvic…
356+ # > # … with more rows, 2 more variables: author_cell_type <chr>, Sample <chr>, and
357+ # > # abbreviated variable names ¹orig.ident, ²nCount_RNA, ³nFeature_RNA,
358+ # > # ⁴seurat_clusters, ⁵donor_id, ⁶compartment, ⁷broad_celltype
356359```
357360
358361# Cell metadata