This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Commit 626d614

Rebuild readme, fix example error
1 parent dfdfc77 commit 626d614

File tree

2 files changed

+76
-66
lines changed

R/unharmonised.R

Lines changed: 8 additions & 1 deletion
@@ -25,11 +25,13 @@
2525
#' @return A named list, where each name is a dataset file ID, and each value is
2626
#' a "lazy data frame", i.e. a `tbl`.
2727
#' @examples
28+
#' \dontrun{
2829
#' dataset = "838ea006-2369-4e2c-b426-b2a744a2b02b"
2930
#' harmonised_meta = get_metadata() |> dplyr::filter(file_id == dataset) |> dplyr::collect()
3031
#' unharmonised_meta = get_unharmonised_dataset(dataset)
3132
#' unharmonised_tbl = dplyr::collect(unharmonised_meta[[dataset]])
3233
#' dplyr::left_join(harmonised_meta, unharmonised_tbl, by=c("file_id", "cell_"))
34+
#' }
3335
get_unharmonised_dataset = function(
3436
dataset_id,
3537
cells = NULL,
@@ -60,6 +62,7 @@ get_unharmonised_dataset = function(
6062
#' @export
6163
#' @importFrom dplyr group_by summarise filter collect
6264
#' @importFrom rlang .data
65+
#' @importFrom dbplyr remote_con
6366
#' @examples
6467
#' harmonised <- get_metadata() |> dplyr::filter(tissue == "kidney blood vessel")
6568
#' unharmonised <- get_unharmonised_metadata(harmonised)
@@ -69,7 +72,11 @@ get_unharmonised_metadata = function(metadata, ...){
6972
collect() |>
7073
group_by(.data$file_id) |>
7174
summarise(
72-
unharmonised = list(dataset_id=.data$file_id[[1]], cells=.data$cell_, conn=dbplyr::remote_con(metadata)) |>
75+
unharmonised = list(
76+
dataset_id=.data$file_id[[1]],
77+
cells=.data$cell_,
78+
conn=remote_con(metadata)
79+
) |>
7380
c(args) |>
7481
do.call(get_unharmonised_dataset, args=_) |>
7582
list()
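
The hunk above builds a named argument list, appends any extra arguments with `c()`, and forwards the result to a function through `do.call()` using the native pipe placeholder `_`. The following is a minimal sketch of that pattern in isolation, assuming R 4.2 or later for the placeholder. The function `describe_dataset()` and the toy values are hypothetical stand-ins, not part of the package.

``` r
# Hypothetical stand-in for get_unharmonised_dataset(): it only reports
# what it was called with, so the argument forwarding is easy to inspect.
describe_dataset <- function(dataset_id, cells, verbose = FALSE) {
  paste0(
    dataset_id, ": ", length(cells), " cells",
    if (verbose) " (verbose)" else ""
  )
}

# Extra arguments, playing the role of the `...` captured as `args`
args <- list(verbose = TRUE)

# Build the fixed arguments, append the extras, then call the function.
# The `_` placeholder must be passed to a named argument (here `args`).
result <- list(dataset_id = "dataset_a", cells = c("c1", "c2")) |>
  c(args) |>
  do.call(describe_dataset, args = _)

result
```

Because `c()` on two lists concatenates them while keeping names, the extra arguments arrive in the call as ordinary named arguments.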

README.md

Lines changed: 68 additions & 65 deletions
@@ -70,18 +70,18 @@ metadata |>
7070
dplyr::count(tissue)
7171
#> # Source: SQL [?? x 2]
7272
#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
73-
#> tissue n
74-
#> <chr> <dbl>
75-
#> 1 blood 47
76-
#> 2 respiratory airway 16
77-
#> 3 mammary gland epithelial cell (cell culture) 1
78-
#> 4 colon 3
79-
#> 5 intestine 18
80-
#> 6 pleural effusion 11
81-
#> 7 lymph node 15
82-
#> 8 lung 27
83-
#> 9 liver 24
84-
#> 10 axilla 10
73+
#> tissue n
74+
#> <chr> <dbl>
75+
#> 1 cerebellum 3
76+
#> 2 telencephalon 2
77+
#> 3 heart 3
78+
#> 4 intestine 18
79+
#> 5 kidney 19
80+
#> 6 liver 24
81+
#> 7 lung 27
82+
#> 8 muscle organ 3
83+
#> 9 pancreas 5
84+
#> 10 placenta 3
8585
#> # … with more rows
8686
```
8787

@@ -294,65 +294,68 @@ metadata |>
294294

295295
Various metadata fields are *not* common between datasets, so it does
296296
not make sense for these to live in the main metadata table. However, we
297-
can obtain them using the `get_unharmonised_metadata()` function.
298-
299-
Note how this table has additional columns that are not in the normal
300-
metadata:
297+
can obtain them using the `get_unharmonised_metadata()` function. This
298+
function returns a data frame with one row per dataset, including the
299+
`unharmonised` column, which contains unharmonised metadata as a nested
300+
data frame.
301301

302302
``` r
303-
dataset = "838ea006-2369-4e2c-b426-b2a744a2b02b"
304-
unharmonised_meta = get_unharmonised_metadata(dataset)
305-
unharmonised_tbl = dplyr::collect(unharmonised_meta[[dataset]])
306-
unharmonised_tbl
307-
#> # A tibble: 168,860 × 23
308-
#> cell_ file_id Neuro…¹ Class Subcl…² Super…³ Age.a…⁴ Years…⁵ Cogni…⁶ ADNC
309-
#> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
310-
#> 1 GGACGAAG… 838ea0… FALSE Neur… L4 IT L4 IT_2 90+ ye… 16 to … Dement… High
311-
#> 2 TCACGGGA… 838ea0… FALSE Neur… L4 IT L4 IT_1 90+ ye… 12 to … Dement… Inte…
312-
#> 3 TCAGTTTT… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 16 to … No dem… Low
313-
#> 4 TCAGTCCT… 838ea0… FALSE Neur… L4 IT L4 IT_4 78 to … 16 to … Dement… Inte…
314-
#> 5 AGCCACGC… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 19 to … No dem… Inte…
315-
#> 6 CCTCAACC… 838ea0… TRUE Neur… L4 IT L4 IT_2 Less t… Refere… Refere… Refe…
316-
#> 7 CTCGACAA… 838ea0… FALSE Neur… L4 IT L4 IT_2 78 to … 12 to … No dem… Inte…
317-
#> 8 AGCTACAG… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
318-
#> 9 CTCGAGGG… 838ea0… FALSE Neur… L4 IT L4 IT_2 65 to … 16 to … Dement… High
319-
#> 10 AGTGCCGT… 838ea0… FALSE Neur… L4 IT L4 IT_4 90+ ye… 16 to … Dement… High
320-
#> # … with 168,850 more rows, 13 more variables: Braak.stage <chr>,
321-
#> # Thal.phase <chr>, CERAD.score <chr>, APOE4.status <chr>,
322-
#> # Lewy.body.disease.pathology <chr>, LATE.NC.stage <chr>,
323-
#> # Microinfarct.pathology <chr>, Specimen.ID <chr>, Donor.ID <chr>, PMI <chr>,
324-
#> # Number.of.UMIs <dbl>, Genes.detected <dbl>,
325-
#> # Fraction.mitochrondrial.UMIs <dbl>, and abbreviated variable names
326-
#> # ¹​Neurotypical.reference, ²​Subclass, ³​Supertype, ⁴​Age.at.death, …
303+
harmonised <- get_metadata() |> dplyr::filter(tissue == "kidney blood vessel")
304+
unharmonised <- get_unharmonised_metadata(harmonised)
305+
unharmonised
306+
#> # A tibble: 4 × 2
307+
#> file_id unharmonised
308+
#> <chr> <list>
309+
#> 1 63523aa3-0d04-4fc6-ac59-5cadd3e73a14 <tbl_dck_[,17]>
310+
#> 2 8fee7b82-178b-4c04-bf23-04689415690d <tbl_dck_[,12]>
311+
#> 3 dc9d8cdd-29ee-4c44-830c-6559cb3d0af6 <tbl_dck_[,14]>
312+
#> 4 f7e94dbb-8638-4616-aaf9-16e2212c369f <tbl_dck_[,14]>
327313
```
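
To work with a single dataset, one option is to index into the list column by file ID. The sketch below uses a toy tibble with the same two-column shape as the output above. The IDs and column values are made up, and plain tibbles stand in for the lazy DuckDB tables that the real function returns.

``` r
library(tibble)

# Toy stand-in for the result of get_unharmonised_metadata():
# one row per dataset, with a list column of per-dataset tables.
unharmonised <- tibble(
  file_id = c("dataset_a", "dataset_b"),
  unharmonised = list(
    tibble(
      cell_ = c("c1", "c2"),
      file_id = "dataset_a",
      donor_age = c("27 months", "19 months")
    ),
    tibble(
      cell_ = "c3",
      file_id = "dataset_b",
      nCount_RNA = 16082
    )
  )
)

# Extract the nested table for one dataset by its file ID
dataset_a <- unharmonised$unharmonised[[which(unharmonised$file_id == "dataset_a")]]

# List the columns available in each dataset's table
columns_per_dataset <- lapply(unharmonised$unharmonised, colnames)
names(columns_per_dataset) <- unharmonised$file_id
columns_per_dataset
```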
328314

329-
If we have metadata from the normal metadata table that is from a single
330-
dataset, we can even join this additional metadata into one big data
331-
frame:
315+
Notice that the columns differ between each dataset’s data frame:
332316

333317
``` r
334-
harmonised_meta = get_metadata() |> dplyr::filter(file_id == dataset) |> dplyr::collect()
335-
dplyr::left_join(harmonised_meta, unharmonised_tbl, by=c("file_id", "cell_"))
336-
#> # A tibble: 168,860 × 77
337-
#> cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
338-
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
339-
#> 1 GGAC… f63cb4… L2/3-6… neuron 1 <NA> <NA> <NA> 168593… H21.33…
340-
#> 2 TCAC… 0d4d1f… L2/3-6… neuron 1 <NA> <NA> <NA> f7d747… H21.33…
341-
#> 3 TCAG… 3e5a3b… L2/3-6… neuron 1 <NA> <NA> <NA> 3417a9… H20.33…
342-
#> 4 TCAG… 7010a3… L2/3-6… neuron 1 <NA> <NA> <NA> 246a59… H20.33…
343-
#> 5 AGCC… 82bb9a… L2/3-6… neuron 1 <NA> <NA> <NA> 7a8f35… H21.33…
344-
#> 6 CCTC… a233eb… L2/3-6… neuron 1 <NA> <NA> <NA> 188243… H18.30…
345-
#> 7 CTCG… 27f104… L2/3-6… neuron 1 <NA> <NA> <NA> a62943… H20.33…
346-
#> 8 AGCT… 0190a2… L2/3-6… neuron 1 <NA> <NA> <NA> c508a8… H20.33…
347-
#> 9 CTCG… 95d846… L2/3-6… neuron 1 <NA> <NA> <NA> 29285d… H21.33…
348-
#> 10 AGTG… b0e1c5… L2/3-6… neuron 1 <NA> <NA> <NA> cd7823… H21.33…
349-
#> # … with 168,850 more rows, 67 more variables: assay <chr>,
350-
#> # assay_ontology_term_id <chr>, file_id_db <chr>,
351-
#> # cell_type_ontology_term_id <chr>, development_stage <chr>,
352-
#> # development_stage_ontology_term_id <chr>, disease <chr>,
353-
#> # disease_ontology_term_id <chr>, ethnicity <chr>,
354-
#> # ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
355-
#> # is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
318+
dplyr::pull(unharmonised, unharmonised) |> head(2)
319+
#> [[1]]
320+
#> # Source: SQL [?? x 17]
321+
#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
322+
#> cell_ file_id donor…¹ donor…² libra…³ mappe…⁴ sampl…⁵ suspe…⁶ suspe…⁷ autho…⁸
323+
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
324+
#> 1 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
325+
#> 2 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
326+
#> 3 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
327+
#> 4 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
328+
#> 5 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
329+
#> 6 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
330+
#> 7 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
331+
#> 8 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
332+
#> 9 4602… 63523a… 19 mon… 463181… 671785… GENCOD… 125234… cell c7485e… CD4 T …
333+
#> 10 4602… 63523a… 27 mon… a8536b… 5ddaea… GENCOD… 61bf84… cell d8a44f… Pelvic…
334+
#> # … with more rows, 7 more variables: cell_state <chr>,
335+
#> # reported_diseases <chr>, Short_Sample <chr>, Project <chr>,
336+
#> # Experiment <chr>, compartment <chr>, broad_celltype <chr>, and abbreviated
337+
#> # variable names ¹​donor_age, ²​donor_uuid, ³​library_uuid,
338+
#> # ⁴​mapped_reference_annotation, ⁵​sample_uuid, ⁶​suspension_type,
339+
#> # ⁷​suspension_uuid, ⁸​author_cell_type
340+
#>
341+
#> [[2]]
342+
#> # Source: SQL [?? x 12]
343+
#> # Database: DuckDB 0.6.2-dev1166 [unknown@Linux 3.10.0-1160.81.1.el7.x86_64:R 4.2.1/:memory:]
344+
#> cell_ file_id orig.…¹ nCoun…² nFeat…³ seura…⁴ Project donor…⁵ compa…⁶ broad…⁷
345+
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
346+
#> 1 1069 8fee7b… 4602ST… 16082 3997 25 Experi… Wilms3 non_PT Pelvic…
347+
#> 2 1214 8fee7b… 4602ST… 1037 606 25 Experi… Wilms3 non_PT Pelvic…
348+
#> 3 2583 8fee7b… 4602ST… 3028 1361 25 Experi… Wilms3 non_PT Pelvic…
349+
#> 4 2655 8fee7b… 4602ST… 1605 859 25 Experi… Wilms3 non_PT Pelvic…
350+
#> 5 3609 8fee7b… 4602ST… 1144 682 25 Experi… Wilms3 non_PT Pelvic…
351+
#> 6 3624 8fee7b… 4602ST… 1874 963 25 Experi… Wilms3 non_PT Pelvic…
352+
#> 7 3946 8fee7b… 4602ST… 1296 755 25 Experi… Wilms3 non_PT Pelvic…
353+
#> 8 5163 8fee7b… 4602ST… 11417 3255 25 Experi… Wilms3 non_PT Pelvic…
354+
#> 9 5446 8fee7b… 4602ST… 1769 946 19 Experi… Wilms2 lympho… CD4 T …
355+
#> 10 6275 8fee7b… 4602ST… 3750 1559 25 Experi… Wilms3 non_PT Pelvic…
356+
#> # … with more rows, 2 more variables: author_cell_type <chr>, Sample <chr>, and
357+
#> # abbreviated variable names ¹​orig.ident, ²​nCount_RNA, ³​nFeature_RNA,
358+
#> # ⁴​seurat_clusters, ⁵​donor_id, ⁶​compartment, ⁷​broad_celltype
356359
```
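
The roxygen example for `get_unharmonised_dataset()` in this commit joins a collected unharmonised table onto the harmonised metadata by `file_id` and `cell_`. A minimal sketch of that join with toy, in-memory tables (all values are made up for illustration):

``` r
library(dplyr)
library(tibble)

# Toy harmonised metadata for a single dataset
harmonised_meta <- tibble(
  cell_ = c("c1", "c2"),
  file_id = "dataset_a",
  tissue = "kidney blood vessel"
)

# Toy unharmonised table for the same dataset, as it would look
# after dplyr::collect()
unharmonised_tbl <- tibble(
  cell_ = c("c1", "c2"),
  file_id = "dataset_a",
  donor_age = c("27 months", "19 months")
)

# Each cell is identified by the pair (file_id, cell_), so join on both
joined <- left_join(harmonised_meta, unharmonised_tbl, by = c("file_id", "cell_"))
joined
```

Joining on both keys matters because cell identifiers are only guaranteed to be meaningful within a dataset in this toy setup; joining on `cell_` alone could match cells across unrelated datasets.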
357360

358361
# Cell metadata
