This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Commit 649300b

Add github readme generated from vignette
1 parent bcbb7b4 commit 649300b

File tree

1 file changed

+224
-0
lines changed

readme.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
HCA Harmonised
================

Load the package

``` r
library(HCAquery)
library(dplyr)
library(dbplyr)
library(SingleCellExperiment)
library(tidySingleCellExperiment)
options("restore_SingleCellExperiment_show" = TRUE)
```

Load the metadata

``` r
get_metadata()
#> # Source: table<metadata> [?? x 56]
#> # Database: sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite]
#> .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷ devel…⁸ disease disea…⁹ ethni…˟ ethni…˟ file_id is_pr…˟ organ…˟ organ…˟ sampl…˟ sex sex_o…˟ tissue
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 2 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 3 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 4 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 5 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 6 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 7 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 8 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 9 AAACGG… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> 10 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
#> # … with more rows, 33 more variables: tissue_ontology_term_id <chr>, tissue_harmonised <chr>, age_days <dbl>, dataset_id <chr>, collection_id <chr>, cell_count <int>,
#> # dataset_deployments <chr>, is_primary_data.y <chr>, is_valid <int>, linked_genesets <int>, mean_genes_per_cell <dbl>, name <chr>, published <int>, revision <int>,
#> # schema_version <chr>, tombstone <int>, x_normalization <chr>, created_at.x <dbl>, published_at <dbl>, revised_at <dbl>, updated_at.x <dbl>, filename <chr>, filetype <chr>,
#> # s3_uri <chr>, user_submitted <int>, created_at.y <dbl>, updated_at.y <dbl>, cell_type_harmonised <chr>, confidence_class <dbl>, cell_annotation_azimuth_l2 <chr>,
#> # cell_annotation_blueprint_singler <chr>, n_cell_type_in_tissue <int>, n_tissue_in_cell_type <int>, and abbreviated variable names ¹sample_id_db, ².sample_name,
#> # ³assay_ontology_term_id, ⁴file_id_db, ⁵cell_type, ⁶cell_type_ontology_term_id, ⁷development_stage, ⁸development_stage_ontology_term_id, ⁹disease_ontology_term_id, ˟ethnicity,
#> # ˟ethnicity_ontology_term_id, ˟is_primary_data.x, ˟organism, ˟organism_ontology_term_id, ˟sample_placeholder, ˟sex_ontology_term_id
```

Explore the HCA content

``` r
get_metadata() |>
  distinct(tissue, file_id) |>
  count(tissue) |>
  arrange(desc(n))
#> # Source:     SQL [?? x 2]
#> # Database:   sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite]
#> # Ordered by: desc(n)
#>    tissue                    n
#>    <chr>                 <int>
#>  1 blood                    47
#>  2 heart left ventricle     46
#>  3 cortex of kidney         31
#>  4 renal medulla            29
#>  5 lung                     27
#>  6 middle temporal gyrus    24
#>  7 liver                    24
#>  8 kidney                   19
#>  9 intestine                18
#> 10 thymus                   17
#> # … with more rows
```
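
The pipeline above counts files rather than cells: `distinct(tissue, file_id)` first collapses the per-cell rows to one row per tissue and file, so `n` is the number of files contributing to each tissue. A minimal sketch of that logic on a made-up in-memory table (`toy_metadata` and its values are hypothetical, not taken from the HCA database):

``` r
library(dplyr)

# Hypothetical per-cell metadata: one row per cell, as returned by get_metadata().
toy_metadata <- tibble(
  .cell   = c("c1", "c2", "c3", "c4", "c5"),
  tissue  = c("blood", "blood", "blood", "lung", "lung"),
  file_id = c("f1", "f1", "f2", "f3", "f3")
)

tissue_counts <-
  toy_metadata |>
  distinct(tissue, file_id) |>  # one row per tissue/file pair
  count(tissue) |>              # number of files per tissue
  arrange(desc(n))

tissue_counts
```

Dropping the `distinct()` step would instead count cells per tissue.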

Query raw counts

``` r
sce <-
  get_metadata() |>
  filter(
    ethnicity == "African" &
      assay %LIKE% "%10x%" &
      tissue == "lung parenchyma" &
      cell_type %LIKE% "%CD4%"
  ) |>
  get_SingleCellExperiment()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Attaching metadata.
#> ℹ Compiling Single Cell Experiment.

sce
#> class: SingleCellExperiment
#> dim: 60661 1571
#> metadata(0):
#> assays(2): counts cpm
#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
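
`%LIKE%` is not an R function. dbplyr passes infix operators it does not recognise through to the database, so the condition is evaluated as SQL `LIKE`, where `%` matches any run of characters. A small sketch of that behaviour, assuming RSQLite is installed and using a made-up in-memory table (`toy` and its values are hypothetical):

``` r
library(dplyr)
library(dbplyr)

# Hypothetical metadata copied into a temporary in-memory SQLite database.
toy <- memdb_frame(
  assay     = c("10x 3 prime v3", "Smart-seq2", "10x 5 prime v1"),
  cell_type = c("CD4-positive, alpha-beta T cell", "B cell", "natural killer cell")
)

query <-
  toy |>
  filter(assay %LIKE% "%10x%" & cell_type %LIKE% "%CD4%")

# Inspect the SQL that dbplyr generates, then run it and pull the rows into R.
show_query(query)
matched <- collect(query)
matched
```

Because the pattern is matched by the database, the same filter applied to an ordinary in-memory data frame would fail; there, something like `grepl()` would be needed instead.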

Query counts scaled per million. This is helpful if only a few genes are
of interest.

``` r
sce <-
  get_metadata() |>
  filter(
    ethnicity == "African" &
      assay %LIKE% "%10x%" &
      tissue == "lung parenchyma" &
      cell_type %LIKE% "%CD4%"
  ) |>
  get_SingleCellExperiment(assays = "cpm")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Attaching metadata.
#> ℹ Compiling Single Cell Experiment.

sce
#> class: SingleCellExperiment
#> dim: 60661 1571
#> metadata(0):
#> assays(1): cpm
#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
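
Counts per million rescale each cell so that its total is one million, which keeps values comparable across cells even when only a few genes are retrieved. A minimal base R sketch of the usual calculation on a made-up matrix; it illustrates the standard definition and is an assumption about how the `cpm` assay was derived, not the package's own code:

``` r
# Hypothetical raw counts: genes in rows, cells in columns.
counts <- matrix(
  c(10, 0, 90,   # cell_1
     5, 5, 40),  # cell_2
  nrow = 3,
  dimnames = list(c("PUM1", "TSPAN6", "TNMD"), c("cell_1", "cell_2"))
)

# Divide each column by its library size, then scale to one million.
library_size <- colSums(counts)
cpm <- sweep(counts, 2, library_size, "/") * 1e6

cpm["PUM1", ]
```

Note that the library size here is computed over all genes; it could not be recovered from a single-gene subset of raw counts.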

Extract only a subset of genes:

``` r
get_metadata() |>
  filter(
    ethnicity == "African" &
      assay %LIKE% "%10x%" &
      tissue == "lung parenchyma" &
      cell_type %LIKE% "%CD4%"
  ) |>
  get_SingleCellExperiment(features = "PUM1")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Attaching metadata.
#> ℹ Compiling Single Cell Experiment.
#> class: SingleCellExperiment
#> dim: 1 1571
#> metadata(0):
#> assays(2): counts cpm
#> rownames(1): PUM1
#> rowData names(0):
#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
```
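
The result has one row per requested gene and keeps every selected cell. For comparison, an object that is already in memory can be reduced to the same shape by row subsetting. A sketch on a made-up `SingleCellExperiment` (`toy_sce` and its counts are hypothetical):

``` r
library(SingleCellExperiment)

# Hypothetical raw counts: genes in rows, cells in columns.
counts <- matrix(
  c(10, 0, 90, 5, 5, 40),
  nrow = 3,
  dimnames = list(c("PUM1", "TSPAN6", "TNMD"), c("cell_1", "cell_2"))
)
toy_sce <- SingleCellExperiment(assays = list(counts = counts))

# Keep only the PUM1 row; all cells (columns) are retained.
pum1 <- toy_sce["PUM1", ]
dim(pum1)
assay(pum1, "counts")
```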

Extract the counts as a Seurat object:

``` r
get_metadata() |>
  filter(
    ethnicity == "African" &
      assay %LIKE% "%10x%" &
      tissue == "lung parenchyma" &
      cell_type %LIKE% "%CD4%"
  ) |>
  get_seurat()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Attaching metadata.
#> ℹ Compiling Single Cell Experiment.
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
#> An object of class Seurat
#> 60661 features across 1571 samples within 1 assay
#> Active assay: originalexp (60661 features, 0 variable features)
```

``` r
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so
#> LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] HCAquery_0.1.0 testthat_3.1.6 tidySingleCellExperiment_1.6.3 ttservice_0.2.2 SingleCellExperiment_1.18.1
#> [6] SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4 IRanges_2.30.1
#> [11] S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.1 matrixStats_0.63.0 dbplyr_2.2.1
#> [16] dplyr_1.0.10
#>
#> loaded via a namespace (and not attached):
#> [1] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2 sp_1.5-1 splines_4.2.1 listenv_0.9.0 scattermore_0.8
#> [8] usethis_2.1.6 ggplot2_3.4.0 digest_0.6.31 htmltools_0.5.4 fansi_1.0.3 magrittr_2.0.3 memoise_2.0.1
#> [15] tensor_1.5 cluster_2.1.3 ROCR_1.0-11 remotes_2.4.2 globals_0.16.2 spatstat.sparse_3.0-0 prettyunits_1.1.1
#> [22] colorspace_2.0-3 blob_1.2.3 rappdirs_0.3.3 ggrepel_0.9.2 xfun_0.36 crayon_1.5.2 callr_3.7.3
#> [29] RCurl_1.98-1.9 jsonlite_1.8.4 progressr_0.12.0 spatstat.data_3.0-0 survival_3.3-1 zoo_1.8-11 glue_1.6.2
#> [36] polyclip_1.10-4 gtable_0.3.1 zlibbioc_1.42.0 XVector_0.36.0 leiden_0.4.3 DelayedArray_0.22.0 pkgbuild_1.4.0
#> [43] Rhdf5lib_1.18.2 future.apply_1.10.0 HDF5Array_1.24.2 abind_1.4-5 scales_1.2.1 DBI_1.1.3 spatstat.random_3.0-1
#> [50] miniUI_0.1.1.1 Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4 reticulate_1.26 bit_4.0.5 profvis_0.3.7
#> [57] htmlwidgets_1.6.0 httr_1.4.4 RColorBrewer_1.1-3 ellipsis_0.3.2 Seurat_4.3.0 ica_1.0-3 urlchecker_1.0.1
#> [64] pkgconfig_2.0.3 uwot_0.1.14 deldir_1.0-6 utf8_1.2.2 tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4
#> [71] later_1.3.0 munsell_0.5.0 tools_4.2.1 cachem_1.0.6 cli_3.5.0 generics_0.1.3 RSQLite_2.2.20
#> [78] devtools_2.4.5 ggridges_0.5.4 evaluate_0.19 stringr_1.5.0 fastmap_1.1.0 yaml_2.3.6 goftest_1.2-3
#> [85] processx_3.8.0 fs_1.5.2 knitr_1.41 bit64_4.0.5 fitdistrplus_1.1-8 purrr_1.0.0 RANN_2.6.1
#> [92] pbapply_1.6-0 future_1.30.0 nlme_3.1-157 mime_0.12 brio_1.1.3 compiler_4.2.1 rstudioapi_0.14
#> [99] plotly_4.10.1 png_0.1-8 spatstat.utils_3.0-1 tibble_3.1.8 stringi_1.7.8 ps_1.7.2 desc_1.4.2
#> [106] lattice_0.20-45 Matrix_1.5-3 vctrs_0.5.1 pillar_1.8.1 lifecycle_1.0.3 rhdf5filters_1.8.0 spatstat.geom_3.0-3
#> [113] lmtest_0.9-40 RcppAnnoy_0.0.20 data.table_1.14.6 cowplot_1.1.1 bitops_1.0-7 irlba_2.3.5.1 httpuv_1.6.7
#> [120] patchwork_1.1.2 R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3 parallelly_1.33.0 sessioninfo_1.2.2
#> [127] codetools_0.2-18 pkgload_1.3.2 MASS_7.3-57 assertthat_0.2.1 rhdf5_2.40.0 rprojroot_2.0.3 withr_2.5.0
#> [134] SeuratObject_4.1.3 sctransform_0.3.5 GenomeInfoDbData_1.2.8 parallel_4.2.1 grid_4.2.1 tidyr_1.2.1 rmarkdown_2.19
#> [141] Rtsne_0.16 spatstat.explore_3.0-5 shiny_1.7.4
```
