| 1 | +HCA Harmonised |
| 2 | +================ |
| 3 | + |
| 4 | +Load the packages |
| 5 | + |
| 6 | +``` r |
| 7 | +library(HCAquery) |
| 8 | +library(dplyr) |
| 9 | +library(dbplyr) |
| 10 | +library(SingleCellExperiment) |
| 11 | +library(tidySingleCellExperiment) |
| 12 | +options("restore_SingleCellExperiment_show" = TRUE) |
| 13 | +``` |
| 14 | + |
| 15 | +Load the metadata |
| 16 | + |
| 17 | +``` r |
| 18 | +get_metadata() |
| 19 | +#> # Source: table<metadata> [?? x 56] |
| 20 | +#> # Database: sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite] |
| 21 | +#> .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷ devel…⁸ disease disea…⁹ ethni…˟ ethni…˟ file_id is_pr…˟ organ…˟ organ…˟ sampl…˟ sex sex_o…˟ tissue |
| 22 | +#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> |
| 23 | +#> 1 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 24 | +#> 2 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 25 | +#> 3 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 26 | +#> 4 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 27 | +#> 5 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 28 | +#> 6 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 29 | +#> 7 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 30 | +#> 8 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 31 | +#> 9 AAACGG… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 32 | +#> 10 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip… |
| 33 | +#> # … with more rows, 33 more variables: tissue_ontology_term_id <chr>, tissue_harmonised <chr>, age_days <dbl>, dataset_id <chr>, collection_id <chr>, cell_count <int>, |
| 34 | +#> # dataset_deployments <chr>, is_primary_data.y <chr>, is_valid <int>, linked_genesets <int>, mean_genes_per_cell <dbl>, name <chr>, published <int>, revision <int>, |
| 35 | +#> # schema_version <chr>, tombstone <int>, x_normalization <chr>, created_at.x <dbl>, published_at <dbl>, revised_at <dbl>, updated_at.x <dbl>, filename <chr>, filetype <chr>, |
| 36 | +#> # s3_uri <chr>, user_submitted <int>, created_at.y <dbl>, updated_at.y <dbl>, cell_type_harmonised <chr>, confidence_class <dbl>, cell_annotation_azimuth_l2 <chr>, |
| 37 | +#> # cell_annotation_blueprint_singler <chr>, n_cell_type_in_tissue <int>, n_tissue_in_cell_type <int>, and abbreviated variable names ¹sample_id_db, ².sample_name, |
| 38 | +#> # ³assay_ontology_term_id, ⁴file_id_db, ⁵cell_type, ⁶cell_type_ontology_term_id, ⁷development_stage, ⁸development_stage_ontology_term_id, ⁹disease_ontology_term_id, ˟ethnicity, |
| 39 | +#> # ˟ethnicity_ontology_term_id, ˟is_primary_data.x, ˟organism, ˟organism_ontology_term_id, ˟sample_placeholder, ˟sex_ontology_term_id |
| 40 | +``` |
| 41 | + |
| 42 | +Explore the HCA content |
| 43 | + |
| 44 | +``` r |
| 45 | +get_metadata() |> |
| 46 | + distinct(tissue, file_id) |> |
| 47 | + count(tissue) |> |
| 48 | + arrange(desc(n)) |
| 49 | +#> # Source: SQL [?? x 2] |
| 50 | +#> # Database: sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite] |
| 51 | +#> # Ordered by: desc(n) |
| 52 | +#> tissue n |
| 53 | +#> <chr> <int> |
| 54 | +#> 1 blood 47 |
| 55 | +#> 2 heart left ventricle 46 |
| 56 | +#> 3 cortex of kidney 31 |
| 57 | +#> 4 renal medulla 29 |
| 58 | +#> 5 lung 27 |
| 59 | +#> 6 middle temporal gyrus 24 |
| 60 | +#> 7 liver 24 |
| 61 | +#> 8 kidney 19 |
| 62 | +#> 9 intestine 18 |
| 63 | +#> 10 thymus 17 |
| 64 | +#> # … with more rows |
| 65 | +``` |
| 66 | + |
| 67 | +Query raw counts |
| 68 | + |
| 69 | +``` r |
| 70 | +sce <- |
| 71 | + get_metadata() |> |
| 72 | + filter( |
| 73 | + ethnicity == "African" & |
| 74 | + assay %LIKE% "%10x%" & |
| 75 | + tissue == "lung parenchyma" & |
| 76 | + cell_type %LIKE% "%CD4%" |
| 77 | + ) |> |
| 78 | + get_SingleCellExperiment() |
| 79 | +#> ℹ Realising metadata. |
| 80 | +#> ℹ Synchronising files |
| 81 | +#> ℹ Attaching metadata. |
| 82 | +#> ℹ Compiling Single Cell Experiment. |
| 83 | + |
| 84 | +sce |
| 85 | +#> class: SingleCellExperiment |
| 86 | +#> dim: 60661 1571 |
| 87 | +#> metadata(0): |
| 88 | +#> assays(2): counts cpm |
| 89 | +#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P |
| 90 | +#> rowData names(0): |
| 91 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526 |
| 92 | +#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type |
| 93 | +#> reducedDimNames(0): |
| 94 | +#> mainExpName: NULL |
| 95 | +#> altExpNames(0): |
| 96 | +``` |
| 97 | + |
| 98 | +Query counts scaled per million (CPM). This is helpful if only a few genes |
| 99 | +are of interest. |
| 100 | + |
| 101 | +``` r |
| 102 | +sce <- |
| 103 | + get_metadata() |> |
| 104 | + filter( |
| 105 | + ethnicity == "African" & |
| 106 | + assay %LIKE% "%10x%" & |
| 107 | + tissue == "lung parenchyma" & |
| 108 | + cell_type %LIKE% "%CD4%" |
| 109 | + ) |> |
| 110 | + get_SingleCellExperiment(assays = "cpm") |
| 111 | +#> ℹ Realising metadata. |
| 112 | +#> ℹ Synchronising files |
| 113 | +#> ℹ Attaching metadata. |
| 114 | +#> ℹ Compiling Single Cell Experiment. |
| 115 | + |
| 116 | +sce |
| 117 | +#> class: SingleCellExperiment |
| 118 | +#> dim: 60661 1571 |
| 119 | +#> metadata(0): |
| 120 | +#> assays(1): cpm |
| 121 | +#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P |
| 122 | +#> rowData names(0): |
| 123 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526 |
| 124 | +#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type |
| 125 | +#> reducedDimNames(0): |
| 126 | +#> mainExpName: NULL |
| 127 | +#> altExpNames(0): |
| 128 | +``` |
| 129 | + |
| 130 | +Extract only a subset of genes: |
| 131 | + |
| 132 | +``` r |
| 133 | +get_metadata() |> |
| 134 | + filter( |
| 135 | + ethnicity == "African" & |
| 136 | + assay %LIKE% "%10x%" & |
| 137 | + tissue == "lung parenchyma" & |
| 138 | + cell_type %LIKE% "%CD4%" |
| 139 | + ) |> |
| 140 | + get_SingleCellExperiment(features = "PUM1") |
| 141 | +#> ℹ Realising metadata. |
| 142 | +#> ℹ Synchronising files |
| 143 | +#> ℹ Attaching metadata. |
| 144 | +#> ℹ Compiling Single Cell Experiment. |
| 145 | +#> class: SingleCellExperiment |
| 146 | +#> dim: 1 1571 |
| 147 | +#> metadata(0): |
| 148 | +#> assays(2): counts cpm |
| 149 | +#> rownames(1): PUM1 |
| 150 | +#> rowData names(0): |
| 151 | +#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526 |
| 152 | +#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type |
| 153 | +#> reducedDimNames(0): |
| 154 | +#> mainExpName: NULL |
| 155 | +#> altExpNames(0): |
| 156 | +``` |
| 157 | + |
| 158 | +Extract the counts as a Seurat object: |
| 159 | + |
| 160 | +``` r |
| 161 | +get_metadata() |> |
| 162 | + filter( |
| 163 | + ethnicity == "African" & |
| 164 | + assay %LIKE% "%10x%" & |
| 165 | + tissue == "lung parenchyma" & |
| 166 | + cell_type %LIKE% "%CD4%" |
| 167 | + ) |> |
| 168 | + get_seurat() |
| 169 | +#> ℹ Realising metadata. |
| 170 | +#> ℹ Synchronising files |
| 171 | +#> ℹ Attaching metadata. |
| 172 | +#> ℹ Compiling Single Cell Experiment. |
| 173 | +#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-') |
| 174 | +#> An object of class Seurat |
| 175 | +#> 60661 features across 1571 samples within 1 assay |
| 176 | +#> Active assay: originalexp (60661 features, 0 variable features) |
| 177 | +``` |
| 178 | + |
| 179 | +``` r |
| 180 | +sessionInfo() |
| 181 | +#> R version 4.2.1 (2022-06-23) |
| 182 | +#> Platform: x86_64-pc-linux-gnu (64-bit) |
| 183 | +#> Running under: CentOS Linux 7 (Core) |
| 184 | +#> |
| 185 | +#> Matrix products: default |
| 186 | +#> BLAS: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so |
| 187 | +#> LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so |
| 188 | +#> |
| 189 | +#> locale: |
| 190 | +#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 |
| 191 | +#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C |
| 192 | +#> |
| 193 | +#> attached base packages: |
| 194 | +#> [1] stats4 stats graphics grDevices utils datasets methods base |
| 195 | +#> |
| 196 | +#> other attached packages: |
| 197 | +#> [1] HCAquery_0.1.0 testthat_3.1.6 tidySingleCellExperiment_1.6.3 ttservice_0.2.2 SingleCellExperiment_1.18.1 |
| 198 | +#> [6] SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4 IRanges_2.30.1 |
| 199 | +#> [11] S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.1 matrixStats_0.63.0 dbplyr_2.2.1 |
| 200 | +#> [16] dplyr_1.0.10 |
| 201 | +#> |
| 202 | +#> loaded via a namespace (and not attached): |
| 203 | +#> [1] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2 sp_1.5-1 splines_4.2.1 listenv_0.9.0 scattermore_0.8 |
| 204 | +#> [8] usethis_2.1.6 ggplot2_3.4.0 digest_0.6.31 htmltools_0.5.4 fansi_1.0.3 magrittr_2.0.3 memoise_2.0.1 |
| 205 | +#> [15] tensor_1.5 cluster_2.1.3 ROCR_1.0-11 remotes_2.4.2 globals_0.16.2 spatstat.sparse_3.0-0 prettyunits_1.1.1 |
| 206 | +#> [22] colorspace_2.0-3 blob_1.2.3 rappdirs_0.3.3 ggrepel_0.9.2 xfun_0.36 crayon_1.5.2 callr_3.7.3 |
| 207 | +#> [29] RCurl_1.98-1.9 jsonlite_1.8.4 progressr_0.12.0 spatstat.data_3.0-0 survival_3.3-1 zoo_1.8-11 glue_1.6.2 |
| 208 | +#> [36] polyclip_1.10-4 gtable_0.3.1 zlibbioc_1.42.0 XVector_0.36.0 leiden_0.4.3 DelayedArray_0.22.0 pkgbuild_1.4.0 |
| 209 | +#> [43] Rhdf5lib_1.18.2 future.apply_1.10.0 HDF5Array_1.24.2 abind_1.4-5 scales_1.2.1 DBI_1.1.3 spatstat.random_3.0-1 |
| 210 | +#> [50] miniUI_0.1.1.1 Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4 reticulate_1.26 bit_4.0.5 profvis_0.3.7 |
| 211 | +#> [57] htmlwidgets_1.6.0 httr_1.4.4 RColorBrewer_1.1-3 ellipsis_0.3.2 Seurat_4.3.0 ica_1.0-3 urlchecker_1.0.1 |
| 212 | +#> [64] pkgconfig_2.0.3 uwot_0.1.14 deldir_1.0-6 utf8_1.2.2 tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4 |
| 213 | +#> [71] later_1.3.0 munsell_0.5.0 tools_4.2.1 cachem_1.0.6 cli_3.5.0 generics_0.1.3 RSQLite_2.2.20 |
| 214 | +#> [78] devtools_2.4.5 ggridges_0.5.4 evaluate_0.19 stringr_1.5.0 fastmap_1.1.0 yaml_2.3.6 goftest_1.2-3 |
| 215 | +#> [85] processx_3.8.0 fs_1.5.2 knitr_1.41 bit64_4.0.5 fitdistrplus_1.1-8 purrr_1.0.0 RANN_2.6.1 |
| 216 | +#> [92] pbapply_1.6-0 future_1.30.0 nlme_3.1-157 mime_0.12 brio_1.1.3 compiler_4.2.1 rstudioapi_0.14 |
| 217 | +#> [99] plotly_4.10.1 png_0.1-8 spatstat.utils_3.0-1 tibble_3.1.8 stringi_1.7.8 ps_1.7.2 desc_1.4.2 |
| 218 | +#> [106] lattice_0.20-45 Matrix_1.5-3 vctrs_0.5.1 pillar_1.8.1 lifecycle_1.0.3 rhdf5filters_1.8.0 spatstat.geom_3.0-3 |
| 219 | +#> [113] lmtest_0.9-40 RcppAnnoy_0.0.20 data.table_1.14.6 cowplot_1.1.1 bitops_1.0-7 irlba_2.3.5.1 httpuv_1.6.7 |
| 220 | +#> [120] patchwork_1.1.2 R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3 parallelly_1.33.0 sessioninfo_1.2.2 |
| 221 | +#> [127] codetools_0.2-18 pkgload_1.3.2 MASS_7.3-57 assertthat_0.2.1 rhdf5_2.40.0 rprojroot_2.0.3 withr_2.5.0 |
| 222 | +#> [134] SeuratObject_4.1.3 sctransform_0.3.5 GenomeInfoDbData_1.2.8 parallel_4.2.1 grid_4.2.1 tidyr_1.2.1 rmarkdown_2.19 |
| 223 | +#> [141] Rtsne_0.16 spatstat.explore_3.0-5 shiny_1.7.4 |
| 224 | +``` |