This repository was archived by the owner on Oct 14, 2025. It is now read-only.
#' Gets the Curated Atlas metadata as a data frame.
#'
#' Downloads a parquet database of the Human Cell Atlas metadata to a local
#' cache, and then opens it as a data frame. It can then be filtered and
#' passed into [get_SingleCellExperiment()]
#' to obtain a [`SingleCellExperiment::SingleCellExperiment-class`].
#'
#' @param remote_url Optional character vector of length 1. An HTTP URL pointing
#'   to the location of the parquet database.
#' @param cache_directory Optional character vector of length 1. A file path on
#'   your local system to a directory (not a file) that will be used to store
#'   metadata.parquet
#' @return A lazy data.frame subclass containing the metadata. You can interact
#'   with this object using most standard dplyr functions. For string matching,
#'   it is recommended that you use `stringr::str_like` to filter character
#'   columns, as `stringr::str_match` will not work.
#' @export
#' @examples
#' library(dplyr)
#' filtered_metadata <- get_metadata() |>
#'     filter(
#'         ethnicity == "African" &
#'         assay %LIKE% "%10x%" &
#'         tissue == "lung parenchyma" &
#'         cell_type %LIKE% "%CD4%"
#'     )
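#'
#' # A hedged sketch, not part of the original example: the same kind of
#' # pattern filter written with str_like(), which the return-value notes
#' # recommend for character columns. It assumes the installed dbplyr
#' # translates str_like() to a SQL LIKE expression for this backend.
#' cd4_metadata <- get_metadata() |>
#'     filter(str_like(cell_type, "%CD4%"))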
#'
#' @importFrom DBI dbConnect
#' @importFrom duckdb duckdb
#' @importFrom dplyr tbl
#' @importFrom httr progress
#' @importFrom cli cli_alert_info
#'
#' @details
#'
#' The metadata was collected from the Bioconductor package `cellxgenedp`. Its vignette `using_cellxgenedp` provides an overview of the columns in the metadata.
#' The data for which the column `organism_name` included "Homo sapiens" was collected from `cellxgenedp`.
#'
#' The columns `dataset_id` and `file_id` link the datasets explorable through `CuratedAtlasQueryR` and `cellxgenedp` to the CELLxGENE portal.
#'
#' Our representation harmonises the metadata at dataset, sample and cell levels in a single coherent database table.
#'
#' Dataset-specific columns (definitions available at cellxgene.cziscience.com)
#' Through harmonisation and curation we introduced custom columns not present in the original CELLxGENE metadata:
#'
#' - `tissue_harmonised`: a coarser tissue name for better filtering
#' - `age_days`: the number of days corresponding to the age
#' - `cell_type_harmonised`: the consensus cell identity call (for immune cells), based on the original annotation and three novel annotations from Seurat Azimuth and SingleR
#' - `confidence_class`: an ordinal class of how confident `cell_type_harmonised` is. 1 is complete consensus, 2 is three out of four, and so on.
#' - `cell_annotation_blueprint_singler`: SingleR cell annotation using Blueprint reference
#' - `cell_annotation_blueprint_monaco`: SingleR cell annotation using Monaco reference
#' - `sample_id_db`: Sample subdivision for internal use
#' - `file_id_db`: File subdivision for internal use
#' - `sample_`: Sample ID
#' - `.sample_name`: How samples were defined
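#'
#' As a minimal sketch of how the custom columns listed above might be used,
#' one could keep only cells whose harmonised annotation reached complete
#' consensus and count them per cell type. The column names come from the
#' list above; the tissue value `"lung"` is an illustrative assumption and
#' may not match the values actually present in the metadata.
#'
#' ```r
#' library(dplyr)
#' high_confidence <- get_metadata() |>
#'     filter(
#'         confidence_class == 1 &           # complete consensus only
#'         tissue_harmonised == "lung"       # assumed example value
#'     ) |>
#'     count(cell_type_harmonised)
#' ```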
#'
#' **Possible cache path issues**
#'
#' If your default R cache path includes non-standard characters (e.g. a dash from your user or organisation name), the following error can occur:
#'
#' Error in `db_query_fields.DBIConnection()`:
#' ! Can't query fields.
#' Caused by error:
#' ! Parser Error: syntax error at or near "/"
#' LINE 2: FROM /Users/bob/Library/Cach...
#'
#' The solution is to choose a different cache, for example
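#'
#' A sketch of that workaround, under stated assumptions: only the
#' `cache_directory` argument comes from the parameter documentation above,
#' while the directory name `curated_atlas_cache` is hypothetical. Any
#' writable directory should do, provided its full path avoids the
#' offending characters.
#'
#' ```r
#' # Hypothetical cache location under the home directory.
#' cache_dir <- file.path(path.expand("~"), "curated_atlas_cache")
#' # Create the directory if it does not exist yet.
#' dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
#' # Point get_metadata() at the alternative cache directory.
#' metadata <- get_metadata(cache_directory = cache_dir)
#' ```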