-HCAquery
+readme
 ================

 Load the package
@@ -16,182 +16,162 @@ Load the metadata

 ``` r
 get_metadata()
+#> # Source: table<metadata> [?? x 56]
+#> # Database: sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite]
+#> .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷ devel…⁸ disease disea…⁹ ethni…˟ ethni…˟ file_id is_pr…˟ organ…˟ organ…˟ sampl…˟ sex sex_o…˟ tissue
+#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
+#> 1 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 2 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 3 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 4 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 5 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 6 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 7 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 8 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 9 AAACGG… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> 10 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea… HsapDv… normal PATO:0… Europe… HANCES… 00d626… FALSE Homo s… NCBITa… <NA> male PATO:0… perip…
+#> # … with more rows, 33 more variables: tissue_ontology_term_id <chr>, tissue_harmonised <chr>, age_days <dbl>, dataset_id <chr>, collection_id <chr>, cell_count <int>,
+#> # dataset_deployments <chr>, is_primary_data.y <chr>, is_valid <int>, linked_genesets <int>, mean_genes_per_cell <dbl>, name <chr>, published <int>, revision <int>,
+#> # schema_version <chr>, tombstone <int>, x_normalization <chr>, created_at.x <dbl>, published_at <dbl>, revised_at <dbl>, updated_at.x <dbl>, filename <chr>, filetype <chr>,
+#> # s3_uri <chr>, user_submitted <int>, created_at.y <dbl>, updated_at.y <dbl>, cell_type_harmonised <chr>, confidence_class <dbl>, cell_annotation_azimuth_l2 <chr>,
+#> # cell_annotation_blueprint_singler <chr>, n_cell_type_in_tissue <int>, n_tissue_in_cell_type <int>, and abbreviated variable names ¹sample_id_db, ².sample_name,
+#> # ³assay_ontology_term_id, ⁴file_id_db, ⁵cell_type, ⁶cell_type_ontology_term_id, ⁷development_stage, ⁸development_stage_ontology_term_id, ⁹disease_ontology_term_id, ˟ethnicity,
+#> # ˟ethnicity_ontology_term_id, ˟is_primary_data.x, ˟organism, ˟organism_ontology_term_id, ˟sample_placeholder, ˟sex_ontology_term_id
 ```

-## # Source: table<metadata> [?? x 56]
-## # Database: sqlite 3.39.3 [/vast/projects/RCP/human_cell_atlas/metadata.sqlite]
-## .cell sampl…¹ .sample .samp…² assay assay…³ file_…⁴ cell_…⁵ cell_…⁶ devel…⁷
-## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 2 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 3 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea…
-## 4 AAACCT… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea…
-## 5 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 6 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 7 AAACCT… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 8 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## 9 AAACGG… 02eb2e… 5f20d7… D17PrP… 10x … EFO:00… 30f754… lumina… CL:000… 31-yea…
-## 10 AAACGG… 8a0fe0… 5f20d7… D17PrP… 10x … EFO:00… 1e334b… basal … CL:000… 31-yea…
-## # … with more rows, 46 more variables:
-## # development_stage_ontology_term_id <chr>, disease <chr>,
-## # disease_ontology_term_id <chr>, ethnicity <chr>,
-## # ethnicity_ontology_term_id <chr>, file_id <chr>, is_primary_data.x <chr>,
-## # organism <chr>, organism_ontology_term_id <chr>, sample_placeholder <chr>,
-## # sex <chr>, sex_ontology_term_id <chr>, tissue <chr>,
-## # tissue_ontology_term_id <chr>, tissue_harmonised <chr>, age_days <dbl>, …
-
 Explore the HCA content

 ``` r
-get_metadata() |>
-    distinct(tissue, file_id) |>
-    count(tissue) |>
-    arrange(desc(n))
+get_metadata() |>
+    distinct(tissue, file_id) |>
+    count(tissue) |>
+    arrange(desc(n))
+#> # Source: SQL [?? x 2]
+#> # Database: sqlite 3.40.0 [/vast/scratch/users/milton.m/cache/hca_harmonised/metadata.sqlite]
+#> # Ordered by: desc(n)
+#> tissue n
+#> <chr> <int>
+#> 1 blood 47
+#> 2 heart left ventricle 46
+#> 3 cortex of kidney 31
+#> 4 renal medulla 29
+#> 5 lung 27
+#> 6 middle temporal gyrus 24
+#> 7 liver 24
+#> 8 kidney 19
+#> 9 intestine 18
+#> 10 thymus 17
+#> # … with more rows
 ```

-## # Source: SQL [?? x 2]
-## # Database: sqlite 3.39.3 [/vast/projects/RCP/human_cell_atlas/metadata.sqlite]
-## # Ordered by: desc(n)
-## tissue n
-## <chr> <int>
-## 1 blood 47
-## 2 heart left ventricle 46
-## 3 cortex of kidney 31
-## 4 renal medulla 29
-## 5 lung 27
-## 6 middle temporal gyrus 24
-## 7 liver 24
-## 8 kidney 19
-## 9 intestine 18
-## 10 thymus 17
-## # … with more rows
-
 Query raw counts

 ``` r
-sce =
-    get_metadata() |>
-    filter(
-        ethnicity == "African" &
-        assay %LIKE% "%10x%" &
-        tissue == "lung parenchyma" &
-        cell_type %LIKE% "%CD4%"
-    ) |>
-
-    get_SingleCellExperiment()
-```
-
-## Reading 1 files.
-
-## .
+sce <-
+    get_metadata() |>
+    filter(
+        ethnicity == "African" &
+        assay %LIKE% "%10x%" &
+        tissue == "lung parenchyma" &
+        cell_type %LIKE% "%CD4%"
+    ) |>
+    get_SingleCellExperiment()
+#> ℹ Realising metadata.
+#> ℹ Synchronising files
+#> ℹ Attaching metadata.
+#> ℹ Compiling Single Cell Experiment.

-``` r
 sce
+#> class: SingleCellExperiment
+#> dim: 60661 1571
+#> metadata(0):
+#> assays(2): counts cpm
+#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
+#> rowData names(0):
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
+#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
+#> reducedDimNames(0):
+#> mainExpName: NULL
+#> altExpNames(0):
 ```

-## class: SingleCellExperiment
-## dim: 60661 1571
-## metadata(0):
-## assays(1): counts
-## rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
-## rowData names(0):
-## colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ...
-## TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
-## colData names(55): sample_id_db .sample ... n_cell_type_in_tissue
-## n_tissue_in_cell_type
-## reducedDimNames(0):
-## mainExpName: NULL
-## altExpNames(0):
-
 Query counts scaled per million. This is helpful if only a few genes are
 of interest

 ``` r
-sce =
-    get_metadata() |>
-    filter(
-        ethnicity == "African" &
-        assay %LIKE% "%10x%" &
-        tissue == "lung parenchyma" &
-        cell_type %LIKE% "%CD4%"
-    ) |>
-
-    get_SingleCellExperiment(assay = "counts_per_million")
-```
-
-## Reading 1 files.
-
-## .
+sce <-
+    get_metadata() |>
+    filter(
+        ethnicity == "African" &
+        assay %LIKE% "%10x%" &
+        tissue == "lung parenchyma" &
+        cell_type %LIKE% "%CD4%"
+    ) |>
+    get_SingleCellExperiment(assays = "cpm")
+#> ℹ Realising metadata.
+#> ℹ Synchronising files
+#> ℹ Attaching metadata.
+#> ℹ Compiling Single Cell Experiment.

-``` r
 sce
+#> class: SingleCellExperiment
+#> dim: 60661 1571
+#> metadata(0):
+#> assays(1): cpm
+#> rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
+#> rowData names(0):
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
+#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
+#> reducedDimNames(0):
+#> mainExpName: NULL
+#> altExpNames(0):
 ```

-## class: SingleCellExperiment
-## dim: 60661 1571
-## metadata(0):
-## assays(1): counts_per_million
-## rownames(60661): TSPAN6 TNMD ... RP11-175I6.6 PRSS43P
-## rowData names(0):
-## colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ...
-## TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
-## colData names(55): sample_id_db .sample ... n_cell_type_in_tissue
-## n_tissue_in_cell_type
-## reducedDimNames(0):
-## mainExpName: NULL
-## altExpNames(0):
-
 Extract only a subset of genes:

 ``` r
-get_metadata() |>
+get_metadata() |>
     filter(
-        ethnicity == "African" &
-        assay %LIKE% "%10x%" &
-        tissue == "lung parenchyma" &
+        ethnicity == "African" &
+        assay %LIKE% "%10x%" &
+        tissue == "lung parenchyma" &
         cell_type %LIKE% "%CD4%"
-    ) |>
-    get_SingleCellExperiment(genes = "PUM1")
+    ) |>
+    get_SingleCellExperiment(features = "PUM1")
+#> ℹ Realising metadata.
+#> ℹ Synchronising files
+#> ℹ Attaching metadata.
+#> ℹ Compiling Single Cell Experiment.
+#> class: SingleCellExperiment
+#> dim: 1 1571
+#> metadata(0):
+#> assays(2): counts cpm
+#> rownames(1): PUM1
+#> rowData names(0):
+#> colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ... TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
+#> colData names(55): sample_id_db .sample ... n_cell_type_in_tissue n_tissue_in_cell_type
+#> reducedDimNames(0):
+#> mainExpName: NULL
+#> altExpNames(0):
 ```

-## Reading 1 files.
-
-## .
-
-## class: SingleCellExperiment
-## dim: 1 1571
-## metadata(0):
-## assays(1): counts
-## rownames(1): PUM1
-## rowData names(0):
-## colnames(1571): ACAGCCGGTCCGTTAA_F02526 GGGAATGAGCCCAGCT_F02526 ...
-## TACAACGTCAGCATTG_SC84 CATTCGCTCAATACCG_F02526
-## colData names(55): sample_id_db .sample ... n_cell_type_in_tissue
-## n_tissue_in_cell_type
-## reducedDimNames(0):
-## mainExpName: NULL
-## altExpNames(0):
-
 Extract the counts as a Seurat object:

 ``` r
-get_metadata() |>
+get_metadata() |>
     filter(
-        ethnicity == "African" &
-        assay %LIKE% "%10x%" &
-        tissue == "lung parenchyma" &
+        ethnicity == "African" &
+        assay %LIKE% "%10x%" &
+        tissue == "lung parenchyma" &
         cell_type %LIKE% "%CD4%"
-    ) |>
-    get_seurat()
+    ) |>
+    get_seurat()
+#> ℹ Realising metadata.
+#> ℹ Synchronising files
+#> ℹ Attaching metadata.
+#> ℹ Compiling Single Cell Experiment.
+#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
+#> An object of class Seurat
+#> 60661 features across 1571 samples within 1 assay
+#> Active assay: originalexp (60661 features, 0 variable features)
 ```
-
-## Reading 1 files.
-
-## .
-
-## Warning: Feature names cannot have underscores ('_'), replacing with dashes
-## ('-')
-
-## An object of class Seurat
-## 60661 features across 1571 samples within 1 assay
-## Active assay: originalexp (60661 features, 0 variable features)
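
The filters in the examples above use `%LIKE%` with `%` wildcards. Since the metadata is a SQLite-backed table, the operator is presumably handed to SQL's `LIKE` (dbplyr passes unknown infix operators through to the database). The base-R sketch below imitates that matching on made-up assay labels. `sql_like()` is a hypothetical helper, not part of HCAquery, and it assumes SQLite's default case-insensitive ASCII matching.

``` r
# Hypothetical helper (not part of HCAquery): emulate SQL LIKE in base R.
# Assumption: `%` matches any run of characters, `_` matches exactly one,
# and matching is case-insensitive, as in SQLite's default LIKE.
sql_like <- function(x, pattern) {
    chars <- strsplit(pattern, "")[[1]]
    pieces <- vapply(chars, function(ch) {
        if (ch == "%") {
            ".*"
        } else if (ch == "_") {
            "."
        } else if (grepl("^[[:alnum:] ]$", ch)) {
            ch
        } else {
            # Escape punctuation so it is matched literally.
            paste0("\\", ch)
        }
    }, character(1))
    regex <- paste0("^", paste(pieces, collapse = ""), "$")
    grepl(regex, x, ignore.case = TRUE, perl = TRUE)
}

# Illustrative labels only; they are not taken from the metadata table.
assays <- c("10x 3' v2", "Smart-seq2", "10x 5' v1")
like_10x <- sql_like(assays, "%10x%")
like_10x
```

Here the first and third labels match, because `%10x%` accepts any text before and after `10x`. The same reasoning applies to `cell_type %LIKE% "%CD4%"`.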
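
As a rough illustration of what "counts scaled per million" means, the sketch below computes it by hand on a made-up matrix in base R. It assumes the usual definition: each cell's counts divided by that cell's total, times one million. The diff does not show how the package derives its `cpm` assay, so treat this as an assumption rather than the package's implementation.

``` r
# Toy count matrix: 3 genes (rows) x 2 cells (columns); values are made up.
counts <- matrix(
    c(10, 30, 60,
      5, 0, 45),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Total counts per cell (the "library size").
library_size <- colSums(counts)

# Divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, library_size, "/") * 1e6

cpm
```

Each column of `cpm` sums to one million, so a single gene can be compared across cells with different sequencing depth.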