Commit bc7711f

Add GET_STARTED.md for initial setup and usage instructions
1 parent 333925f commit bc7711f

1 file changed

+146
-0
lines changed

docs/GET_STARTED.md

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
1+
# Get Started
2+
3+
This short walkthrough shows the basic workflow: inspect a store, export metadata, and write a subset.
4+
5+
## 1 Install
6+
7+
Using uv (recommended):
8+
```bash
9+
git clone https://github.com/cellgeni/h5ad-cli.git
10+
cd h5ad-cli
11+
uv sync
12+
```
13+
14+
With pip:
15+
```bash
16+
git clone https://github.com/cellgeni/h5ad-cli.git
17+
cd h5ad-cli
18+
pip install .
19+
```
20+
21+
Optionally, install `csvkit` to inspect the exported CSV files:
22+
```bash
23+
# with uv
24+
uv pip install csvkit
25+
26+
# with pip
27+
pip install csvkit
28+
```
29+
30+
## 2 Inspect a file with the `info` command
31+
32+
Download an example `.h5ad` file:
33+
```bash
34+
wget -O visium.h5ad https://exampledata.scverse.org/squidpy/figshare/visium_hne_adata.h5ad
35+
```
36+
37+
Now run `info` to see the file structure:
38+
```bash
39+
uv run h5ad info visium.h5ad
40+
```
41+
```
42+
An object with n_obs × n_var: 2688 × 18078
43+
obs: array_col, array_row, cluster, in_tissue, leiden, log1p_n_genes_by_counts, log1p_total_counts, log1p_total_counts_mt, n_counts, n_genes_by_counts, pct_counts_in_top_100_genes, pct_counts_in_top_200_genes, pct_counts_in_top_500_genes,
44+
pct_counts_in_top_50_genes, pct_counts_mt, total_counts, total_counts_mt
45+
var: feature_types, gene_ids, genome, highly_variable, highly_variable_rank, log1p_mean_counts, log1p_total_counts, mean_counts, means, mt, n_cells, n_cells_by_counts, pct_dropout_by_counts, total_counts, variances, variances_norm
46+
obsm: X_pca, X_umap, spatial
47+
varm: PCs
48+
obsp: connectivities, distances
49+
uns: cluster_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, spatial, umap
50+
raw: X, var
51+
```
52+
53+
To inspect a specific entry:
54+
```bash
55+
uv run h5ad info visium.h5ad obsm/X_pca
56+
```
57+
```
58+
Path: obsm/X_pca
59+
Type: dense-matrix
60+
Shape: (2688, 50)
61+
Dtype: float32
62+
Details: Dense matrix 2688×50 (float32)
63+
```
64+
65+
## 3 Export entries
66+
View the first 10 rows of the `obs` dataframe:
67+
68+
```bash
69+
uv run h5ad export dataframe visium.h5ad obs --head 10
70+
```
71+
```csv
72+
_index,array_col,array_row,cluster,in_tissue,leiden,log1p_n_genes_by_counts,log1p_total_counts,log1p_total_counts_mt,n_counts,n_genes_by_counts,pct_counts_in_top_100_genes,pct_counts_in_top_200_genes,pct_counts_in_top_500_genes,pct_counts_in_top_50_genes,pct_counts_mt,total_counts,total_counts_mt
73+
AAACAAGTATCTCCCA-1,102,50,Cortex_2,1,Cortex_3,8.502891406705377,9.869983,8.257904,19340.0,4928,43.13340227507756,49.21406411582213,60.449844881075485,38.42812823164426,19.943123,19340.0,3857.0
74+
AAACAATCTACTAGCA-1,43,3,Cortex_5,1,Pyramidal_layer_dentate_gyrus,8.145839612936841,9.528867,8.091933,13750.0,3448,55.14181818181818,60.95272727272727,70.57454545454546,50.516363636363636,23.76,13750.0,3267.0
75+
AAACACCAATAACTGC-1,19,59,Thalamus_2,1,Hypothalamus_1,8.70334075304372,10.395467,8.499233,32710.0,6022,47.071232039131765,54.56435340874351,65.0871293182513,40.48303271170896,15.010699,32710.0,4910.0
76+
AAACAGAGCGACTCCT-1,94,14,Cortex_5,1,Pyramidal_layer_dentate_gyrus,8.369157112588834,9.674704,8.092851,15909.0,4311,45.81054748884279,52.07744044251681,62.97693129675027,40.95794833113332,20.554403,15909.0,3270.0
77+
AAACCGGGTAGGTACC-1,28,42,Thalamus_2,1,Hypothalamus_1,8.663542087751374,10.369013,8.808967,31856.0,5787,45.887744851833254,52.98216976393771,64.24849321948768,40.287543947764945,21.01017,31856.0,6693.0
78+
AAACCGTTCGTCCAGG-1,42,52,Hypothalamus_2,1,Pyramidal_layer,8.682538124003075,10.337314,8.559678,30862.0,5898,43.79171797031949,51.18592443781998,62.65634113148856,37.80053139783553,16.901043,30862.0,5216.0
79+
AAACCTCATGAAGTTG-1,19,37,Thalamus_2,1,Hypothalamus_1,9.027858802380862,11.007419,8.849371,60319.0,8331,34.28770370861586,42.45594257199224,55.48997828213332,27.803842901904872,11.553574,60319.0,6969.0
80+
AAACGAAGAACATACC-1,64,6,Cortex_4,1,Hypothalamus_2,8.84246002419529,10.578089,8.855521,39264.0,6921,37.99663814180929,44.75346373268134,56.6320293398533,32.95639771801141,17.858597,39264.0,7012.0
81+
AAACGAGACGGTTGAT-1,79,35,Fiber_tract,1,Cortex_5,8.80941494391005,10.458923,8.351847,34853.0,6696,39.947780678851174,47.52818982583996,58.838550483459095,33.7245000430379,12.156773,34853.0,4237.0
82+
AAACGGTTGCGAACTG-1,59,67,Lateral_ventricle,1,Striatum,8.718663567048953,10.254004,8.416489,28395.0,6115,41.67635147032928,49.20232435287903,60.556435992252155,35.562599049128366,15.918295,28395.0,4520.0
83+
```
84+
85+
Export cell metadata to a CSV file:
86+
```bash
87+
uv run h5ad export dataframe visium.h5ad obs --output cells.csv
88+
wc -l cells.csv # 2689 cells.csv
89+
```
90+
91+
## 4 Subset by names
92+
93+
Count the cells in each cluster, using the `cluster` column (column 4) of `cells.csv`:
94+
```bash
95+
awk -F ',' 'NR>1{print $4}' cells.csv | sort | uniq -c
96+
```
97+
```
98+
284 Cortex_1
99+
257 Cortex_2
100+
244 Cortex_3
101+
164 Cortex_4
102+
129 Cortex_5
103+
226 Fiber_tract
104+
222 Hippocampus
105+
208 Hypothalamus_1
106+
133 Hypothalamus_2
107+
105 Lateral_ventricle
108+
42 Pyramidal_layer
109+
68 Pyramidal_layer_dentate_gyrus
110+
153 Striatum
111+
261 Thalamus_1
112+
192 Thalamus_2
113+
```
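
As a quick consistency check, the per-cluster counts should add up to the 2688 observations reported by `info`. A minimal sketch that sums the counts copied from the output above (the `total` variable name is only illustrative):

```bash
# Sum the first column of the `uniq -c` output shown above.
total=$(awk '{ sum += $1 } END { print sum }' <<'EOF'
284 Cortex_1
257 Cortex_2
244 Cortex_3
164 Cortex_4
129 Cortex_5
226 Fiber_tract
222 Hippocampus
208 Hypothalamus_1
133 Hypothalamus_2
105 Lateral_ventricle
42 Pyramidal_layer
68 Pyramidal_layer_dentate_gyrus
153 Striatum
261 Thalamus_1
192 Thalamus_2
EOF
)
echo "$total"   # 2688, matching n_obs
```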
114+
115+
To list the obs names (barcodes) that belong to `Cortex_2`, you can use `csvsql` from `csvkit`:
116+
```bash
117+
csvsql -d ',' -I --query "SELECT _index FROM cells WHERE cluster='Cortex_2'" cells.csv > barcodes.txt
118+
sed -i '1d' barcodes.txt # remove header
119+
wc -l barcodes.txt # 257 barcodes.txt
120+
```
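
If `csvkit` is not installed, the same selection can be sketched with `awk`. The version below finds the `cluster` column by its header name rather than by position, and it assumes no field contains a quoted comma. The `select_barcodes` helper and the sample file are hypothetical and not part of `h5ad-cli`.

```bash
# select_barcodes FILE CLUSTER: print the first column (_index) of every row
# whose `cluster` column equals CLUSTER. Hypothetical helper for illustration.
select_barcodes() {
  awk -F ',' -v want="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "cluster") col = i; next }
    col && $col == want { print $1 }
  ' "$1"
}

# Tiny stand-in for cells.csv so the sketch runs on its own (made-up rows).
sample=$(mktemp)
printf '%s\n' \
  '_index,array_col,cluster' \
  'A-1,10,Cortex_2' \
  'B-1,11,Striatum' \
  'C-1,12,Cortex_2' > "$sample"

select_barcodes "$sample" Cortex_2
```

With the real export this would be called as `select_barcodes cells.csv Cortex_2 > barcodes.txt`, which writes no header line, so the `sed` step is not needed.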
121+
122+
Now you can use this list to create a subset `.h5ad` file:
123+
```bash
124+
uv run h5ad subset visium.h5ad cortex2.h5ad --obs barcodes.txt
125+
```
126+
127+
Check the result:
128+
```bash
129+
uv run h5ad info cortex2.h5ad
130+
```
131+
132+
## 5 Import or replace data
133+
You can also import new data into an existing store. For example, let's replace the `obs` dataframe with a modified version. First, keep only the first five columns of `cells.csv`:
134+
```bash
135+
cut -d ',' -f 1-5 cells.csv > cells1to5.csv
136+
```
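
Note that `cut` splits on every comma, so it is only safe when no field contains a quoted comma. One way to check this beforehand is to confirm that every row has the same number of comma-separated fields as the header. The sketch below runs that check on a small stand-in file (hypothetical data) before trimming:

```bash
# Stand-in for cells.csv with six columns (made-up values).
src=$(mktemp)
printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4,5,6' '7,8,9,10,11,12' > "$src"

# Count rows whose field count differs from the header's.
# A non-zero result suggests quoted commas, which `cut` would split incorrectly.
bad_rows=$(awk -F ',' 'NR == 1 { n = NF } NF != n { bad++ } END { print bad + 0 }' "$src")
echo "$bad_rows"

# Trim to the first five columns only when the check passes.
trimmed=$(mktemp)
if [ "$bad_rows" -eq 0 ]; then
  cut -d ',' -f 1-5 "$src" > "$trimmed"
fi
head -n 1 "$trimmed"
```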
137+
138+
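Now import it back into `visium.h5ad` (the trimmed table still has one row per cell in the full dataset, so it matches the original file rather than the `cortex2.h5ad` subset):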
139+
```bash
140+
uv run h5ad import dataframe visium.h5ad obs cells1to5.csv
141+
```
142+
143+
Check the updated `obs` structure:
144+
```bash
145+
uv run h5ad info visium.h5ad obs
146+
```
