|
| 1 | +# Get Started |
| 2 | + |
| 3 | +This short walkthrough shows the basic workflow: inspect a store, export metadata, and write a subset. |
| 4 | + |
| 5 | +## 1 Install |
| 6 | + |
| 7 | +Using uv (recommended): |
| 8 | +```bash |
| 9 | +git clone https://github.com/cellgeni/h5ad-cli.git |
| 10 | +cd h5ad-cli |
| 11 | +uv sync |
| 12 | +``` |
| 13 | + |
| 14 | +With pip: |
| 15 | +```bash |
| 16 | +git clone https://github.com/cellgeni/h5ad-cli.git |
| 17 | +cd h5ad-cli |
| 18 | +pip install . |
| 19 | +``` |
| 20 | + |
| 21 | +Optionally, install `csvkit` to inspect the exported CSV files:
| 22 | +```bash |
| 23 | +# with uv |
| 24 | +uv pip install csvkit |
| 25 | + |
| 26 | +# with pip |
| 27 | +pip install csvkit |
| 28 | +``` |
| 29 | + |
| 30 | +## 2 Inspect a file with the `info` command
| 31 | + |
| 32 | +Let's download an example `.h5ad` file:
| 33 | +```bash |
| 34 | +wget -O visium.h5ad https://exampledata.scverse.org/squidpy/figshare/visium_hne_adata.h5ad |
| 35 | +``` |
| 36 | + |
| 37 | +Now run `info` to see the file structure: |
| 38 | +```bash |
| 39 | +uv run h5ad info visium.h5ad |
| 40 | +``` |
| 41 | +``` |
| 42 | +An object with n_obs × n_var: 2688 × 18078 |
| 43 | + obs: array_col, array_row, cluster, in_tissue, leiden, log1p_n_genes_by_counts, log1p_total_counts, log1p_total_counts_mt, n_counts, n_genes_by_counts, pct_counts_in_top_100_genes, pct_counts_in_top_200_genes, pct_counts_in_top_500_genes, |
| 44 | +pct_counts_in_top_50_genes, pct_counts_mt, total_counts, total_counts_mt |
| 45 | + var: feature_types, gene_ids, genome, highly_variable, highly_variable_rank, log1p_mean_counts, log1p_total_counts, mean_counts, means, mt, n_cells, n_cells_by_counts, pct_dropout_by_counts, total_counts, variances, variances_norm |
| 46 | + obsm: X_pca, X_umap, spatial |
| 47 | + varm: PCs |
| 48 | + obsp: connectivities, distances |
| 49 | + uns: cluster_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, spatial, umap |
| 50 | + raw: X, var |
| 51 | +``` |
| 52 | + |
| 53 | +To inspect a specific entry: |
| 54 | +```bash |
| 55 | +uv run h5ad info visium.h5ad obsm/X_pca |
| 56 | +``` |
| 57 | +``` |
| 58 | +Path: obsm/X_pca |
| 59 | +Type: dense-matrix |
| 60 | +Shape: (2688, 50) |
| 61 | +Dtype: float32 |
| 62 | +Details: Dense matrix 2688×50 (float32) |
| 63 | +``` |
| 64 | + |
| 65 | +## 3 Export entries |
| 66 | +View the first 10 rows of the `obs` dataframe:
| 67 | + |
| 68 | +```bash |
| 69 | +uv run h5ad export dataframe visium.h5ad obs --head 10 |
| 70 | +``` |
| 71 | +```csv |
| 72 | +_index,array_col,array_row,cluster,in_tissue,leiden,log1p_n_genes_by_counts,log1p_total_counts,log1p_total_counts_mt,n_counts,n_genes_by_counts,pct_counts_in_top_100_genes,pct_counts_in_top_200_genes,pct_counts_in_top_500_genes,pct_counts_in_top_50_genes,pct_counts_mt,total_counts,total_counts_mt |
| 73 | +AAACAAGTATCTCCCA-1,102,50,Cortex_2,1,Cortex_3,8.502891406705377,9.869983,8.257904,19340.0,4928,43.13340227507756,49.21406411582213,60.449844881075485,38.42812823164426,19.943123,19340.0,3857.0 |
| 74 | +AAACAATCTACTAGCA-1,43,3,Cortex_5,1,Pyramidal_layer_dentate_gyrus,8.145839612936841,9.528867,8.091933,13750.0,3448,55.14181818181818,60.95272727272727,70.57454545454546,50.516363636363636,23.76,13750.0,3267.0 |
| 75 | +AAACACCAATAACTGC-1,19,59,Thalamus_2,1,Hypothalamus_1,8.70334075304372,10.395467,8.499233,32710.0,6022,47.071232039131765,54.56435340874351,65.0871293182513,40.48303271170896,15.010699,32710.0,4910.0 |
| 76 | +AAACAGAGCGACTCCT-1,94,14,Cortex_5,1,Pyramidal_layer_dentate_gyrus,8.369157112588834,9.674704,8.092851,15909.0,4311,45.81054748884279,52.07744044251681,62.97693129675027,40.95794833113332,20.554403,15909.0,3270.0 |
| 77 | +AAACCGGGTAGGTACC-1,28,42,Thalamus_2,1,Hypothalamus_1,8.663542087751374,10.369013,8.808967,31856.0,5787,45.887744851833254,52.98216976393771,64.24849321948768,40.287543947764945,21.01017,31856.0,6693.0 |
| 78 | +AAACCGTTCGTCCAGG-1,42,52,Hypothalamus_2,1,Pyramidal_layer,8.682538124003075,10.337314,8.559678,30862.0,5898,43.79171797031949,51.18592443781998,62.65634113148856,37.80053139783553,16.901043,30862.0,5216.0 |
| 79 | +AAACCTCATGAAGTTG-1,19,37,Thalamus_2,1,Hypothalamus_1,9.027858802380862,11.007419,8.849371,60319.0,8331,34.28770370861586,42.45594257199224,55.48997828213332,27.803842901904872,11.553574,60319.0,6969.0 |
| 80 | +AAACGAAGAACATACC-1,64,6,Cortex_4,1,Hypothalamus_2,8.84246002419529,10.578089,8.855521,39264.0,6921,37.99663814180929,44.75346373268134,56.6320293398533,32.95639771801141,17.858597,39264.0,7012.0 |
| 81 | +AAACGAGACGGTTGAT-1,79,35,Fiber_tract,1,Cortex_5,8.80941494391005,10.458923,8.351847,34853.0,6696,39.947780678851174,47.52818982583996,58.838550483459095,33.7245000430379,12.156773,34853.0,4237.0 |
| 82 | +AAACGGTTGCGAACTG-1,59,67,Lateral_ventricle,1,Striatum,8.718663567048953,10.254004,8.416489,28395.0,6115,41.67635147032928,49.20232435287903,60.556435992252155,35.562599049128366,15.918295,28395.0,4520.0 |
| 83 | +``` |
| 84 | + |
| 85 | +Export cell metadata to a CSV file: |
| 86 | +```bash |
| 87 | +uv run h5ad export dataframe visium.h5ad obs --output cells.csv |
| 88 | +wc -l cells.csv # 2689 cells.csv |
| 89 | +``` |
| 90 | + |
| 91 | +## 4 Subset by names |
| 92 | + |
| 93 | +Let's list the cluster names in `cells.csv` with their counts (`cluster` is the fourth column):
| 94 | +```bash |
| 95 | +awk -F ',' 'NR>1{print $4}' cells.csv | sort | uniq -c |
| 96 | +``` |
| 97 | +``` |
| 98 | +284 Cortex_1 |
| 99 | +257 Cortex_2 |
| 100 | +244 Cortex_3 |
| 101 | +164 Cortex_4 |
| 102 | +129 Cortex_5 |
| 103 | +226 Fiber_tract |
| 104 | +222 Hippocampus |
| 105 | +208 Hypothalamus_1 |
| 106 | +133 Hypothalamus_2 |
| 107 | +105 Lateral_ventricle |
| 108 | +42 Pyramidal_layer |
| 109 | +68 Pyramidal_layer_dentate_gyrus |
| 110 | +153 Striatum |
| 111 | +261 Thalamus_1 |
| 112 | +192 Thalamus_2 |
| 113 | +``` |
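
The `awk` call above hard-codes the position of the `cluster` column. As a minimal sketch, the same count can look the column up by its header name instead. `count_by_column` is a hypothetical helper (not part of `h5ad-cli`), the demo rows are made up, and the approach assumes no field contains a quoted comma:

```bash
# Hypothetical helper: count the values of a column selected by header name.
# Assumes a plain comma-separated file with no quoted commas inside fields.
count_by_column() {
    # $1: CSV file, $2: column name as it appears in the header row
    awk -F ',' -v col="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { print $idx }
    ' "$1" | sort | uniq -c
}

# Tiny stand-in for cells.csv (values are made up for illustration)
demo=$(mktemp)
printf '%s\n' \
    '_index,array_col,array_row,cluster' \
    'AAAC-1,102,50,Cortex_2' \
    'AAAG-1,43,3,Cortex_5' \
    'AAAT-1,19,59,Cortex_2' > "$demo"

count_by_column "$demo" cluster
```

With the real export, `count_by_column cells.csv cluster` should give the same counts as the command above, and it keeps working if the column order changes.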
| 114 | + |
| 115 | +To list the obs names (barcodes) in the `Cortex_2` cluster, use `csvsql` from `csvkit`:
| 116 | +```bash |
| 117 | +csvsql -d ',' -I --query "SELECT _index FROM cells WHERE cluster='Cortex_2'" cells.csv > barcodes.txt |
| 118 | +sed -i '1d' barcodes.txt # remove header |
| 119 | +wc -l barcodes.txt # 257 barcodes.txt |
| 120 | +``` |
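
If `csvkit` is not installed, the same barcode list can probably be produced with `awk` alone. The sketch below assumes the column layout shown in the export above (`_index` first, `cluster` fourth) and no quoted commas in the file. `select_barcodes` is a hypothetical helper, and the demo rows are made up:

```bash
# Hypothetical helper: print the obs names (column 1) whose cluster (column 4)
# equals the given label. The header row is skipped, so no sed step is needed.
select_barcodes() {
    # $1: CSV file, $2: cluster label
    awk -F ',' -v label="$2" 'NR > 1 && $4 == label { print $1 }' "$1"
}

# Tiny stand-in for cells.csv (values are made up for illustration)
demo=$(mktemp)
printf '%s\n' \
    '_index,array_col,array_row,cluster' \
    'AAAC-1,102,50,Cortex_2' \
    'AAAG-1,43,3,Cortex_5' \
    'AAAT-1,19,59,Cortex_2' > "$demo"

select_barcodes "$demo" Cortex_2
```

On the real file this would be `select_barcodes cells.csv Cortex_2 > barcodes.txt`, and `wc -l barcodes.txt` should again report 257 lines.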
| 121 | + |
| 122 | +Now you can use this list to create a subset `.h5ad` file: |
| 123 | +```bash |
| 124 | +uv run h5ad subset visium.h5ad cortex2.h5ad --obs barcodes.txt |
| 125 | +``` |
| 126 | + |
| 127 | +Check the result: |
| 128 | +```bash |
| 129 | +uv run h5ad info cortex2.h5ad |
| 130 | +``` |
| 131 | + |
| 132 | +## 5 Import or replace data
| 133 | +You can also import new data into an existing store. For example, let's replace the `obs` dataframe with a modified version. First, keep only the first five columns of `cells.csv`:
| 134 | +```bash |
| 135 | +cut -d ',' -f 1-5 cells.csv > cells1to5.csv |
| 136 | +``` |
| 137 | + |
| 138 | +Now import it back into `visium.h5ad` (the file that `cells.csv` was exported from):
| 139 | +```bash |
| 140 | +uv run h5ad import dataframe visium.h5ad obs cells1to5.csv |
| 141 | +``` |
| 142 | + |
| 143 | +Check the updated `obs` structure: |
| 144 | +```bash |
| 145 | +uv run h5ad info visium.h5ad obs |
| 146 | +``` |