Single Cell data refers to molecular measurements obtained from individual cells, rather than bulk samples where signals are averaged across many cells. This approach allows researchers to study the heterogeneity within a cell population, uncovering differences in gene expression, epigenetic states, or protein abundance between cells.
ODM now supports the Cell entity to store and manage metadata and expression for individual cells in Single Cell datasets. Each cell record belongs to a Cell Group, which represents a single cell table (group).
Cell metadata can be imported into ODM using the job endpoints and the odm_import_data script.
Only the TSV file format is supported for uploading Cell metadata.
Let's upload a new Study with Samples, Cell metadata, and Cell expression. To import data, go to the job section and choose the endpoint relevant to the specific data type.
In this example we will upload the following files:
Study_metadata, a tab-delimited file of the study attributes:
| Study Source | Study Source ID | Study Title |
|---|---|---|
| S3 | EXP_S_9988 | Single Cell Expression Data Search |
Import study as described here.
Samples_metadata, a tab-delimited file of sample attributes:
| Sample Name | Sample Source ID | Sample Source | Sex | Age | Cell Type | Disease |
|---|---|---|---|---|---|---|
| EXP_SN_8801 | EXP_SSID_8801 | S3 | female | 28 | EXP_CT_8801 | diabetes |
| EXP_SN_8802 | EXP_SSID_8802 | S3 | male | 29 | EXP_CT_8802 | melanoma |
| ... | ... | ... | ... | ... | ... | ... |
Import samples as described here.
Cell_metadata, a tab-delimited file of cell attributes:
| barcode | sample_id | cell_type | treatment | protocol | cluster | n_counts | percent_mito | umap | pca | n_genes | doublet_scores | donor | organ | sort | method | file | assay | disease | organism | sex | development_stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SMPL_CID_A101 | EXP_SSID_8801 | CD4_T_cell | stimulated | Smart-seq2 | Activated T cells | 12500 | 0.8 | -1.2,2.5 | 1.8,-0.7 | 2800 | 0.05 | DONOR_A | spleen | FACS_A | scRNA | SampleFile_A101 | Smart-seq2 | healthy | Homo sapiens | female | adult |
| SMPL_CID_A102 | EXP_SSID_8802 | NK_cell | resting | Smart-seq2 | Resting NK_cells | 8900 | 1.1 | 2.3,-1.8 | -0.9,2.1 | 2100 | 0.08 | DONOR_A | blood | FACS_A | scRNA | SampleFile_A102 | Smart-seq2 | healthy | Homo sapiens | male | adult |
| SMPL_CID_A103 | EXP_SSID_8803 | CD4_T_cell | stimulated | Smart-seq2 | Memory T cells | 15200 | 0.9 | -2.1,1.7 | 0.6,-1.9 | 3200 | 0.04 | DONOR_A | spleen | FACS_A | scRNA | SampleFile_A103 | Smart-seq2 | healthy | Homo sapiens | female | adult |
| SMPL_CID_A104 | EXP_SSID_8804 | CD8_T_cell | cytotoxic | Smart-seq2 | Cytotoxic T cells | 11800 | 1.2 | 1.9,-2.4 | -1.5,0.8 | 2900 | 0.07 | DONOR_A | blood | FACS_A | scRNA | SampleFile_A104 | Smart-seq2 | healthy | Homo sapiens | male | adult |
| SMPL_CID_A105 | EXP_SSID_8805 | CD8_T_cell | resting | Smart-seq2 | Naive CD8_T_cells | 9300 | 1.0 | -0.8,1.3 | 2.2,-1.1 | 2500 | 0.06 | DONOR_A | spleen | FACS_A | scRNA | SampleFile_A105 | Smart-seq2 | healthy | Homo sapiens | female | adult |
For Cell metadata use the following endpoints:
- Supply the file URL via dataLink
  Path: POST /api/v1/jobs/import/cells
- Upload directly from TSV file
  Path: POST /api/v1/jobs/import/cells/multipart
Import Cell metadata as described here.
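As a rough illustration of the dataLink variant, the request could be assembled as shown below. This is a sketch only: the host, the bucket path, the token header name (Genestack-API-Token), and the JSON body shape (a single dataLink field) are assumptions that this page does not confirm, so check the Swagger definition before relying on them. The request is built but not sent.

```python
import json
import urllib.request


def build_cell_import_request(host: str, token: str, data_link: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Cell metadata import job endpoint."""
    # Assumed body shape: the file URL is passed in a "dataLink" field.
    body = json.dumps({"dataLink": data_link}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/v1/jobs/import/cells",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Genestack-API-Token": token,  # assumed header name
        },
    )


# Hypothetical host and file location, for illustration only.
req = build_cell_import_request("https://odm.example.com", "<TOKEN>", "s3://my-bucket/cells.tsv")
print(req.get_method(), req.full_url)
```

To actually submit the job, the request object could be passed to urllib.request.urlopen against a real ODM host.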
Cell_expression, a tab-delimited file of cell expression data:
| gene_id | SMPL_CID_A101 | SMPL_CID_A102 | SMPL_CID_A103 | SMPL_CID_A104 | SMPL_CID_A105 |
|---|---|---|---|---|---|
| ENSG00000230368 | 1.01 | 1.02 | 1.03 | 1.04 | 1.05 |
| ENSG00000188976 | 2.01 | 2.02 | 2.03 | 2.04 | 2.05 |
| ACTB | 3.01 | 3.02 | 3.03 | 3.04 | 3.05 |
For Cell expression use the following endpoints:
- Supply the file URL via dataLink
  Path: POST /api/v1/jobs/import/expression
- Upload directly from TSV file
  Path: POST /api/v1/jobs/import/expression/multipart
It is recommended to use TSV files archived in .br or .lz4 extensions for Cell expression.
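Before uploading, it can help to check that the cell columns of the expression file correspond to barcodes in the Cell metadata file. The sketch below assumes that expression columns are matched to cells by barcode, which the example files suggest but this page does not state explicitly. The function name is hypothetical.

```python
import csv
import io


def expression_columns_missing_from_cells(cells_tsv: str, expression_tsv: str) -> list[str]:
    """Return expression column names that have no matching barcode in the Cell metadata."""
    cell_rows = csv.DictReader(io.StringIO(cells_tsv), delimiter="\t")
    barcodes = {row["barcode"] for row in cell_rows}
    header = next(csv.reader(io.StringIO(expression_tsv), delimiter="\t"))
    # The first column holds the feature identifier (gene_id); the rest are cells.
    return [column for column in header[1:] if column not in barcodes]


cells = "barcode\tcell_type\nSMPL_CID_A101\tCD4_T_cell\nSMPL_CID_A102\tNK_cell\n"
expression = "gene_id\tSMPL_CID_A101\tSMPL_CID_A102\tSMPL_CID_A103\nACTB\t3.01\t3.02\t3.03\n"
print(expression_columns_missing_from_cells(cells, expression))
```

An empty list means every expression column has a corresponding cell record in the metadata file.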
When the import job finishes successfully, the resulting Group accession can be retrieved with the following endpoint:
GET /api/v1/jobs/{jobExecId}/output
Example response:
{
  "groupAccession": "GSF1234567"
}
Learn more about uploading data to ODM via API here.
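A small helper for reading the Group accession from the job output might look like the following. It is a sketch under stated assumptions: the host is hypothetical, the helper names are illustrative, and only the groupAccession field shown in the example response is relied on.

```python
import json


def job_output_url(host: str, job_exec_id) -> str:
    """Build the URL of the job output endpoint for a given job execution ID."""
    return f"{host.rstrip('/')}/api/v1/jobs/{job_exec_id}/output"


def extract_group_accession(response_text: str) -> str:
    """Return the groupAccession from a job output response body."""
    payload = json.loads(response_text)
    accession = payload.get("groupAccession")
    if not accession:
        raise ValueError("job output does not contain a groupAccession")
    return accession


print(job_output_url("https://odm.example.com", 42))
print(extract_group_accession('{"groupAccession": "GSF1234567"}'))
```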
Curators can upload and link Cell metadata groups to ODM using the odm_import_data script. This extension allows you to include Cell groups in the same import workflow as other metadata entities (Studies, Samples, Libraries, and Preparations), ensuring a consistent and automated data-loading process.
The script supports an optional parameter for Cell metadata: -c / --cell
| Feature | Description |
|---|---|
| Parameter | --cell / -c |
| Input format | TSV (same format as /api/v1/jobs/import/cells) |
| Linking targets | Samples, Libraries, or Preparations |
| Multiple imports | Supported in one run |
| Error handling | Aligned with Cell import endpoint |
To upload Cell expression, use the regular -e / --expression parameter.
Cells can be imported and linked in several hierarchical contexts, depending on your dataset structure. Here are a few examples:
- Study → Samples → Cells → Expression
  Used when cells are directly associated with samples.
- Study → Samples → Library → Cells → Expression / Study → Samples → Preparation → Library → Cells → Expression
  Used when cells originate from library-level data.
- Study → Samples → Preparations → Cells → Expression / Study → Samples → Library → Preparation → Cells → Expression
  Used when cells originate from preparation-level data.
Note that Cell metadata will be linked to the nearest metadata group mentioned above in the script.
odm-import-data \
--server <HOST> \
--token <TOKEN> \
--study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \
--samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \
--cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \
--expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \
--data-class 'Single-cell transcriptomics' \
--number-of-feature-attributes 1 \
--allow-duplicates
The following attributes are parsed and stored within the system.
All other values present in the Cell metadata file are stored as custom attributes with the string data type.
| Attribute Name | Stored as type | Description | Required |
|---|---|---|---|
| cellID | string | Unique cell identifier generated by ODM (composite key of groupAccession + barcode) | Yes |
| barcode | string | Raw cell barcode. Must be unique. | Yes |
| batch | string | Sample/batch origin | Yes |
| cellType | string | Annotated cell type | |
| cluster | string | Clustering labels | |
| nCounts | integer | Total UMI count (Unique Molecular Identifier) | |
| percentMito | float | % mitochondrial gene expression | |
| umap | float | Dimensionality reduction results (Uniform Manifold Approximation and Projection). Up to 3 values are stored. | |
| pca | float | Dimensionality reduction results (Principal Component Analysis results). Up to 100 values are stored. | |
| tsne | float | Dimensionality reduction results (t-distributed Stochastic Neighbor Embedding). Up to 3 values are stored. | |
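To make the value limits in the table concrete, the sketch below parses a coordinate string such as -1.2,2.5 into floats and keeps at most the documented number of values. The comma-separated format is inferred from the example Cell_metadata file, and dropping extra values beyond the limit is an assumption about how ODM behaves, not a documented rule.

```python
from typing import Optional

# Limits taken from the attribute table above.
MAX_VALUES = {"umap": 3, "pca": 100, "tsne": 3}


def parse_embedding(attribute: str, raw: str) -> Optional[list[float]]:
    """Parse a comma-separated coordinate string; return None if a value is not numeric."""
    try:
        values = [float(part) for part in raw.split(",")]
    except ValueError:
        return None  # assumed to correspond to an ignored value with a warning
    # Assumption: values beyond the documented limit are dropped.
    return values[: MAX_VALUES[attribute]]


print(parse_embedding("umap", "-1.2,2.5"))
```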
Fail conditions:
- Missing required attributes (barcode, batch)
- Duplicate barcodes within a group
- Blank values in required attributes
Warnings (ignored values):
- Invalid data type for attribute
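The fail conditions above can be checked locally before starting an import job. The following is a minimal pre-flight sketch, not the validation ODM itself performs; the function name and error messages are hypothetical.

```python
import csv
import io

REQUIRED = ("barcode", "batch")


def find_cell_metadata_errors(tsv_text: str) -> list[str]:
    """Return a list of problems that would likely cause the Cell metadata import to fail."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    errors = [f"missing required attribute: {name}" for name in REQUIRED if name not in columns]
    if errors:
        return errors
    seen = set()
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for name in REQUIRED:
            if not (row[name] or "").strip():
                errors.append(f"line {line_number}: blank value in {name}")
        barcode = (row["barcode"] or "").strip()
        if barcode and barcode in seen:
            errors.append(f"line {line_number}: duplicate barcode {barcode}")
        seen.add(barcode)
    return errors


print(find_cell_metadata_errors("barcode\tbatch\nA1\tS1\nA1\t\n"))
```

An empty result means none of the three fail conditions were detected in the file.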
To link Cell metadata to other metadata groups use the following endpoints:
Swagger definition: integrationCurator → Cell integration as Curator
- Link to Samples
  Path: POST /api/v1/as-curator/integration/link/cell/group/{sourceId}/to/sample/group/{targetId}
- Link to Libraries
  Path: POST /api/v1/as-curator/integration/link/cells/group/{sourceId}/to/library/group/{targetId}
- Link to Preparations
  Path: POST /api/v1/as-curator/integration/link/cells/group/{sourceId}/to/preparation/group/{targetId}
In the sourceId field, provide the accession of your Cell metadata group.
In the targetId field, provide the accession of the Sample, Library, or Preparation group to which the Cell metadata should be linked.
Cell metadata is linked when batch values in the Cell metadata match the Sample Source ID for Samples, the Library ID for Libraries, or the Preparation ID for Preparations.
Fail conditions:
- There is no Sample Source/Library/Preparation ID in Sample/Library/Preparation metadata group.
- There are no matches between batch in Cell metadata and Sample Source/Library/Preparation IDs.
If linking succeeds, the response message shows the number of links created between Cells and Samples/Libraries/Preparations.
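The matching rule can be previewed locally by comparing batch values against the target IDs. This sketch counts one link per cell whose batch value appears among the target IDs, which is an assumption about how ODM counts links; the function name is hypothetical.

```python
def count_linkable_cells(cell_batches, target_ids) -> int:
    """Count cells whose batch value matches a Sample Source, Library, or Preparation ID."""
    targets = set(target_ids)
    matches = sum(1 for batch in cell_batches if batch in targets)
    if matches == 0:
        # Corresponds to the "no matches" fail condition described above.
        raise ValueError("no batch value matches any target ID")
    return matches


batches = ["EXP_SSID_8801", "EXP_SSID_8802", "EXP_SSID_8803"]
sample_source_ids = ["EXP_SSID_8801", "EXP_SSID_8802"]
print(count_linkable_cells(batches, sample_source_ids))
```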
To link Cell expression to a Cell metadata group, use the following endpoint:
Swagger definition: integrationCurator → Expression integration as Curator
Path: POST /api/v1/as-curator/integration/link/expression/group/{sourceId}/to/cell/group/{targetId}
In the sourceId field, provide the accession of your Cell expression group.
In the targetId field, provide the accession of the Cell metadata group to which the Cell expression should be linked.
A Cell expression group can be linked to only one Cell metadata group.
Compute cell ratio statistics across groups or metadata attributes in single-cell data.
This endpoint calculates cell ratio statistics based on single-cell metadata. It quantifies the proportion of cells that meet specific criteria (countSelected, e.g., expression threshold, cell type, or cluster) relative to a defined reference group or the total cell population (countAvailable) defined by study, samples, library, or preparation metadata.
Swagger definition: integrationCurator → [BETA] Analytics omics queries as Curator
Path: POST /api/v1/as-curator/omics/cells/analytics/cell-ratio
The Cell Ratio endpoint computes a simple proportion:
- countSelected = number of cells that match all provided criteria (study/sample/library/preparation + cell metadata + optional expression constraints)
- countAvailable = number of cells in the reference population defined only by study/sample/library/preparation queries & filters
- ratio = countSelected / countAvailable
This endpoint returns counters only (no cell records).
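The proportion itself is simple arithmetic, as the sketch below shows. How the endpoint behaves when the reference population is empty is not stated on this page, so raising an error here is an assumption made for the example.

```python
def cell_ratio(count_selected: int, count_available: int) -> float:
    """Compute ratio = countSelected / countAvailable."""
    if count_available <= 0:
        # Assumption: an empty reference population has no meaningful ratio.
        raise ValueError("countAvailable must be positive")
    return count_selected / count_available


# 25 of 100 reference cells match the criteria.
print(cell_ratio(25, 100))
```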
Use it when you want to answer questions like:
- “What fraction of cells in Study X are Monocytes?”
- “Within samples matching Clozapine, what proportion of cells have expression in a given range?”
- “Among cells from a specific library/preparation, what fraction match a cell metadata definition?”
Request example:
{
"cellGroup": {
"studyFilter": "\"Study Source\"=ArrayExpress",
"studyQuery": "RNA-Seq of human dendritic cells",
"sampleFilter": "\"Species or strain\"=\"Homo sapiens\"",
"sampleQuery": "Clozapine",
"libraryFilter": "\"Library Type\"=RNA-Seq-1",
"libraryQuery": "illumina HiSeq500",
"preparationFilter": "Digestion=Trypsin",
"preparationQuery": "reversed-phase liquid chromatography",
"cellQuery": "cellType=Macrophage,Monocyte",
"searchSpecificTerms": false
},
"exQuery": "-3 < value < 3"
}
Response example:
{
"countSelected": 1243393,
"countAvailable": 9234945,
"ratio": 0.13464
}
The Gene Summary endpoint returns descriptive statistics and distribution summaries for expression values of up to 100 genes across a filtered set of single cells.
Use it when you want quick “what does this gene look like in these cells?” metrics: mean/median, spread, quantiles, min/max, and a histogram-style density summary.
Swagger definition: integrationCurator → [BETA] Analytics omics queries as Curator
Path: POST /api/v1/as-curator/omics/cells/analytics/gene-summary
For each requested gene, the response includes:
- geneId: gene identifier (e.g., Ensembl ID)
- cellCount: number of cells with measurable expression for the gene under the applied filters
- mean: average expression value
- median: median expression value
- stdDev: standard deviation (dispersion)
- min / max: observed range of expression values
- quantiles: expression percentiles (configurable set of percentiles; returned as an ordered list of values)
- histogram (density): binned distribution summary suitable for plotting expression density
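To illustrate what these fields describe, the sketch below computes comparable statistics for a small list of expression values using Python's statistics module. It is not the server's implementation: the choice of population standard deviation, the inclusive quantile method, and the decile set (0% to 100% in steps of 10%) are assumptions, and the histogram is omitted.

```python
import statistics


def summarize_expression(values: list[float]) -> dict:
    """Compute descriptive statistics for one gene's expression values (at least two values)."""
    # Nine inner decile cut points; min and max are added as the 0% and 100% points.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "cellCount": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdDev": statistics.pstdev(values),  # assumed: population standard deviation
        "min": min(values),
        "max": max(values),
        "quantiles": [min(values), *deciles, max(values)],
    }


summary = summarize_expression([1, 2, 3, 4, 5])
print(summary["mean"], summary["median"], summary["min"], summary["max"])
```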
Request example:
{
"cellGroup": {
"studyFilter": "\"Study Source\"=ArrayExpress",
"studyQuery": "RNA-Seq of human dendritic cells",
"sampleFilter": "\"Species or strain\"=\"Homo sapiens\"",
"sampleQuery": "Clozapine",
"libraryFilter": "\"Library Type\"=RNA-Seq-1",
"libraryQuery": "illumina HiSeq500",
"preparationFilter": "Digestion=Trypsin",
"preparationQuery": "reversed-phase liquid chromatography",
"cellQuery": "cellType=Macrophage,Monocyte",
"searchSpecificTerms": false
},
"geneNames": [
"ENSG00000230368",
"ENSG00000188976",
"ENSG00000188982"
],
"exQuery": "-3 < value < 3"
}
Response example:
{
"resultsPerGene": [
{
"geneId": "ENSG00000111640",
"cellCount": 8968167,
"mean": 7.747614311820911,
"median": 7,
"stdDev": 6.499314669429827,
"min": 1,
"max": 496,
"quantiles": [
1,
1,
2,
3,
5,
7,
10,
12,
15,
27,
192
],
"histogram": "[(1, 15.50289002318, 7686678.375), (15.50289002318, 35.49570418233824, 1229164),\n(35.49570418233824, 56.93121325335453, 36531.25), (56.93121325335453, 77.21467372919479, 6910.625)]\n"
}
]
}
The Differential Expression endpoint compares gene expression between two cell populations: a Case group and a Control group. It returns per-gene metrics that quantify how strongly expression differs between the two groups, including fold change and Mann–Whitney U test results.
Swagger definition: integrationCurator → [BETA] Analytics omics queries as Curator
Path: POST /api/v1/as-curator/omics/cells/analytics/differential-expression
Use it to answer questions like:
- “Which genes are upregulated in Monocytes vs all other cells?”
- “Which genes differ between case samples and control samples within the same study?”
- “What changes under a treatment condition vs untreated controls?”
Calculations for each returned geneId:
- caseCellCount: number of case cells contributing measurable expression for that gene
- controlCellCount: number of control cells contributing measurable expression for that gene
- caseAvgEx: mean expression across contributing case cells
- controlAvgEx: mean expression across contributing control cells
- expressionDifference: caseAvgEx - controlAvgEx
- foldChange: caseAvgEx / controlAvgEx
- mannWhitneyU / pValue: Mann–Whitney U test outputs (as implemented by ClickHouse mannwhitneyutest)
- log2FC: the fold change expressed on a base-2 logarithmic scale
If you apply exQuery expression thresholds, only cells/expression values that satisfy those rules contribute to the counts and averages.
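The difference and fold-change metrics follow directly from the two group means, as sketched below. The Mann–Whitney U test is left out, and the handling of zero or negative means is an assumption: this example simply rejects them, which may differ from what the endpoint does.

```python
import math


def compare_means(case_avg: float, control_avg: float) -> dict:
    """Derive expressionDifference, foldChange and log2FC from the two group means."""
    if case_avg <= 0 or control_avg <= 0:
        # Assumption: fold change and its logarithm are only computed for positive means.
        raise ValueError("both means must be positive")
    fold_change = case_avg / control_avg
    return {
        "expressionDifference": case_avg - control_avg,
        "foldChange": fold_change,
        "log2FC": math.log2(fold_change),
    }


result = compare_means(4.0, 2.0)
print(result)
```

A log2FC of 1 corresponds to a doubling of mean expression in the Case group relative to the Control group.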
Request example:
{
"caseGroup": {
"studyFilter": "\"Study Source\"=ArrayExpress",
"studyQuery": "RNA-Seq of human dendritic cells",
"sampleFilter": "\"Species or strain\"=\"Homo sapiens\"",
"sampleQuery": "Clozapine",
"libraryFilter": "\"Library Type\"=RNA-Seq-1",
"libraryQuery": "illumina HiSeq500",
"preparationFilter": "Digestion=Trypsin",
"preparationQuery": "reversed-phase liquid chromatography",
"cellQuery": "cellType=Macrophage,Monocyte",
"searchSpecificTerms": false
},
"controlGroup": {
"studyFilter": "\"Study Source\"=ArrayExpress",
"studyQuery": "RNA-Seq of human dendritic cells",
"sampleFilter": "\"Species or strain\"=\"Homo sapiens\"",
"sampleQuery": "Clozapine",
"libraryFilter": "\"Library Type\"=RNA-Seq-1",
"libraryQuery": "illumina HiSeq500",
"preparationFilter": "Digestion=Trypsin",
"preparationQuery": "reversed-phase liquid chromatography",
"cellQuery": "cellType=Macrophage,Monocyte",
"searchSpecificTerms": false
},
"exQuery": "feature=ENSG00000230368,ENSG00000188976",
"limit": 2000,
"offset": 0
}
Response example:
{
"resultsPerGene": [
{
"geneId": "ENSG00000230368",
"caseCellCount": 8450,
"controlCellCount": 8123,
"caseAvgExpression": 1.24,
"controlAvgExpression": 0.62,
"expressionDifference": 0.62,
"foldChange": 2,
"mannWhitneyU": 1.5,
"pValue": 0.95
}
],
"pagination": {
"currentResultsCount": 1,
"limit": 2000,
"offset": 0
}
}
Use the manage-data/data endpoint to delete a Cell metadata or Cell expression group.