This document describes the API endpoints discovered on the Cell Guide website that can be used to access cell type information programmatically.
The endpoints are configured in the config.json file in the root directory of the project. The current version ID for marker endpoints is 1743611056.
{
"base_url": "https://cellxgene.cziscience.com/cellguide/",
"api_validated_url": "https://cellguide.cellxgene.cziscience.com/validated_descriptions/",
"api_gpt_url": "https://cellguide.cellxgene.cziscience.com/gpt_descriptions/",
"api_markers_base": "https://cellguide.cellxgene.cziscience.com/",
"marker_id_version": "1743611056"
}
These endpoints provide text descriptions of cell types.
https://cellguide.cellxgene.cziscience.com/validated_descriptions/{ONTOLOGY_ID}.json
Example:
https://cellguide.cellxgene.cziscience.com/validated_descriptions/CL_0000084.json
Response format:
{
"description": "T cells also known as T lymphocytes are a critical component of the adaptive immune system...",
"references": ["https://www.doi.org/10.1016/j.jaci.2022.10.011", "..."]
}
https://cellguide.cellxgene.cziscience.com/gpt_descriptions/{ONTOLOGY_ID}.json
Example:
https://cellguide.cellxgene.cziscience.com/gpt_descriptions/CL_0000084.json
Response format: Direct string containing the description.
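The two description endpoints return different shapes (an object with a "description" field versus a bare string), so a client has to handle each case. The following is a minimal sketch, assuming the response formats described above. The helper names `fetch_json` and `get_description` are hypothetical and are not taken from cell_guide_scraper.py.

```python
import json
import urllib.request

VALIDATED_URL = "https://cellguide.cellxgene.cziscience.com/validated_descriptions/"
GPT_URL = "https://cellguide.cellxgene.cziscience.com/gpt_descriptions/"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON; return None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers invalid JSON and decoding errors.
        return None


def get_description(ontology_id, fetch=fetch_json):
    """Return (text, source), preferring the validated description.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    validated = fetch(f"{VALIDATED_URL}{ontology_id}.json")
    if isinstance(validated, dict) and validated.get("description"):
        return validated["description"], "validated"

    gpt = fetch(f"{GPT_URL}{ontology_id}.json")
    if isinstance(gpt, str) and gpt:
        return gpt, "gpt"

    return None, None


if __name__ == "__main__":
    text, source = get_description("CL_0000084")
    print(source, (text or "")[:80])
```

Trying the validated endpoint first is a design assumption: it is the only one of the two that returns references alongside the text.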
These endpoints provide marker gene information for cell types. Note that these use a version ID in the URL which may change over time.
https://cellguide.cellxgene.cziscience.com/{VERSION_ID}/computational_marker_genes/{ONTOLOGY_ID}.json
Example:
https://cellguide.cellxgene.cziscience.com/1743611056/computational_marker_genes/CL_0000084.json
Response format: Array of marker gene objects containing fields like symbol, name, specificity, etc.
https://cellguide.cellxgene.cziscience.com/{VERSION_ID}/canonical_marker_genes/{ONTOLOGY_ID}.json
Example:
https://cellguide.cellxgene.cziscience.com/1743611056/canonical_marker_genes/CL_0000084.json
Response format: Array of marker gene objects containing fields like symbol, name, and tissue information.
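Because the version ID is part of the path, it may be easier to build marker URLs from the values in config.json than to hard-code them. The sketch below assumes the config keys shown earlier ("api_markers_base" and "marker_id_version"). The helpers `load_config`, `marker_url`, and `marker_symbols` are hypothetical names used for illustration only.

```python
import json


def load_config(path="config.json"):
    """Read the project configuration from the given JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def marker_url(config, kind, ontology_id):
    """Build a marker gene URL; `kind` is "computational" or "canonical"."""
    if kind not in ("computational", "canonical"):
        raise ValueError(f"unknown marker kind: {kind!r}")
    base = config["api_markers_base"].rstrip("/")
    version = config["marker_id_version"]
    return f"{base}/{version}/{kind}_marker_genes/{ontology_id}.json"


def marker_symbols(markers):
    """Collect the "symbol" field from a decoded marker array.

    Entries that are not objects or lack a "symbol" field are skipped,
    since the exact response schema is not fully documented here.
    """
    return [m["symbol"] for m in markers if isinstance(m, dict) and "symbol" in m]
```

With the configuration above, `marker_url(config, "canonical", "CL_0000084")` yields the canonical marker example URL shown earlier.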
The marker gene endpoints include a version ID in the URL path which appears to be a timestamp or other versioning identifier. If you experience issues with these endpoints, the version may have been updated. Use the debug_api.py script to investigate current endpoint behavior.
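One quick way to check the timestamp hypothesis is to decode the version ID as seconds since the Unix epoch. This is only a sketch under that assumption. The function name `version_id_as_utc` is hypothetical, and a plausible date does not prove that the ID is generated this way.

```python
from datetime import datetime, timezone


def version_id_as_utc(version_id):
    """Interpret a version ID as Unix epoch seconds and return a UTC datetime."""
    return datetime.fromtimestamp(int(version_id), tz=timezone.utc)


if __name__ == "__main__":
    # Prints 2025-04-02T16:24:16+00:00 for the version ID in config.json.
    print(version_id_as_utc("1743611056").isoformat())
```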
Here are some example cell ontology IDs that can be used with these endpoints:
CL_0000084: T cell
CL_0000236: B cell
CL_0000094: Granulocyte
CL_0000928: Activated CD4-negative, CD8-negative type I NK T cell
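The endpoints use the underscore form of the ID (CL_0000084), whereas Cell Ontology identifiers are often written with a colon (CL:0000084). A small normalizer can accept either form before a URL is built. This is a sketch that assumes IDs consist of the CL prefix and seven digits, as in the examples above. The function `normalize_ontology_id` is a hypothetical helper.

```python
import re

# Assumption: "CL", a colon or underscore, then exactly seven digits.
_CL_ID = re.compile(r"CL[:_](\d{7})")


def normalize_ontology_id(raw):
    """Return the underscore form (e.g. "CL_0000084") used in the endpoint URLs."""
    match = _CL_ID.fullmatch(raw.strip().upper())
    if match is None:
        raise ValueError(f"not a Cell Ontology ID: {raw!r}")
    return f"CL_{match.group(1)}"
```

For example, `normalize_ontology_id("CL:0000084")` returns `"CL_0000084"`, while a free-text name such as `"T cell"` raises `ValueError`.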
The cell_guide_scraper.py script provides a convenient way to access these endpoints programmatically. It handles fallbacks between API endpoints and HTML scraping when necessary.
from cell_guide_scraper import scrape_cell_data
# Get data for a T cell
cell_data = scrape_cell_data("CL_0000084")
# Access description
description = cell_data["description"]
description_source = cell_data["description_source"] # "validated", "gpt", or "html"
# Access markers
computational_markers = cell_data["markers"]["computational"]
canonical_markers = cell_data["markers"]["canonical"]
markers_source = cell_data["markers"]["markers_source"] # "api" or "html"