AICellType is an open, intelligent, and efficient cell type annotation framework powered by large language models (LLMs). With the explosive growth of single-cell RNA sequencing (scRNA-seq) data, accurate and scalable cell type annotation has become a pressing challenge. Existing tools often suffer from limited generalization, heavy reliance on human expertise, high computational costs, and a lack of flexibility across tissues and species.
To address this, we systematically evaluated 79 state-of-the-art LLMs under different conditions (temperature, noise, and prompt formats), and developed an optimized annotation framework that:
📊 How it works
(A) Cell type identification with large language models based on marker gene information. (B) Evaluation of annotation accuracy and robustness across different language models, temperature settings, and noise conditions; an optimized model was selected using a cell-type matching scoring system. (C) Integration of AICellType with standard Seurat analysis pipelines, enabling users to perform online cell type annotation and visualization through an open platform and the OpenRouter interface. The platform supports flexible applications across multiple species and tissue types.
📊 Benchmark of accuracy for cell type annotation in large language models
✅ Offers a free online annotation service: no registration or API keys required.
🧠 Leverages both open-source and commercial LLMs, so closed black-box APIs can be avoided when needed.
🔁 Supports Seurat-native workflows for easy integration into existing pipelines.
🌍 Enables cross-species and multi-tissue annotation with customizable prompts and scoring logic.
🌐 Is accessible on the web: try it now at 👉 https://AICellType.jinlab.online
💸 Provides a cost-effective and fully open platform through OpenRouter and GitHub distribution.
⚙️ Allows self-hosting with customizable base URLs: use your own LLM backend (e.g., local server, proxy API) via the baseurl parameter for full control and data privacy.
Whether you're working with human PBMC, mouse brain, or other complex tissue types, AICellType offers a robust, extensible solution to empower your single-cell analysis with AI-enhanced annotation.
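# Install AICellType from GitHub (installs devtools first if it is missing)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("mooerccx/AICellType")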
# Shell: download and unpack the 10x Genomics PBMC 3k example dataset
wget https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
tar -xzvf pbmc3k_filtered_gene_bc_matrices.tar.gz
library(dplyr)
library(Seurat)
library(patchwork)
library(AICellType)
# Load the PBMC dataset from the directory unpacked above
# (path is relative to the working directory; adjust it if you extracted elsewhere)
pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
## An object of class Seurat
## 13714 features across 2700 samples within 1 assay
## Active assay: RNA (13714 features, 0 variable features)
## 1 layer present: counts
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
#
# Example 1: pass the Seurat object directly.
# To use your own API, also set baseurl, model, and key; omit them to use the default free service.
pbmc <- AnnotateCelltype(scRNA=pbmc, tissuename="PBMC")
#
# Example 2: first extract the top 10 marker genes per cluster, then annotate from those markers.
# To use your own API, also set baseurl, model, and key; omit them to use the default free service.
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)
pbmc.markers %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1) %>%
    slice_head(n = 10) %>%
    ungroup() -> top10
MarkerGenes <- SeuratMarkerGeneToStr(top10)
celltype <- GetCellType(markergenes=MarkerGenes, tissuename="PBMC")
new.cluster.ids <- unname(unlist(celltype$content))
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
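# Compute a UMAP embedding (same PCs as the neighbor graph) and plot the annotated clusters
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "umap", label = TRUE)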
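The relabeling step in Example 2 assumes that the LLM returns exactly one label per cluster, in cluster order. The following is a minimal sketch of a guard for that assumption. `name_cluster_labels` is a hypothetical helper (it is not part of AICellType or Seurat), and the example labels are made up for illustration.

```r
# Hypothetical helper (not part of AICellType or Seurat): attach cluster
# names to predicted labels, failing early if the counts do not match.
name_cluster_labels <- function(labels, cluster_levels) {
  labels <- trimws(as.character(unlist(labels)))
  if (length(labels) != length(cluster_levels)) {
    stop(sprintf("Got %d labels for %d clusters; inspect the LLM response.",
                 length(labels), length(cluster_levels)))
  }
  if (any(is.na(labels) | !nzchar(labels))) {
    stop("At least one cluster received an empty label.")
  }
  names(labels) <- cluster_levels
  labels
}

# Illustrative input only; real labels would come from celltype$content.
example_labels <- list("Naive CD4 T", "CD14+ Mono", "B")
ids <- name_cluster_labels(example_labels, c("0", "1", "2"))
print(ids)
```

In the workflow above, a call such as `new.cluster.ids <- name_cluster_labels(celltype$content, levels(pbmc))` could stand in for the two assignment lines that precede `RenameIdents`, so a malformed response stops the script instead of silently mislabeling clusters.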
By default, annotation runs on Claude 3.5 Sonnet (0624) through the free service. To use your own LLM, pass baseurl, model, and key:
pbmc <- AnnotateCelltype(
    scRNA = pbmc,
    tissuename = "PBMC",
    baseurl = "https://openrouter.ai/api/v1/chat/completions",
    model = "openai/gpt-4",
    key = "your-key"
)
baseurl: Custom API endpoint (e.g., OpenRouter, Ollama, or a local LLM server)
model: Any model name supported by that endpoint (e.g., meta-llama/llama-3-70b-instruct)
key: API key for that endpoint
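To keep the API key out of scripts and version control, one option is to read it from an environment variable. The sketch below rests on assumptions: `get_llm_key` is a hypothetical helper and `OPENROUTER_API_KEY` is an arbitrary variable name; neither is defined by AICellType.

```r
# Hypothetical helper: read an API key from an environment variable.
# The variable name OPENROUTER_API_KEY is an arbitrary choice for this sketch.
get_llm_key <- function(var = "OPENROUTER_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(sprintf("Environment variable %s is not set.", var))
  }
  key
}

# Demonstration with a placeholder value; set the real key outside the script,
# for example in ~/.Renviron.
Sys.setenv(OPENROUTER_API_KEY = "your-key")
key <- get_llm_key()
```

The result can then be supplied as `key = get_llm_key()` in the `AnnotateCelltype` call, in place of a hard-coded string.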
You can pass additional biological context to the LLM by customizing the tissuename:
pbmc <- AnnotateCelltype(
    scRNA = pbmc,
    tissuename = "PBMC,Isolated from dog infected with the virus"
)
The more specific the context, the better the model can match relevant cell types.
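When annotating several datasets, the context string can be assembled from its parts. This is only a sketch: `build_tissue_context` is a hypothetical helper, and the comma-separated layout simply mirrors the example above rather than any format required by AICellType.

```r
# Hypothetical helper: join the non-empty pieces of biological context into
# one comma-separated string suitable for the tissuename argument.
build_tissue_context <- function(tissue, species = NULL, condition = NULL) {
  parts <- c(tissue, species, condition)
  parts <- trimws(parts[!is.na(parts) & nzchar(trimws(parts))])
  paste(parts, collapse = ", ")
}

context <- build_tissue_context(
  tissue = "PBMC",
  species = "dog",
  condition = "isolated after viral infection"
)
print(context)
```

The resulting value would then be passed as `tissuename = context`.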