| name | description |
|---|---|
tooluniverse-spatial-transcriptomics |
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis. |
Comprehensive analysis of spatially-resolved transcriptomics data to understand gene expression patterns in tissue architecture context. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.
Triggers:
- User has spatial transcriptomics data (Visium, MERFISH, seqFISH, etc.)
- Questions about tissue architecture or spatial organization
- Spatial gene expression pattern analysis
- Cell-cell proximity or neighborhood analysis requests
- Tumor microenvironment spatial structure questions
- Integration of spatial with single-cell data
- Spatial domain identification
- Tissue morphology correlation with expression
Example Questions This Skill Solves:
- "Analyze this 10x Visium dataset to identify spatial domains"
- "Which genes show spatially variable expression in this tissue?"
- "Map the tumor microenvironment spatial organization"
- "Find genes enriched at tissue boundaries"
- "Identify cell-cell interactions based on spatial proximity"
- "Integrate spatial transcriptomics with scRNA-seq annotations"
- "Characterize spatial gradients in gene expression"
- "Map ligand-receptor pairs in tissue context"
| Capability | Description |
|---|---|
| Data Import | 10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats |
| Quality Control | Spot/cell QC, spatial alignment verification, tissue coverage |
| Normalization | Spatial-aware normalization accounting for tissue heterogeneity |
| Spatial Clustering | Identify spatial domains with similar expression profiles |
| Spatial Variable Genes | Find genes with non-random spatial patterns |
| Neighborhood Analysis | Cell-cell proximity, spatial neighborhoods, niche identification |
| Spatial Patterns | Gradients, boundaries, hotspots, expression waves |
| Integration | Merge with scRNA-seq for cell type mapping |
| Ligand-Receptor Spatial | Map cell communication in tissue context |
| Visualization | Spatial plots, heatmaps on tissue, 3D reconstruction |
Input: Spatial Transcriptomics Data + Tissue Image
|
v
Phase 1: Data Import & QC
|-- Load spatial coordinates + expression matrix
|-- Load tissue histology image
|-- Quality control per spot/cell
|-- Filter low-quality spots
|-- Align spatial coordinates to tissue
|
v
Phase 2: Preprocessing
|-- Normalization (spatial-aware methods)
|-- Highly variable gene selection
|-- Dimensionality reduction (PCA)
|-- Spatial lag smoothing (optional)
|
v
Phase 3: Spatial Clustering
|-- Identify spatial domains/regions
|-- Graph-based clustering with spatial constraints
|-- Annotate domains with marker genes
|-- Visualize domains on tissue
|
v
Phase 4: Spatial Variable Genes
|-- Test for spatial autocorrelation (Moran's I, Geary's C)
|-- Identify genes with spatial patterns
|-- Classify pattern types (gradient, hotspot, boundary)
|-- Rank by spatial significance
|
v
Phase 5: Neighborhood Analysis
|-- Define spatial neighborhoods (k-NN, radius)
|-- Calculate neighborhood composition
|-- Identify interaction zones
|-- Niche characterization
|
v
Phase 6: Integration with scRNA-seq
|-- Cell type deconvolution per spot
|-- Map cell types to spatial locations
|-- Predict cell type spatial distributions
|-- Validate with marker genes
|
v
Phase 7: Spatial Cell Communication
|-- Identify proximal cell type pairs
|-- Query ligand-receptor database (OmniPath)
|-- Score spatial interactions
|-- Map communication hotspots
|
v
Phase 8: Generate Spatial Report
|-- Tissue overview with domains
|-- Spatially variable genes
|-- Cell type spatial maps
|-- Interaction networks in tissue context
|-- 3D visualization (if applicable)
Objective: Load spatial data and assess quality.
Supported platforms:
10x Visium (most common):
- Spots: 55μm diameter, ~50 cells per spot
- Resolution: ~5,000-10,000 spots per capture area
- Data: Expression matrix + spatial coordinates + H&E image
MERFISH/seqFISH (imaging-based):
- Single-cell resolution
- Targeted gene panels (100-10,000 genes)
- Absolute coordinates per cell
Slide-seq/Slide-seqV2:
- 10μm bead resolution
- Genome-wide profiling
Xenium (10x single-cell spatial):
- Single-cell resolution
- Large gene panels (300+ genes)
- Subcellular resolution
Data loading (Visium):
def load_visium_data(data_dir):
"""
Load 10x Visium spatial transcriptomics data.
Expected structure:
data_dir/
├── filtered_feature_bc_matrix/
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
├── spatial/
│ ├── tissue_positions_list.csv
│ ├── scalefactors_json.json
│ └── tissue_hires_image.png
Returns: AnnData object with spatial coordinates
"""
import scanpy as sc
import pandas as pd
# Load expression data
adata = sc.read_visium(data_dir)
# Spatial coordinates are in adata.obsm['spatial']
# Tissue image in adata.uns['spatial']
return adataQuality Control:
- Spot-level QC:
def spatial_qc(adata):
"""
Quality control for spatial transcriptomics data.
"""
import scanpy as sc
# Calculate QC metrics
sc.pp.calculate_qc_metrics(adata, inplace=True)
# Visualize QC metrics spatially
sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot')
sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot')
# Filter criteria
# - Min 200 genes per spot
# - Min 500 UMI counts per spot
# - Max mitochondrial content < 20%
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_cells(adata, min_counts=500)
# Mitochondrial filtering
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
adata = adata[adata.obs['pct_counts_mt'] < 20].copy()
return adata- Spatial alignment verification:
def verify_spatial_alignment(adata):
"""
Verify spatial coordinates align with tissue image.
"""
import matplotlib.pyplot as plt
# Plot spots on tissue image
fig, ax = plt.subplots(figsize=(10, 10))
# Tissue image
img = adata.uns['spatial']['tissue_hires_image']
ax.imshow(img)
# Overlay spot coordinates
coords = adata.obsm['spatial']
ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5)
ax.set_title('Spatial Alignment Verification')
plt.axis('off')Objective: Normalize data accounting for spatial heterogeneity.
Normalization:
def normalize_spatial(adata):
"""
Normalize spatial transcriptomics data.
"""
import scanpy as sc
# Filter genes (min 3 spots)
sc.pp.filter_genes(adata, min_cells=3)
# Normalize to median total counts
sc.pp.normalize_total(adata, target_sum=1e4)
# Log-transform
sc.pp.log1p(adata)
# Store raw counts
adata.raw = adata
return adataHighly variable genes:
def select_hvg_spatial(adata):
"""
Select highly variable genes for spatial analysis.
"""
import scanpy as sc
# Standard HVG selection
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
# Optionally: weight by spatial autocorrelation
# Genes with spatial patterns are more informative
return adataSpatial smoothing (optional):
def spatial_smooth(adata, radius=2):
"""
Smooth expression by averaging over spatial neighbors.
Useful for noisy data, but can blur boundaries.
"""
from sklearn.neighbors import NearestNeighbors
# Find spatial neighbors
coords = adata.obsm['spatial']
nn = NearestNeighbors(n_neighbors=radius, metric='euclidean')
nn.fit(coords)
distances, indices = nn.kneighbors(coords)
# Smooth expression matrix
X_smooth = adata.X.copy()
for i in range(adata.n_obs):
neighbors = indices[i]
X_smooth[i] = adata.X[neighbors].mean(axis=0)
adata.layers['smoothed'] = X_smooth
return adataObjective: Identify spatial domains (regions with distinct expression).
Graph-based clustering with spatial constraints:
def spatial_clustering(adata, n_neighbors=6):
"""
Cluster spots into spatial domains.
Uses both expression similarity AND spatial proximity.
"""
import scanpy as sc
import squidpy as sq
# PCA for dimensionality reduction
sc.pp.pca(adata, n_comps=50)
# Build spatial neighbor graph
sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors)
# Clustering with spatial constraints
# Uses both PCA space and spatial graph
sc.tl.leiden(adata, resolution=1.0, key_added='spatial_domain')
# Visualize domains on tissue
sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains')
return adataDomain marker genes:
def find_domain_markers(adata):
"""
Identify marker genes for each spatial domain.
"""
import scanpy as sc
# Differential expression per domain
sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon')
# Get top markers per domain
markers = sc.get.rank_genes_groups_df(adata, group=None)
return markersObjective: Find genes with non-random spatial patterns.
Moran's I (spatial autocorrelation):
def identify_spatial_genes(adata):
"""
Test for spatial autocorrelation using Moran's I.
Moran's I > 0: Positive spatial autocorrelation (clustering)
Moran's I ~ 0: Random spatial distribution
Moran's I < 0: Negative autocorrelation (checkerboard)
"""
import squidpy as sq
# Calculate Moran's I for all genes
sq.gr.spatial_autocorr(
adata,
mode='moran',
n_perms=100,
n_jobs=-1
)
# Results in adata.uns['moranI']
spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False)
# Filter significant spatial genes (FDR < 0.05)
sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05]
return sig_spatialSpatial pattern classification:
def classify_spatial_patterns(adata, spatial_genes):
"""
Classify types of spatial patterns.
Pattern types:
- Gradient: Smooth directional change
- Hotspot: Localized high expression
- Boundary: Expression at domain edges
- Periodic: Regular spacing
"""
patterns = {}
for gene in spatial_genes.index[:100]: # Top 100 spatial genes
# Get expression and coordinates
expr = adata[:, gene].X.toarray().flatten()
coords = adata.obsm['spatial']
# Detect pattern type
pattern_type = detect_pattern_type(expr, coords)
patterns[gene] = pattern_type
return patternsObjective: Analyze cell-cell proximity and spatial niches.
Define spatial neighborhoods:
def analyze_neighborhoods(adata, radius=150):
"""
Analyze spatial neighborhood composition.
For each spot, characterize its microenvironment.
"""
import squidpy as sq
# Calculate neighborhood enrichment
# Tests if cell types are enriched in proximity
sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain')
# Visualize neighborhood enrichment
sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain')
# Results: which domains are spatially proximal?
return adataInteraction zones:
def identify_interaction_zones(adata, domain_a, domain_b):
"""
Find boundary regions between two spatial domains.
These are hotspots for cell-cell interactions.
"""
# Get spots from each domain
spots_a = adata.obs['spatial_domain'] == domain_a
spots_b = adata.obs['spatial_domain'] == domain_b
# Find spots that neighbor the other domain
# (spots from A that have neighbors in B)
coords = adata.obsm['spatial']
from sklearn.neighbors import NearestNeighbors
nn = NearestNeighbors(n_neighbors=6)
nn.fit(coords)
distances, indices = nn.kneighbors(coords)
interaction_spots = []
for i, spot_in_a in enumerate(spots_a):
if spot_in_a:
neighbors = indices[i]
if any(spots_b[neighbors]):
interaction_spots.append(i)
# Mark interaction zone
adata.obs['interaction_zone'] = False
adata.obs.loc[interaction_spots, 'interaction_zone'] = True
return adataObjective: Map cell types from scRNA-seq to spatial locations.
Cell type deconvolution:
def deconvolve_cell_types(adata_spatial, adata_sc):
"""
Predict cell type composition per spatial spot.
Uses scRNA-seq reference to deconvolve Visium spots.
Methods: Cell2location, Tangram, SPOTlight
"""
import cell2location
# Prepare single-cell reference
# Extract signature genes per cell type
cell_type_signatures = extract_signatures(adata_sc)
# Run cell2location
# Estimates cell type abundances per spot
mod = cell2location.models.Cell2location(
adata_spatial,
cell_state_df=cell_type_signatures
)
mod.train(max_epochs=30000)
# Add cell type proportions to adata_spatial
adata_spatial.obsm['cell_type_fractions'] = mod.get_cell_type_fractions()
return adata_spatialSpatial cell type mapping:
def map_cell_types_spatial(adata):
"""
Visualize cell type spatial distributions.
"""
import scanpy as sc
# For each cell type, plot abundance on tissue
cell_types = adata.obsm['cell_type_fractions'].columns
for ct in cell_types:
sc.pl.spatial(
adata,
color=adata.obsm['cell_type_fractions'][ct],
title=f'{ct} Spatial Distribution'
)Objective: Map ligand-receptor interactions in tissue context.
Spatial proximity-based communication:
def spatial_cell_communication(adata):
"""
Identify cell-cell communication based on spatial proximity.
Requires:
- Cell type annotations (from deconvolution)
- Ligand-receptor database (OmniPath)
"""
import squidpy as sq
from tooluniverse import ToolUniverse
tu = ToolUniverse()
# Get ligand-receptor pairs from OmniPath
lr_pairs = tu.run_one_function({
"name": "OmniPath_get_ligand_receptor_interactions",
"arguments": {"partners": ""} # Get all pairs
})
# For each cell type pair that are spatially proximal
# Calculate interaction scores
sq.gr.ligrec(
adata,
n_perms=100,
cluster_key='cell_type',
interactions=lr_pairs,
copy=False
)
# Visualize significant interactions
sq.pl.ligrec(adata, cluster_key='cell_type')
return adataCommunication hotspot mapping:
def map_communication_hotspots(adata, ligand, receptor):
"""
Map spatial locations of specific L-R interactions.
"""
import matplotlib.pyplot as plt
# Get ligand expression
ligand_expr = adata[:, ligand].X.toarray().flatten()
# Get receptor expression
receptor_expr = adata[:, receptor].X.toarray().flatten()
# Interaction score = ligand × receptor
interaction_score = ligand_expr * receptor_expr
# Add to adata
adata.obs[f'{ligand}_{receptor}_score'] = interaction_score
# Visualize on tissue
sc.pl.spatial(adata, color=f'{ligand}_{receptor}_score',
title=f'{ligand}-{receptor} Interaction Hotspots')Generate comprehensive spatial report:
# Spatial Transcriptomics Analysis Report
## Dataset Summary
- **Platform**: 10x Visium
- **Tissue**: Breast cancer tumor section
- **Spots**: 3,562 (after QC filtering)
- **Genes**: 18,432 detected
- **Resolution**: 55μm spot diameter (~50 cells/spot)
## Quality Control
- **Mean genes per spot**: 3,245
- **Mean UMI counts**: 12,543
- **Mitochondrial content**: 8.2% average
- **Tissue coverage**: 85% of capture area
## Spatial Domains Identified
- **7 distinct spatial domains** detected via graph-based clustering
- Domain 1: Tumor core (32% of tissue)
- Domain 2: Invasive margin (18%)
- Domain 3: Stromal region (25%)
- Domain 4: Immune infiltrate (12%)
- Domain 5: Necrotic region (8%)
- Domain 6: Normal epithelium (3%)
- Domain 7: Adipose tissue (2%)
## Top Marker Genes per Domain
### Domain 1 (Tumor Core)
- EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor)
### Domain 2 (Invasive Margin)
- VIM, FN1, MMP2, SNAI2 (EMT signature)
### Domain 4 (Immune Infiltrate)
- CD3D, CD8A, CD4, PTPRC (T cell enriched)
- CD68, CD14 (macrophage enriched)
## Spatially Variable Genes
- **456 genes with significant spatial patterns** (Moran's I, FDR < 0.05)
### Top 10 Spatial Genes
1. **MKI67** (I=0.82) - Hotspot pattern in tumor core
2. **CD8A** (I=0.78) - Gradient from margin to stroma
3. **VIM** (I=0.75) - Boundary enrichment at invasive margin
4. **COL1A1** (I=0.71) - Stromal-specific expression
5. **EPCAM** (I=0.69) - Tumor region pattern
## Cell Type Deconvolution
Integration with scRNA-seq reference (Bassez et al. 2021)
### Cell Type Spatial Distributions
- **Tumor cells**: Concentrated in core, sparse at margin
- **T cells**: Enriched at invasive margin and infiltrate zones
- **CAFs**: Stromal region and invasive margin
- **Macrophages**: Scattered, enriched near necrosis
- **B cells**: Lymphoid aggregates (2% of tissue)
### Tumor Microenvironment Composition
- Tumor core: 85% tumor cells, 10% CAFs, 5% immune
- Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich)
- Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells
## Spatial Cell Communication
### Top L-R Interactions (Spatially Proximal)
1. **Tumor → T cell**: CD274 (PD-L1) → PDCD1 (PD-1)
- Hotspot: Invasive margin
- Interpretation: Immune checkpoint evasion
2. **CAF → Tumor**: TGFB1 → TGFBR2
- Hotspot: Stromal-tumor interface
- Interpretation: TGF-β-driven EMT
3. **Macrophage → Tumor**: TNF → TNFRSF1A
- Scattered across tumor
- Interpretation: Inflammatory signaling
### Interaction Zones
- **Tumor-Immune Interface**: 245 spots (7% of tissue)
- High expression: CXCL10, CXCL9 (chemokines)
- T cell recruitment and activation
- **Stromal-Tumor Interface**: 387 spots (11% of tissue)
- High expression: MMP2, MMP9 (matrix remodeling)
- Invasion-promoting niche
## Spatial Gradients
- **Hypoxia gradient**: HIF1A, VEGFA increase toward tumor core
- **Proliferation gradient**: MKI67, TOP2A decrease from core to margin
- **Immune gradient**: CD8A, GZMB peak at invasive margin
## Biological Interpretation
Spatial analysis reveals distinct tumor microenvironment organization:
1. **Tumor core**: Highly proliferative, hypoxic, immune-excluded
2. **Invasive margin**: Active EMT, high immune infiltration, checkpoint expression
3. **Stromal barrier**: CAF-rich, matrix remodeling, immunosuppressive signals
The invasive margin shows hallmarks of immune-tumor interaction with
PD-L1/PD-1 checkpoint engagement, suggesting potential for checkpoint
blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy
resistance at tumor-stroma interface.
## Clinical Relevance
- **Checkpoint inhibitor response**: High immune infiltration at margin suggests potential
- **Resistance mechanisms**: CAF barrier and TGF-β signaling
- **Biomarkers**: Spatial arrangement of immune cells more predictive than bulk tumor metrics| Skill | Used For | Phase |
|---|---|---|
tooluniverse-single-cell |
scRNA-seq reference for deconvolution | Phase 6 |
tooluniverse-single-cell (Phase 10) |
L-R database for communication | Phase 7 |
tooluniverse-gene-enrichment |
Pathway enrichment for spatial domains | Phase 3 |
tooluniverse-multi-omics-integration |
Integrate with other omics | Phase 8 |
Question: "Map the spatial organization of tumor, immune, and stromal cells"
Workflow:
- Load Visium data, QC and normalize
- Spatial clustering → 7 domains identified
- Cell type deconvolution using scRNA-seq reference
- Map cell type distributions spatially
- Identify interaction zones (tumor-immune, tumor-stroma)
- Analyze L-R interactions in each zone
- Report: Comprehensive TME spatial architecture
Question: "Identify spatial gene expression gradients in developing tissue"
Workflow:
- Load spatial data (e.g., mouse embryo)
- Identify spatially variable genes
- Classify gradient patterns (anterior-posterior, dorsal-ventral)
- Map morphogen expression (WNT, BMP, FGF)
- Correlate with cell fate markers
- Report: Developmental spatial patterns
Question: "Automatically segment brain tissue into anatomical regions"
Workflow:
- Load Visium mouse brain data
- Spatial clustering with high resolution
- Match domains to known brain regions (cortex, hippocampus, etc.)
- Identify region-specific marker genes
- Validate with Allen Brain Atlas
- Report: Automated brain region annotation
| Component | Requirement |
|---|---|
| Spots/cells | At least 500 spatial locations |
| QC | Filter low-quality spots, verify alignment |
| Spatial clustering | At least one method (graph-based or spatial) |
| Spatial genes | Moran's I or similar spatial test |
| Visualization | Spatial plots on tissue images |
| Report | Domains, spatial genes, visualizations |
- Resolution: Visium spots contain multiple cells (not single-cell)
- Gene coverage: Imaging methods have limited gene panels
- 3D structure: Most platforms are 2D sections
- Tissue quality: Requires well-preserved tissue for imaging
- Computational: Large datasets require significant memory
- Reference dependency: Deconvolution quality depends on scRNA-seq reference
Methods:
- Squidpy: https://doi.org/10.1038/s41592-021-01358-2
- Cell2location: https://doi.org/10.1038/s41587-021-01139-4
- SpatialDE: https://doi.org/10.1038/nmeth.4636
Platforms: