Single-cell RNA-seq analysis of mouse gastrulation using data from Argelaguet and Pijuan-Sala. The project includes preprocessing, SCVI-based dataset integration, differential gene expression analysis, and a no-code interface powered by Chatmol for natural language interaction with the pipeline.

orestasdulinskas/gastrulation-scRNAseq

🧬 Exploring the Transcriptomic Landscape of Mouse Gastrulation

University of Bristol – MSc Data Science 2025
Group Project: Orestas Dulinskas, Adrian Dinulescu, Elena Bettison, Elizabeth Williams

🧠 Overview

This project investigates the transcriptional dynamics of mouse gastrulation using single-cell RNA sequencing (scRNA-seq) data from two public datasets. It aims to:

  • Uncover the gene expression changes that drive early embryonic development
  • Integrate datasets from different technologies using SCVI
  • Identify differentially expressed genes (DEGs) across time points
  • Provide a no-code interface using Chatmol, enabling domain experts to run the pipeline using natural language

📄 Full Technical Report

👉 Read the Full Report

Covers:

  • Biological background and motivation
  • Data acquisition and preprocessing
  • Dimensionality reduction and batch correction
  • Differential expression analysis
  • Chatmol integration and interface testing
  • Results, insights, and future work

📁 Repository Structure

Chatmol/main.py

This Python script integrates the project's core functionality into the Chatmol framework. It defines callable functions for:

  • Data preprocessing
  • Exploratory dimensionality reduction
  • Dataset integration using SCVI
  • Differential gene expression (DEG) analysis

Chatmol interprets natural language prompts and triggers these functions.
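As a rough illustration of this pattern, the sketch below shows how a function-calling layer might map a model-produced call onto registered pipeline functions. It is not Chatmol's API: `REGISTRY`, `register`, `dispatch`, and the placeholder function bodies are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to a pipeline function.
REGISTRY: Dict[str, Callable[..., str]] = {}


def register(name: str):
    """Decorator that exposes a function under a tool name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap


@register("preprocess")
def preprocess(dataset: str = "ARG") -> str:
    # Placeholder body; a real step would run QC and normalization.
    return f"preprocessed {dataset}"


@register("integrate")
def integrate(n_latent: int = 10) -> str:
    # Placeholder body; a real step would train an integration model.
    return f"integrated with n_latent={n_latent}"


def dispatch(call: dict) -> str:
    """Run the function named in a structured call such as
    {"name": "preprocess", "arguments": {"dataset": "PJ1"}}."""
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown function: {name}")
    return REGISTRY[name](**call.get("arguments", {}))
```

In this sketch the language model's only job is to turn a prompt into the structured `call` dictionary; everything after that is ordinary Python.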

See the technical report for more detail.


preprocessing/

Contains four Jupyter notebooks, one from each team member. Each notebook performs:

  • Dataset-specific quality control (mitochondrial content, UMI thresholds, doublet detection)
  • Normalization and log transformation
  • Highly variable gene (HVG) selection
  • UMAP visualizations of raw batches

These notebooks explore different subsets of the data (e.g. ARG, PJ1, PJ2) and help ensure robustness across preprocessing workflows.
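The arithmetic behind the QC filter and the normalization step can be sketched in plain Python. This is only an illustration, not the notebooks' code: the thresholds (`min_umis`, `max_mito`), the `target_sum` value, and the helper names are assumptions, and doublet detection and HVG selection are not covered.

```python
import math


def mito_fraction(counts, genes, prefix="mt-"):
    """Fraction of a cell's counts that come from mitochondrial genes.
    Mouse mitochondrial gene symbols start with "mt-"."""
    total = sum(counts)
    mito = sum(c for c, g in zip(counts, genes) if g.lower().startswith(prefix))
    return mito / total if total else 0.0


def passes_qc(counts, genes, min_umis=1000, max_mito=0.10):
    """Keep a cell if it has enough counts and a low mitochondrial share.
    Both thresholds are illustrative assumptions."""
    return sum(counts) >= min_umis and mito_fraction(counts, genes) <= max_mito


def normalize_log1p(counts, target_sum=1e4):
    """Scale a cell to a fixed total count, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


# Toy cells over three genes (values invented for illustration).
genes = ["Pou5f1", "T", "mt-Co1"]
healthy = [900, 600, 50]    # enough counts, low mitochondrial share
stressed = [300, 200, 700]  # mitochondrial share far above the cutoff
low = [10, 5, 0]            # too few counts overall

kept = [cell for cell in (healthy, stressed, low) if passes_qc(cell, genes)]
normalized = [normalize_log1p(cell) for cell in kept]
```

Because thresholds of this kind depend on the sequencing protocol, plate-based and droplet-based data would normally be filtered with different values.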


scvi_integration/

Also contains four Jupyter notebooks, each focusing on:

  • Concatenating preprocessed datasets
  • Configuring and training SCVI (Single-Cell Variational Inference)
  • Removing batch effects
  • Comparing latent space representations (UMAPs)
  • Performing DEG analysis between stages

Each notebook explores different SCVI configurations (e.g., dispersion settings, likelihood functions, latent dimensions).
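One way to organize such a comparison is to enumerate the configurations up front and train one model per entry. The sketch below assumes scvi-tools is installed, that the concatenated AnnData object stores raw counts in `adata.X`, and that batches are labelled in an `obs` column called `batch`; the specific grid values and the helper names `scvi_config_grid` and `train_scvi` are hypothetical and not taken from the notebooks.

```python
from itertools import product


def scvi_config_grid(
    dispersions=("gene", "gene-batch"),
    likelihoods=("zinb", "nb"),
    latent_dims=(10, 30),
):
    """Enumerate keyword arguments for scvi.model.SCVI (illustrative values)."""
    return [
        {"dispersion": d, "gene_likelihood": g, "n_latent": n}
        for d, g, n in product(dispersions, likelihoods, latent_dims)
    ]


def train_scvi(adata, config, batch_key="batch"):
    """Train one scVI model and store its latent space on the AnnData object.

    Assumes adata.X holds raw counts and adata.obs[batch_key] labels batches.
    """
    import scvi  # imported lazily so the grid helper works without scvi-tools

    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)
    model = scvi.model.SCVI(adata, **config)
    model.train()
    # The latent representation can then feed neighbors/UMAP for comparison.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model
```

Calling `scvi_config_grid()` with the defaults above yields eight configurations, each of which could be passed to `train_scvi` and compared by its UMAP.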


🔬 Data Sources

| Dataset | Study | Method | Stages | Cells |
|---|---|---|---|---|
| ARG | Argelaguet et al. (2019) | Plate-based (Smart-seq2) | E4.5–E7.5 | ~2,500 |
| PJ | Pijuan-Sala et al. (2019) | Droplet-based (10x) | E6.5–E8.5 | ~116,000 |

Datasets were quality-checked and annotated with metadata including embryonic day and cell type labels.


🛠️ Tools & Technologies

  • Python (Scanpy, AnnData, Seaborn, Matplotlib)
  • scvi-tools (GPU-accelerated batch integration)
  • Scrublet (doublet detection)
  • Chatmol (LLM interface)
  • Llama 3.2 via Ollama for local LLM inference
  • Google Colab / Kaggle for GPU experimentation

📈 Key Results

  • UMAP visualizations show clear lineage progression from epiblast to germ layers
  • SCVI successfully removes batch effects across protocols (Smart-seq2 vs 10x)
  • DEGs identified at E4.5–E7.5 align with biological transitions: pre-gastrulation, streak formation, and germ layer specification
  • Chatmol allows users to run the entire pipeline with commands like:

    "Perform exploratory analysis on the dataset"

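For readers unfamiliar with DEG analysis, the core idea of comparing expression between two stages can be sketched as a log2 fold change on group means. This is a deliberately simplified stand-in and not the test used in the notebooks; the function names, the pseudocount, and the toy expression values are assumptions made for illustration.

```python
import math
from statistics import mean


def log2_fold_changes(group_a, group_b, genes, pseudocount=1.0):
    """Log2 fold change of group_b over group_a for each gene.

    Each group is a list of cells; each cell is a list of normalized
    expression values in the same order as `genes`.
    """
    result = {}
    for j, gene in enumerate(genes):
        mean_a = mean(cell[j] for cell in group_a)
        mean_b = mean(cell[j] for cell in group_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


def top_genes(lfc, n=2):
    """Genes ranked by absolute log2 fold change, largest first."""
    return sorted(lfc, key=lambda g: abs(lfc[g]), reverse=True)[:n]


# Toy data: two cells per stage, two genes (values invented).
genes = ["Pou5f1", "T"]
early = [[7, 0], [7, 0]]
late = [[3, 3], [3, 3]]

lfc = log2_fold_changes(early, late, genes)
ranked = top_genes(lfc)
```

A positive value means higher expression in the later group. A real analysis would also need a statistical test and multiple-testing correction, which this sketch omits.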

🤖 Chatmol Integration

Chatmol enables non-coders (e.g., lab biologists) to analyze scRNA-seq data via natural language. This project adds support for prompts that ask it to:

  • preprocess the data
  • perform exploratory analysis
  • integrate batches using SCVI
  • analyze differentially expressed genes

The assistant runs the selected functions and returns figures or tables without the user writing any code.


📌 Limitations & Future Improvements

  • Migrate from SCVI to scANVI for semi-supervised integration (leverages known labels)
  • Switch Chatmol's function calling to Model Context Protocol (MCP) for broader LLM support
  • Add Chatmol features for:
    • Custom DEG comparisons
    • Marker gene lookups
    • Cluster-specific visualizations

👥 Contributors

  • Orestas Dulinskas
  • Adrian Dinulescu
  • Elena Bettison
  • Elizabeth Williams

📚 References

  • Argelaguet et al., 2019. Nature
  • Pijuan-Sala et al., 2019. Nature
