🧬 Exploring the Transcriptomic Landscape of Mouse Gastrulation

University of Bristol – MSc Data Science 2025
Group Project: Orestas Dulinskas, Adrian Dinulescu, Elena Bettison, Elizabeth Williams

🧠 Overview

This project investigates the transcriptional dynamics of mouse gastrulation using single-cell RNA sequencing (scRNA-seq) data from two public datasets. It aims to:

Uncover the gene expression changes that drive early embryonic development
Integrate datasets from different sequencing technologies using scVI
Identify differentially expressed genes (DEGs) across time points
Provide a no-code interface using Chatmol, enabling domain experts to run the pipeline using natural language

📄 Full Technical Report

👉 Read the Full Report (`Technical_Report.pdf`)

Covers:

Biological background and motivation
Data acquisition and preprocessing
Dimensionality reduction and batch correction
Differential expression analysis
Chatmol integration and interface testing
Results, insights, and future work

📁 Repository Structure

`Chatmol/main.py`

This Python script integrates the project's core functionality into the Chatmol framework. It defines callable functions for:

Data preprocessing
Exploratory dimensionality reduction
Dataset integration using scVI
Differential gene expression (DEG) analysis

Chatmol interprets natural language prompts and triggers these functions.

See the technical report for more detail.
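As a rough illustration of the pattern described above, the sketch below registers four placeholder functions under names and calls one of them by name. The `register` decorator, the `PIPELINE_STEPS` dictionary, the `dispatch` helper and the string-returning function bodies are all hypothetical. They are not taken from `Chatmol/main.py` or from Chatmol's API, where an LLM selects the function from the user's prompt.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a step name to a callable pipeline function.
PIPELINE_STEPS: Dict[str, Callable[..., str]] = {}


def register(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(func):
        PIPELINE_STEPS[name] = func
        return func
    return decorator


# Placeholder bodies: the real functions would operate on AnnData objects.
@register("preprocess")
def preprocess(dataset: str) -> str:
    return f"preprocessed {dataset}"


@register("explore")
def explore(dataset: str) -> str:
    return f"explored {dataset}"


@register("integrate")
def integrate(dataset: str) -> str:
    return f"integrated {dataset}"


@register("deg")
def deg(dataset: str) -> str:
    return f"DEG analysis on {dataset}"


def dispatch(name: str, **kwargs) -> str:
    """Call the registered step chosen by the language model (or by hand)."""
    if name not in PIPELINE_STEPS:
        raise KeyError(f"Unknown pipeline step: {name}")
    return PIPELINE_STEPS[name](**kwargs)


result = dispatch("integrate", dataset="ARG+PJ")
print(result)
```

Keeping the steps behind a single name-to-function mapping means a new analysis can be exposed to the assistant by registering one more function.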

`preprocessing/`

Contains four Jupyter notebooks, one from each team member. Each notebook performs:

Dataset-specific quality control (mitochondrial content, UMI thresholds, doublet detection)
Normalization and log transformation
Highly variable gene (HVG) selection
UMAP visualizations of raw batches

These notebooks explore different subsets of the data (e.g. ARG, PJ1, PJ2) and help ensure robustness across preprocessing workflows.
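The quality-control and normalization steps listed above can be sketched on a toy count matrix with NumPy alone. This is a minimal illustration, not the notebooks' code: the thresholds (`min_umis=500`, `max_mito_frac=0.10`), the function names and the toy data are assumptions, and doublet detection and HVG selection are left out. In Scanpy the normalization and log steps are usually done with `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import numpy as np


def qc_filter(counts, mito_mask, min_umis=500, max_mito_frac=0.10):
    """Boolean mask of cells passing depth and mitochondrial-fraction QC.

    Thresholds are illustrative assumptions, not the project's values.
    """
    total = counts.sum(axis=1)
    mito = counts[:, mito_mask].sum(axis=1)
    # Guard against division by zero for empty cells.
    mito_frac = np.divide(
        mito, total, out=np.zeros(total.shape, dtype=float), where=total > 0
    )
    return (total >= min_umis) & (mito_frac <= max_mito_frac)


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    total = counts.sum(axis=1, keepdims=True)
    scaled = counts / np.where(total == 0, 1, total) * target_sum
    return np.log1p(scaled)


# Toy matrix: rows are cells, columns are genes; the last gene is mitochondrial.
counts = np.array([
    [900, 50, 50],    # deep enough, 5% mitochondrial
    [100, 20, 30],    # too few counts
    [600, 100, 300],  # 30% mitochondrial
])
mito_mask = np.array([False, False, True])

keep = qc_filter(counts, mito_mask)
normalized = normalize_log1p(counts[keep])
print(keep, normalized.shape)
```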

`scvi_integration/`

Also contains four Jupyter notebooks, each focusing on:

Concatenating preprocessed datasets
Configuring and training scVI (single-cell Variational Inference)
Removing batch effects
Comparing latent space representations (UMAPs)
Performing DEG analysis between stages

Each notebook explores a different scVI configuration (e.g., dispersion setting, likelihood function, number of latent dimensions).
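One way to organise such a comparison is to enumerate the configurations up front. The sketch below builds a small grid of keyword arguments. The parameter names `dispersion`, `gene_likelihood` and `n_latent` are those accepted by `scvi.model.SCVI`, but the specific values and the `build_configs` helper are assumptions for illustration and do not reflect the settings used in the notebooks.

```python
from itertools import product

# Illustrative values only; the notebooks' actual settings may differ.
DISPERSIONS = ["gene", "gene-batch"]
LIKELIHOODS = ["zinb", "nb"]
LATENT_DIMS = [10, 30]


def build_configs():
    """Return one keyword-argument dict per combination of settings."""
    return [
        {"dispersion": d, "gene_likelihood": g, "n_latent": n}
        for d, g, n in product(DISPERSIONS, LIKELIHOODS, LATENT_DIMS)
    ]


configs = build_configs()
for cfg in configs:
    # With scvi-tools installed and `adata` prepared via
    # scvi.model.SCVI.setup_anndata(adata, batch_key=...), each dict could be
    # passed as: model = scvi.model.SCVI(adata, **cfg); model.train()
    print(cfg)
```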

🔬 Data Sources

| Dataset | Study | Method | Stages | Cells |
|---|---|---|---|---|
| ARG | Argelaguet et al. (2019) | Plate-based (Smart-seq2) | E4.5–E7.5 | ~2,500 |
| PJ | Pijuan-Sala et al. (2019) | Droplet-based (10x) | E6.5–E8.5 | ~116,000 |

Datasets were quality-checked and annotated with metadata including embryonic day and cell type labels.

🛠️ Tools & Technologies

Python (Scanpy, AnnData, Seaborn, Matplotlib)
scVI-tools (GPU-accelerated batch integration)
Scrublet (doublet detection)
Chatmol (LLM interface)
Llama 3.2 via Ollama for local LLM inference
Google Colab / Kaggle for GPU experimentation

📈 Key Results

UMAP visualizations show clear lineage progression from epiblast to germ layers
scVI removes batch effects between protocols (Smart-seq2 vs 10x)
DEGs identified at E4.5–E7.5 align with biological transitions: pre-gastrulation, streak formation, and germ layer specification
Chatmol allows users to run the entire pipeline with commands like:

"Perform exploratory analysis on the dataset"

🤖 Chatmol Integration

Chatmol enables non-coders (e.g., lab biologists) to analyze scRNA-seq data via natural language. This project adds support for the following requests:

preprocess the data
perform exploratory analysis
integrate batches using scVI
analyze differentially expressed genes

The assistant runs the selected functions and returns figures or tables without requiring the user to write any code.

📌 Limitations & Future Improvements

Migrate from scVI to scANVI for semi-supervised integration (leverages known cell type labels)
Switch Chatmol's function calling to Model Context Protocol (MCP) for broader LLM support
Add Chatmol features for:
- Custom DEG comparisons
- Marker gene lookups
- Cluster-specific visualizations

👥 Contributors

Name
Orestas Dulinskas
Adrian Dinulescu
Elena Bettison
Elizabeth Williams

📚 References

Argelaguet et al., 2019. Nature
Pijuan-Sala et al., 2019. Nature
