|
1 | 1 | --- |
2 | | -title: Machine Learning Scientist |
| 2 | +title: Machine Learning Scientist — Drug Discovery (Foundation Models) |
3 | 3 | layout: default |
4 | 4 | --- |
5 | 5 |
|
6 | | -## Virtual Cells in Drug Discovery |
| 6 | +London, UK • linkedin.com/in/ctr26 • github.com/ctr26 • Google Scholar |
| 7 | +*Focus:* Virtual cells, multi‑modal foundation models, omics + imaging, precision medicine |
7 | 8 |
|
8 | | -<p style="text-align: justify;"> |
9 | | -Machine Learning Scientist specialising in virtual cell development for drug discovery. Strong background in building multi-modal foundation models that integrate epigenomic, transcriptomic, and phenotypic data. Experienced in applying AI to biological problems across scales, from molecular interactions to whole-organism imaging, with a focus on practical applications in precision medicine. |
10 | | -</p> |
| 9 | +## Professional Summary |
| 10 | +Machine Learning Scientist specialising in **virtual cell** development for drug discovery. Built and deployed **multi‑modal foundation models** that integrate knowledge graphs, text, transcriptomic and phenotypic imaging data. Experience spans scales from molecular interactions to whole‑organism imaging. Comfortable leading across science and engineering: TB‑scale MLOps, reproducible pipelines, and cross‑functional collaboration with biology, chemistry and platform teams. |
11 | 11 |
|
12 | | -## Employment History |
| 12 | +## Core Strengths |
| 13 | +Foundation models • Representation/self‑supervised learning • Multi‑modal fusion • Knowledge graphs • Biological sequence & transcriptomics • High‑content imaging • GNNs • OOD/robustness • Scaling & performance • Reproducible ML (MLOps) • Scientific communication |
13 | 14 |
|
14 | | -### Senior Machine Learning Scientist |
15 | | -**Valence Labs @ Recursion Pharmaceuticals** | London, UK | Oct 2024 – Present |
| 15 | +## Experience |
16 | 16 |
|
17 | | -Contributing to virtual cell foundation model development at the AI-drug discovery interface: |
18 | | -- **Virtual Cell Initiative**: Fine-tuning multi-modal LLMs on integrated omics data to create predictive biological models |
19 | | -- **[TxPert](https://neurips.cc/virtual/2025/loc/san-diego/poster/116558)**: Co-developed SOTA transcriptomic perturbation prediction model with novel systems biology knowledge graph integration (NeurIPS 2025) |
20 | | -- **Boltz2 Project**: Contributing to proteome-wide virtual drug screening platform |
21 | | -- **Community Leadership**: Virtual Cell Journal Club organizer |
| 17 | +**Senior Machine Learning Scientist — Valence Labs @ Recursion Pharmaceuticals** |
| 18 | +London, UK • Oct 2024 – Present |
| 19 | +- **Virtual Cell initiative:** Fine‑tuned multi‑modal LLMs over **knowledge graphs + text + RNA‑seq + phenotypic imaging** to model cell state and predict gene/drug responses; partnered closely with biology to design benchmark tasks and success metrics. |
| 20 | +- **TxPert (NeurIPS 2025):** Co‑developed a **state‑of‑the‑art transcriptomic perturbation predictor** using systems‑biology KGs; contributed training code, data curation, and ablations. |
| 21 | +- **Boltz2 project:** Contributed to proteome‑scale **virtual drug screening** components; supported evaluation strategy and error analysis across targets. |
| 22 | +- **Community:** Organiser, **Virtual Cell Journal Club**; fostered a reading group bridging ML and wet‑lab teams. |
22 | 23 |
|
23 | | -### Senior Research Associate & AI Engineering Lead |
24 | | -**EMBL-EBI** | Cambridge, UK | Dec 2022 – Oct 2024 |
| 24 | +**Senior Research Associate & AI Engineering Lead — EMBL‑EBI (Uhlmann Group & Bio‑Image Archive)** |
| 25 | +Cambridge, UK • Dec 2022 – Oct 2024 |
| 26 | +- **Team leadership:** Supervised **6 PhD students**; established coding standards, CI, and peer‑review practices used across the lab. |
| 27 | +- **Spatial biology:** Built deep‑learning pipelines for **high‑content cell morphology** and single‑cell feature learning; integrated with public bioimage resources. |
| 28 | +- **Open‑source:** Created **bioimage_embed** (self‑supervised learning framework for biological images) and **shape_embed** (deep‑learning toolkit for cell‑shape analysis); productionised training/inference. |
| 29 | +- **MLOps:** Designed scalable pipelines processing **TB‑scale microscopy datasets** across HPC and cloud; containerised workflows, automated experiment tracking. |
| 30 | +- **Academic service:** Reviewer — ISBI 2022/2023, **ICASSP** 2024. |
25 | 31 |
|
26 | | -Led AI initiatives for the Uhlmann Group and Bio-Image Archive: |
27 | | -- **Leadership**: Supervised 6 PhD students, establishing coding standards and peer review processes |
28 | | -- **Spatial Biology**: Developed deep learning approaches for high-content cell morphology analysis |
29 | | -- **[bioimage_embed](https://github.com/ctr26/bioimage_embed)**: Created self-supervised learning framework for biological images |
30 | | -- **[shape_embed](https://github.com/ctr26/shape_embed)**: Deep learning toolkit for cell shape analysis |
31 | | -- **MLOps**: Built scalable pipelines processing TB-scale microscopy datasets |
32 | | -- **Academic Service**: Reviewer for ISBI 2022/2023, ICASP 2024 |
| 32 | +**AI/ML Founding Engineer — Amun AI AB** |
| 33 | +Stockholm, Sweden • 2022 – 2024 |
| 34 | +- Built a **GKE/Kubernetes** platform for model serving with **NVIDIA Triton/KServe**; supported **100+ models** for **30+ daily users** with auth, monitoring and autoscaling. |
33 | 35 |
|
34 | | -### AI/ML Founding Engineer |
35 | | -**Amun AI AB** | Stockholm, Sweden | 2022 – 2024 |
| 36 | +**AI/ML Engineering Consultant — DeepMirror** |
| 37 | +Cambridge & London, UK • 2022 – 2024 |
| 38 | +- **MouseMindMapper:** Automated brain‑histology segmentation product generating **£50k annual revenue**; delivered end‑to‑end data, training, packaging and docs. |
| 39 | +- Wrote a high‑performance **C++ cheminformatics fingerprinting** library for production use. |
36 | 40 |
|
37 | | -- **NVIDIA Partnership**: Scalable Kubernetes-based model serving infrastructure |
38 | | -- **Production Systems**: GKE infrastructure serving 100+ models to 30+ daily users |
| 41 | +**Data Scientist — Brazma Group, EMBL‑EBI** |
| 42 | +Cambridge, UK • Dec 2019 – Dec 2023 |
| 43 | +- Co‑authored the successful **AI4LIFE €5M** grant (federated bioimage AI infrastructure); contributed to platform architecture and model‑sharing strategy. |
| 44 | +- Drove **large‑scale AI microscopy** analyses in the Image Data Resource; collaborated with **Google Cloud** on representation learning. |
| 45 | +- Taught annual deep‑learning courses to **40+ researchers** (PhD to PI). |
39 | 46 |
|
40 | | -### AI/ML Engineering Consultant |
41 | | -**DeepMirror** | Cambridge & London, UK | 2022 – 2024 |
| 47 | +**Software Engineer (COVID‑19 Response) — European Nucleotide Archive, EMBL‑EBI** |
| 48 | +Cambridge, UK • Mar 2020 – Sept 2020 |
| 49 | +- Built CI/CD for the **COVID‑19 Data Portal** to enable **daily global data updates**. |
| 50 | +- Scaled NGS alignment and **Nextflow/Kubernetes** ETL pipelines for surging data volumes. |
42 | 51 |
|
43 | | -- **[MouseMindMapper](http://github.com/deepmirror)**: Automated brain histology segmentation (£50k annual revenue) |
44 | | -- **Cheminformatics**: High-performance C++ molecular fingerprinting library |
45 | | - |
46 | | -### Data Scientist |
47 | | -**Brazma Group, EMBL-EBI** | Cambridge, UK | Dec 2019 – Dec 2023 |
48 | | - |
49 | | -- **Education**: Annual deep learning courses for 40+ participants (PhD to PI level) |
50 | | -- **[AI4LIFE Grant](https://ai4life.eurobioimaging.eu/)**: Co-wrote successful €5M proposal for federated AI infrastructure |
51 | | -- **AI4LIFE Platform**: Contributing to the development of federated bioimage analysis tools and model sharing |
52 | | -- **Image Data Resource**: Large-scale AI-driven microscopy analysis |
53 | | -- **Industry Partnerships**: Google Cloud collaboration for representation learning |
54 | | - |
55 | | -### Software Engineer (COVID-19 Response) |
56 | | -**European Nucleotide Archive, EMBL-EBI** | Cambridge, UK | Mar 2020 – Sept 2020 |
57 | | - |
58 | | -- **[COVID-19 Data Portal](https://www.covid19dataportal.org/)**: CI/CD pipelines enabling daily global data updates |
59 | | -- **NGS Processing**: Scaled sequence alignment pipelines for large data volumes |
60 | | -- **Infrastructure**: Containerized ETL pipelines using Nextflow and Kubernetes |
61 | | - |
62 | | -### Computational Microscopist |
63 | | -**National Physical Laboratory** | London, UK | 2018 – Dec 2019 |
64 | | - |
65 | | -- Novel 3D organoid segmentation algorithms for cancer research |
66 | | -- MSquared consultancy on advanced imaging solutions |
| 52 | +**Computational Microscopist — National Physical Laboratory** |
| 53 | +London, UK • 2018 – Dec 2019 |
| 54 | +- Developed novel **3D organoid segmentation** methods for cancer research; delivered consultancy to MSquared on advanced imaging. |
67 | 55 |
|
68 | 56 | ## Education |
69 | 57 |
|
70 | | -### PhD in Engineering |
71 | | -**University of Cambridge** | 2014 – 2018 | EPSRC PES-CDT Studentship |
72 | | -*Thesis*: "Light-sheet microscopy for tracking particles in large specimens" |
73 | | -*Focus*: Microscope automation and biological image analysis |
74 | | -- Designed and built novel light-sheet microscope with automated acquisition |
75 | | -- Developed algorithms for particle tracking and optimising signal collection |
76 | | -- Created homographic signal generation and micrometer-scale tomography algorithms |
77 | | -- Supervised 3 students (2 MRes, 1 BSc) |
78 | | - |
79 | | -### MRes in Photonics |
80 | | -**University of Cambridge & UCL** | 2013 – 2014 | [EPSRC Photonics CDT](https://www.pes-cdt.org/) |
81 | | -- Structured illumination microscopy reconstruction software |
82 | | -- Modules: Computer Vision, Quantum Mechanics, Photonics |
83 | | - |
84 | | -### MSci in Physics (First Class Honours) |
85 | | -**Nottingham Trent University** | 2009 – 2013 |
86 | | -- Top physics student in graduating class |
87 | | -- Mountaineering Club President (2011-2012) |
88 | | - |
89 | | -## Patents |
90 | | - |
91 | | -<div id="patents-list"> |
92 | | -<!-- Patents will be loaded dynamically --> |
93 | | -</div> |
94 | | - |
95 | | -## Selected Publications |
96 | | - |
97 | | -<div id="publications-list"> |
98 | | -<!-- Publications will be loaded dynamically via API only --> |
99 | | -</div> |
100 | | - |
101 | | -<script src="{{ '/assets/js/publications.js' | relative_url }}"></script> |
102 | | - |
103 | | -<a href="https://scholar.google.com/citations?user=XVt7BYQAAAAJ&hl=en" target="_blank" rel="noopener">View all publications on Google Scholar →</a> |
| 58 | +**PhD, Engineering — University of Cambridge** • 2014 – 2018 (EPSRC PES‑CDT) |
| 59 | +*Thesis:* “Light‑sheet microscopy for tracking particles in large specimens” |
| 60 | +- Designed & built a novel light‑sheet microscope with automated acquisition. |
| 61 | +- Algorithms for particle tracking, signal optimisation, and micrometre‑scale tomography. |
| 62 | +- Supervision: 2× MRes, 1× BSc. |
104 | 63 |
|
105 | | -## Technical Skills |
| 64 | +**MRes, Photonics — University of Cambridge & UCL** • 2013 – 2014 (EPSRC Photonics CDT) |
| 65 | +- Structured‑illumination microscopy reconstruction; modules in Computer Vision, Quantum Mechanics, Photonics. |
106 | 66 |
|
107 | | -### Machine Learning & AI |
108 | | -- **Foundation Models**: Multi-modal LLM fine-tuning, OOD prediction, self-supervised learning |
109 | | -- **Deep Learning**: PyTorch, TensorFlow, Lightning, probabilistic programming (Pyro.ai), Hugging Face |
110 | | -- **Computer Vision**: Bio-image analysis, 3D reconstruction, segmentation, super-resolution |
111 | | -- **Biological Data**: snRNAseq, bulk RNAseq, histopathology, fluorescence imaging, GNNs for systems biology |
| 67 | +**MSci, Physics (First‑Class Honours) — Nottingham Trent University** • 2009 – 2013 |
| 68 | +- Top physics graduate; President, Mountaineering Club (2011–2012). |
112 | 69 |
|
113 | | -### Programming & Infrastructure |
114 | | -- **Languages**: Python (8+ years), R, MATLAB, C++, Java |
115 | | -- **Scientific Computing**: NumPy, SciPy, Pandas, scikit-learn, OpenCV, scikit-image |
116 | | -- **GPU Computing**: Multi-GPU training on NVIDIA A100/V100 clusters, CUDA optimization, distributed training across HPC environments, SLURM job scheduling, GCP/AWS GPU instances |
117 | | -- **Cloud & DevOps**: GCP, AWS, Kubernetes, Docker, CI/CD, Terraform |
118 | | -- **Workflow Management**: Snakemake, Nextflow, Apache Airflow |
119 | | -- **Model Deployment**: NVIDIA Triton, KServe, MLflow |
| 70 | +## Selected Publications & Preprints |
| 71 | +- See **Google Scholar** for the full list. Include **3–5** most relevant to drug discovery/foundation models directly here when submitting to roles. |
| 72 | + - Example placeholders: |
| 73 | + - *TxPert: Transcriptomic Perturbation Prediction with Systems‑Biology KGs* (NeurIPS 2025). |
| 74 | + - *Self‑supervised representation learning for biological images* (bioimage_embed). |
120 | 75 |
|
121 | | -## Open Source Projects |
122 | | - |
123 | | -### Major Contributions |
| 76 | +## Patents |
| 77 | +- If applicable, list 1–3 relevant filings with **title • number • year • brief contribution**. Remove this section if none. |
124 | 78 |
|
125 | | -- **[bioimage_embed](https://github.com/ctr26/bioimage_embed)**: Self-supervised learning framework for biological image analysis |
126 | | -- **[shape_embed](https://github.com/ctr26/shape_embed)**: Deep learning toolkit for cell shape analysis |
127 | | -- **[pydeconv](https://github.com/ctr26/pydeconv)**: GPU-accelerated deconvolution library for microscopy |
128 | | -- **[nuclear_phenotyping](https://github.com/ctr26/nuclear_phenotyping)**: Automated nuclear morphology analysis pipeline |
| 79 | +## Open Source (Selected) |
| 80 | +- **bioimage_embed** — Self‑supervised learning for biological images. |
| 81 | +- **shape_embed** — Deep‑learning toolkit for cell‑shape analysis. |
| 82 | +- **pydeconv** — GPU‑accelerated deconvolution for microscopy. |
| 83 | +- **nuclear_phenotyping** — Automated nuclear morphology analysis. |
| 84 | +- Contributions to **Hypha Platform**, **BioImage Model Zoo**, **BIA Binder**, **Hypha Helm Charts**, **COVID Workflow Manager**. |
129 | 85 |
|
130 | | -### Infrastructure Projects |
| 86 | +## Skills |
131 | 87 |
|
132 | | -- **[Hypha Platform](https://hypha.imjoy.io/)**: Distributed computing platform for bioimage analysis |
133 | | -- **[BioImage Model Zoo](https://bioimage.io)**: Community-driven repository for deep learning models |
134 | | -- **[BIA Binder](https://binder.bioimagearchive.org/)**: Interactive notebook platform for bioimage analysis |
135 | | -- **[Hypha Helm Charts](https://github.com/amun-ai/hypha-helm-charts)**: Kubernetes deployment templates |
136 | | -- **[COVID Workflow Manager](https://github.com/enasequence/covid-workflow-manager)**: ETL pipeline orchestration |
| 88 | +**ML & AI:** Foundation‑model fine‑tuning, contrastive/self‑supervised learning, OOD & uncertainty, evaluation/ablation design |
| 89 | +**Frameworks:** PyTorch, TensorFlow, Lightning, Pyro, Hugging Face, scikit‑learn |
| 90 | +**Vision & Bio:** Bioimage analysis, 3D reconstruction, segmentation/super‑resolution, **snRNA‑seq/bulk RNA‑seq**, histopathology, fluorescence imaging, **GNNs**, knowledge graphs |
| 91 | +**Languages:** Python (primary), R, MATLAB, C++, Java |
| 92 | +**Compute:** Multi‑GPU training (A100/V100), CUDA, distributed training, SLURM, HPC, GCP/AWS GPU instances |
| 93 | +**MLOps/Infra:** Kubernetes, Docker, **NVIDIA Triton**, **KServe**, MLflow, CI/CD, Terraform |
| 94 | +**Workflows:** Nextflow, Snakemake, Apache Airflow |
137 | 95 |
|
138 | 96 | ## Grants & Awards |
139 | | - |
140 | | -- **AI4LIFE** (2022): Co-investigator on €5M EU grant for federated bioimage AI infrastructure |
141 | | -- **EPSRC CDT Studentship** (2013-2018): [Photonic and Electronic Systems CDT](https://www.pes-cdt.org/) full funding (£120k) |
142 | | -- **Nuffield Research Bursary** (2012): Computer vision for liquid crystal flow analysis |
143 | | -- **Institute of Physics Grant** (2009-2012): Household means support |
144 | | - |
145 | | -## Teaching & Leadership |
146 | | - |
147 | | -### Course Development & Delivery |
148 | | -- **Deep Learning for Bioimage Analysis** (2019-2023): Annual course at EMBL-EBI for 40+ participants |
149 | | -- **Mathematics Tutorials** (2015-2018): First-year Natural Sciences at Magdalene College, Cambridge |
150 | | -- **C Programming Workshops** (2015): Clare College Summer School |
151 | | -- **Quantum Mechanics & Scientific Ethics** (2014-2016): Reach Summer School, Cambridge |
152 | | - |
153 | | -### Research Supervision |
154 | | -- 6 PhD students in AI and spatial biology (2022-2024) |
155 | | -- 3 project students during PhD (2014-2018) |
156 | | - |
157 | | -### Community Leadership |
158 | | -- Lower Boats Captain & Coach, Magdalene College (2014-2018) |
159 | | -- Mountaineering Club President, NTU (2011-2012) |
160 | | - |
161 | | ---- |
162 | | - |
163 | | -## Professional Service |
164 | | - |
165 | | -- **Peer Review**: Nature Methods, Scientific Reports, Journal of Microscopy, ISBI, ICASP |
166 | | -- **Conference Presentations**: FOM (2018, 2022, 2023), MMC (2018, 2022), CBIAS (2023) |
167 | | -- **Open Source**: Maintainer of multiple scientific software packages |
168 | | -- **Professional Membership**: Institute of Physics |
169 | | - |
170 | | -*References available upon request* |
| 97 | +- **AI4LIFE** (2022) — Co‑investigator on **€5M** EU grant (federated bioimage AI) |
| 98 | +- **EPSRC CDT Studentship** (2013–2018) — Photonic & Electronic Systems CDT (£120k) |
| 99 | +- **Nuffield Research Bursary** (2012) — Computer vision for liquid‑crystal flows |
| 100 | +- **Institute of Physics** grant support (2009–2012) |
| 101 | + |
| 102 | +## Teaching, Mentoring & Service |
| 103 | +- **Course lead:** Deep Learning for Bioimage Analysis (2019–2023), 40+ participants/year |
| 104 | +- **Supervision:** 6 PhD students (AI & spatial biology, 2022–2024) and 3 project students during PhD (2014–2018) |
| 105 | +- **Peer review:** Nature Methods, Scientific Reports, Journal of Microscopy, ISBI, **ICASSP** |
| 106 | +- **Talks & conferences:** FOM (2018, 2022, 2023), MMC (2018, 2022), CBIAS (2023) |
| 107 | +- **Community leadership:** Rowing captain/coach (Magdalene College), Mountaineering Club President (NTU) |
| 108 | + |
| 109 | +*References available upon request.* |