Senior Data Scientist/Engineer building production reproducible analytical pipelines (RAP) and ML/GenAI systems at the Food Standards Agency. Previously led open-source data engineering at the Office for National Statistics.
- LangChain Agent - Intelligent data standardization for 360+ local authority data sources with extreme format variance
- NLP Classification - DistilBERT transformer model (82% accuracy, 240-class classification) on Azure
- Platform Engineering - Migrating enterprise data to Databricks Medallion architecture (Azure/Databricks)
- ML Production Systems - Full lifecycle deployment, monitoring, MLOps best practices
ML/GenAI: LangChain • Transformers (DistilBERT, FastText) • Scikit-learn • PyTorch • MLflow
Data Engineering: Databricks • PySpark • Apache Spark • Python • SQL
Cloud & DevOps: Azure • GCP • GitHub Actions • CI/CD
Databases: PostgreSQL • BigQuery • DuckDB
Unfortunately, the vast majority of my work is closed-source, but I pioneered an open-from-the-start approach that adheres to government guidelines.
Production system calculating UK national accounts R&D expenditure statistics. Pioneered open-source approach within ONS, establishing pattern for government data science transparency.
Scale & Impact:
- 4,680+ commits across 20 contributors
- 94 production releases serving national statistics
- Comprehensive CI/CD pipeline, testing (61% coverage), and excellent technical and non-technical user documentation
- Set precedent for open-source government data projects at ONS
Role: Technical lead and open-source advocate - negotiated stakeholder approval for public release from project inception.
Sustainability • Food Systems • Production ML systems • Data engineering best practices • Practical AI applications • Open government software




