I’m an engineer working at the intersection of research and real-world systems, building and shipping multimodal AI with a focus on vision-language models, speech, and efficient on-device inference.
At Hugging Face, I lead multimodal research and contribute to:
- Vision-Language Models (VLMs)
- Speech-to-speech and conversational systems
- Efficient multimodal methods built for real-world deployment
- Robotics-facing AI systems
I enjoy building things that are both technically solid and actually usable, from research code to demos and production-ready tools. On this profile you’ll find:
- Research prototypes and experimental ideas
- Open-source tools and demos
- Work on multimodal models, audio, and vision
- Occasional side projects
A bit of background:
- PhD in applied machine learning (speech and generative models)
- Former senior ML engineer at Unity
- Interested in small, fast, and well-engineered models
Feel free to explore, fork, or reach out.