22episode : 5
33guests :
44- valeriybabushkin
5- intro : In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com
6- and Kaggle Grandmaster—breaks down how to approach machine learning system design
7- at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex,
8- explaining how to move beyond algorithms to focus on end-to-end design, feature
9- engineering, and evaluation. Valerii walks through a real-world fraud detection
10- example, discusses how to structure interview answers, and outlines the core principles
11- from his book Machine Learning System Design. You’ll learn how to think like a senior
12- ML engineer and design robust, production-ready systems.
13- description : Master ML system design interviews with Valerii Babushkin, ex-Meta Head
14- of Data Science. Learn fraud detection systems, feature engineering, metrics selection,
15- and production ML best practices for FAANG interviews.
5+ intro : ' How do you approach ML system design interviews that probe production constraints,
6+ fraud detection trade-offs, and MLOps realities? In this episode, Valerii Babushkin
7+ — Senior Director of Data, Analytics, and AI at BP, Kaggle Competitions Grandmaster,
8+ and author of Machine Learning System Design — walks through what interviewers look
9+ for and how candidates should structure answers for real-world ML problems. <br><br>
10+ We cover concrete topics you can use in interviews and on the job: distinguishing
11+ software vs. ML system design; a fraud detection case study (probabilities, loss
12+ functions, real-time requirements); label noise, class imbalance, and feature engineering
13+ trade-offs; end-to-end pipeline items like metrics, baselines, A/B testing, and
14+ validating in production; monitoring, distribution shift, fallbacks, and production
15+ robustness; serving models, embeddings, and MLOps roles; plus when to avoid ML and
16+ practical checklist items for core projects. Valerii also shares interview tactics
17+ — signposting depth, stating assumptions, iterative baselines — and guidance for
18+ new grads and career progression toward system design roles. <br><br> Listen to
19+ learn actionable frameworks, example trade-offs, and preparation strategies to improve
20+ your ML system design interviews and production ML decisions.'
21+ description : ' Master ML system design: fraud detection, feature engineering & A/B
22+ testing to ace interviews, build robust production models, monitoring and MLOps.'
23+ date : 2025-11-07
1624topics :
1725- machine learning
1826- career growth
@@ -27,10 +35,13 @@ links:
2735 youtube : https://www.youtube.com/watch?v=0RsmRjar66E
2836season : 7
2937short : Machine Learning System Design Interview
30- title : Machine Learning System Design & Interview Strategies for Senior ML Engineers
38+ title : ' ML System Design Interviews: Production ML, Fraud Detection, Features, A/B
39+ Testing & MLOps'
3140transcript :
3241- header : Podcast Introduction & Episode Overview
3342- header : ' Valerii Background: Career Snapshot and Kaggle Achievements'
43+ - header : Podcast Introduction & Episode Overview
44+ - header : ' Valerii Background: Career Snapshot and Kaggle Achievements'
3445- line : This week, we'll talk about machine learning system design interviews. We
3546 have a special guest today, Valerii. Valerii works at Blockchain.com as a head
3647 of data science. Before that, he worked in quite a few places. More recently at
@@ -66,6 +77,7 @@ transcript:
6677 time : ' 3:06'
6778 who : Alexey
6879- header : ' Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
80+ - header : ' Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
6981- line : Well, sure. Let's start from the current time. As you said, I'm head of data
7082 science at Blockchain. So a bit about blockchain, first. It's a very old crypto
7183 company. When I say very old – it is very, very old. It was founded in 2011. Try
@@ -105,6 +117,7 @@ transcript:
105117 time : ' 5:42'
106118 who : Alexey
107119- header : ' Transition to Meta: User Privacy Work and Large-Scale ML Experience'
120+ - header : ' Transition to Meta: User Privacy Work and Large-Scale ML Experience'
108121- line : To some extent, yes, because it's everything related to data – from infrastructure
109122 to applications. From analytics to visualization. Before that, I was working in
110123 – well, I joined Facebook and left Meta. I will just rotate my screen a bit –
@@ -134,6 +147,7 @@ transcript:
134147 time : ' 7:30'
135148 who : Alexey
136149- header : ' Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
150+ - header : ' Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
137151- line : ' Live interview? Okay. I don''t think it''s about Blockchain’s mission. That''s
138152 it. What else? I was leading quite a big team in my time – the biggest team I
139153 was leading was almost 150 people: machine learning engineers, data analysts,
@@ -175,6 +189,7 @@ transcript:
175189 time : ' 9:07'
176190 who : Valerii
177191- header : ' Candidate Targeting: Who Faces ML System Design Interviews'
192+ - header : ' Candidate Targeting: Who Faces ML System Design Interviews'
178193- line : Okay. Let's talk about machine learning system design. This is a part of the
179194 interview process and you said you did a lot of interviews as the interviewer.
180195 I imagine also, when you were joining Facebook before that, you also had to take
@@ -217,6 +232,7 @@ transcript:
217232 time : ' 11:20'
218233 who : Alexey
219234- header : ' Interview Structure: 45-Minute Narrative and Evaluation Goals'
235+ - header : ' Interview Structure: 45-Minute Narrative and Evaluation Goals'
220236- line : Yeah, true. Good catch. Yes, level five is a Senior in terms of the level
221237 on Facebook, which means that, if you're on this level, it is an honorary thing
222238 to be on this level forever. So if you ended on level four, it was probably because
@@ -262,6 +278,9 @@ transcript:
262278 time : ' 13:36'
263279 who : Alexey
264280- header : ' Contrast: Software System Design Versus ML System Design'
281+ - header : ' Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
282+ Needs'
283+ - header : ' Contrast: Software System Design Versus ML System Design'
265284- header : ' Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
266285 Needs'
267286- line : Okay, let's try to determine the disparity between those two. First of all,
@@ -309,6 +328,7 @@ transcript:
309328 time : ' 13:58'
310329 who : Valerii
311330- header : Labeling, Class Imbalance, and Feature Engineering Tradeoffs
331+ - header : Labeling, Class Imbalance, and Feature Engineering Tradeoffs
312332- line : Fortunately, the very basic log loss is good here. So we know that we might
313333 start from log loss. We also know that we might start from a very basic linear
314334 regression model. Why is that? Because we know that it has to be very fast – in
@@ -375,6 +395,7 @@ transcript:
375395 time : ' 16:43'
376396 who : Valerii
377397- header : ' Interview Tactics: Stating Assumptions and Getting Alignment'
398+ - header : ' Interview Tactics: Stating Assumptions and Getting Alignment'
378399- line : That's quite a lot of information. I was trying to process this. That's quite
379400 a lot of things. So this was an example of machine learning system design. The
380401 interview starts and then the person – the interviewer – asks you, "Let's design
@@ -396,6 +417,7 @@ transcript:
396417 time : ' 21:10'
397418 who : Valerii
398419- header : ' Example: Points-of-Interest System vs Personalized Recommender'
420+ - header : ' Example: Points-of-Interest System vs Personalized Recommender'
399421- line : Yeah, indeed. So, the original question I actually asked you is about the
400422 difference between system design and machine learning system design and I think
401423 it's very clear what machine learning system design is. It requires some domain
@@ -459,6 +481,7 @@ transcript:
459481 time : ' 24:27'
460482 who : Valerii
461483- header : ' End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
484+ - header : ' End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
462485- line : But where does system design actually come into the picture here? Because
463486 here, we talked about selecting the right metric, which was the important thing,
464487 as you said. You said it was log loss for this specific case. Or even before log
@@ -558,6 +581,7 @@ transcript:
558581 time : ' 28:28'
559582 who : Alexey
560583- header : ' Securing the Interview: Iterative Baselines and Signposting Depth'
584+ - header : ' Securing the Interview: Iterative Baselines and Signposting Depth'
561585- line : Let's be honest, the interviewer was a human, and humans are subjective. Maybe
562586 they had a bad day. However, to some extent, I'm surprised because it's hard to
563587 say the interviewer was nodding. Maybe, again, the way you remember it and the way
@@ -602,6 +626,7 @@ transcript:
602626 time : ' 31:09'
603627 who : Alexey
604628- header : ' Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
629+ - header : ' Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
605630- line : Well, it's an interesting question for which there is no single answer. It
606631 depends. My opinion is that the interview has to be as close to the real job –
607632 the real work – as it can be. So, to be honest, in applied machine learning, you
@@ -640,6 +665,7 @@ transcript:
640665 time : ' 33:19'
641666 who : Valerii
642667- header : ' Preparation Strategies: Mock Interviews, Resources, and Experience'
668+ - header : ' Preparation Strategies: Mock Interviews, Resources, and Experience'
643669- line : Okay. [laughs] So, how do I actually prepare for machine learning system design
644670 interviews? It feels as though just being a practitioner is not enough. Because,
645671 first, you never know what exactly is expected. I guess you need to ask that.
@@ -726,6 +752,7 @@ transcript:
726752 time : ' 37:28'
727753 who : Valerii
728754- header : ' Industry Checklist: Core ML Project Review Items and Patterns'
755+ - header : ' Industry Checklist: Core ML Project Review Items and Patterns'
729756- line : Speaking of this mock interview – a while ago, I had a mock interview with
730757 Valerii, where Valerii interviewed me. The question was about designing a fraud
731758 detection system.
@@ -772,6 +799,7 @@ transcript:
772799 time : ' 39:13'
773800 who : Valerii
774801- header : ' Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
802+ - header : ' Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
775803- line : So about this checklist – let's say we need to design a system, not necessarily
776804 for an interview, but just design a system. What is the first thing we need to
777805 do? Do you remember what is in this checklist?
@@ -856,6 +884,7 @@ transcript:
856884 time : ' 44:01'
857885 who : Alexey
858886- header : Features, Labels, Model Selection, and Validation Workflow
887+ - header : Features, Labels, Model Selection, and Validation Workflow
859888- line : Let's say we know what we would like to do. We know how we can try to optimize
860889 it in this way. What does that mean? That means that if my model improves, there
861890 is a high chance that my metric of interest will be better. Now, I need to think
@@ -886,6 +915,7 @@ transcript:
886915 time : ' 44:11'
887916 who : Valerii
888917- header : ' Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
918+ - header : ' Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
889919- line : Perhaps if you cover all these parts during your system design interview,
890920 you're already in quite a good position. Right?
891921 sec : 2762
@@ -933,6 +963,7 @@ transcript:
933963 time : ' 47:48'
934964 who : Valerii
935965- header : ' System Components: Why Features Matter More Than Model Architecture'
966+ - header : ' System Components: Why Features Matter More Than Model Architecture'
936967- line : Okay. So let's go to the questions. We have quite a few of them. The first
937968 question we have is, “What are the typical components of a machine learning system?
938969 And what percentage of it are machine learning algorithms?”
@@ -987,6 +1018,7 @@ transcript:
9871018 time : ' 49:57'
9881019 who : Valerii
9891020- header : ' Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
1021+ - header : ' Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
9901022- line : Thank you. Let's go to the next one, “How to make machine learning algorithms
9911023 work with other parts of systems to solve real world problems?” I guess the question
9921024 is more about, “Okay, we have this model that we just discussed. This model for
@@ -1020,6 +1052,7 @@ transcript:
10201052 time : ' 52:14'
10211053 who : Alexey
10221054- header : When to Avoid ML and Useful Design Pattern References
1055+ - header : When to Avoid ML and Useful Design Pattern References
10231056- line : Do we really need machine learning here exactly? Maybe we can be lucky and
10241057 we can just avoid it.
10251058 sec : 3145
@@ -1072,6 +1105,7 @@ transcript:
10721105 time : ' 53:59'
10731106 who : Valerii
10741107- header : ' New Grad Expectations: Coding Focus and Limited System Design'
1108+ - header : ' New Grad Expectations: Coding Focus and Limited System Design'
10751109- line : Yeah, so another question from Alvaro. Alvaro is graduating soon and he is
10761110 a machine learning intern at a startup. He's starting a job hunt, hopefully [inaudible].
10771111 So how much system design should he expect as a new grad?
@@ -1150,6 +1184,7 @@ transcript:
11501184 time : ' 57:20'
11511185 who : Valerii
11521186- header : ' Validating in Production: A/B Tests, Causality, and Human Labels'
1187+ - header : ' Validating in Production: A/B Tests, Causality, and Human Labels'
11531188- line : Okay. I don't think we have a lot of time for more questions. There is an
11541189 interesting question from Vijay, which is about, “What is the best way to validate
11551190 the model performance in production? Do we need humans for that or are there other
@@ -1197,6 +1232,7 @@ transcript:
11971232 time : ' 58:47'
11981233 who : Valerii
11991234- header : ' Career Path: Moving from Data Science Practice to System Design'
1235+ - header : ' Career Path: Moving from Data Science Practice to System Design'
12001236- line : Yeah, so the question is, “With this profile, you're very good at doing data
12011237 science stuff. How did you transition from data science to being good at system
12021238 design?”
@@ -1224,6 +1260,7 @@ transcript:
12241260 time : ' 59:43'
12251261 who : Valerii
12261262- header : Closing Remarks and Contact Information
1263+ - header : Closing Remarks and Contact Information
12271264- line : ' [laughs] Okay, I think that''s all we have time for. So maybe last one –
12281265 How can people find you?'
12291266 sec : 3603