@@ -2,8 +2,17 @@
 episode: 5
 guests:
 - valeriybabushkin
-intro: "In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com and Kaggle Grandmaster—breaks down how to approach machine learning system design at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex, explaining how to move beyond algorithms to focus on end-to-end design, feature engineering, and evaluation. Valerii walks through a real-world fraud detection example, discusses how to structure interview answers, and outlines the core principles from his book Machine Learning System Design. You’ll learn how to think like a senior ML engineer and design robust, production-ready systems."
-description: "Master ML system design interviews with Valerii Babushkin, ex-Meta Head of Data Science. Learn fraud detection systems, feature engineering, metrics selection, and production ML best practices for FAANG interviews."
+intro: In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com
+  and Kaggle Grandmaster—breaks down how to approach machine learning system design
+  at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex,
+  explaining how to move beyond algorithms to focus on end-to-end design, feature
+  engineering, and evaluation. Valerii walks through a real-world fraud detection
+  example, discusses how to structure interview answers, and outlines the core principles
+  from his book Machine Learning System Design. You’ll learn how to think like a senior
+  ML engineer and design robust, production-ready systems.
+description: Master ML system design interviews with Valerii Babushkin, ex-Meta Head
+  of Data Science. Learn fraud detection systems, feature engineering, metrics selection,
+  and production ML best practices for FAANG interviews.
 topics:
 - machine learning
 - career growth
@@ -18,8 +27,10 @@ links:
   youtube: https://www.youtube.com/watch?v=0RsmRjar66E
 season: 7
 short: Machine Learning System Design Interview
-title: "Machine Learning System Design & Interview Strategies for Senior ML Engineers"
+title: Machine Learning System Design & Interview Strategies for Senior ML Engineers
 transcript:
+- header: Podcast Introduction & Episode Overview
+- header: 'Valerii Background: Career Snapshot and Kaggle Achievements'
 - line: This week, we'll talk about machine learning system design interviews. We
     have a special guest today, Valerii. Valerii works at Blockchain.com as a head
     of data science. Before that, he worked in quite a few places. More recently at
@@ -47,15 +58,14 @@ transcript:
   sec: 182
   time: '3:02'
   who: Valerii
-- header: "Guest Introduction: Head of Data Science & Kaggle Grandmaster"
 - line: '[laughs] Okay, so I briefly already told everyone about your background.
     But before we go into our main topic of machine learning system design, maybe
     let''s talk a bit more about your career journey in detail. Can you tell us a
     bit about that?'
   sec: 186
   time: '3:06'
   who: Alexey
-- header: "Blockchain.com History & Broad Head of Data Science Role"
+- header: 'Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
 - line: Well, sure. Let's start from the current time. As you said, I'm head of data
     science at Blockchain. So a bit about blockchain, first. It's a very old crypto
     company. When I say very old – it is very, very old. It was founded in 2011. Try
@@ -94,7 +104,7 @@ transcript:
   sec: 342
   time: '5:42'
   who: Alexey
-- header: "Career Shift: Retail to Facebook Privacy & Large-Scale Systems"
+- header: 'Transition to Meta: User Privacy Work and Large-Scale ML Experience'
 - line: To some extent, yes, because it's everything related to data – from infrastructure
     to applications. From analytics to visualization. Before that, I was working in
     – well, I joined Facebook and left Meta. I will just rotate my screen a bit –
@@ -123,6 +133,7 @@ transcript:
   sec: 450
   time: '7:30'
   who: Alexey
+- header: 'Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
 - line: 'Live interview? Okay. I don''t think it''s about Blockchain’s mission. That''s
     it. What else? I was leading quite a big team in my time – the biggest team I
     was leading was almost 150 people: machine learning engineers, data analysts,
@@ -163,7 +174,7 @@ transcript:
   sec: 547
   time: '9:07'
   who: Valerii
-- header: "ML System Design: Target Audience (Level 5 Senior MEs)"
+- header: 'Candidate Targeting: Who Faces ML System Design Interviews'
 - line: Okay. Let's talk about machine learning system design. This is a part of the
     interview process and you said you did a lot of interviews as the interviewer.
     I imagine also, when you were joining Facebook before that, you also had to take
@@ -205,6 +216,7 @@ transcript:
   sec: 680
   time: '11:20'
   who: Alexey
+- header: 'Interview Structure: 45-Minute Narrative and Evaluation Goals'
 - line: Yeah, true. Good catch. Yes, level five is a Senior in terms of the level
     on Facebook, which means that, if you're on this level, it is an honorary thing
     to be on this level forever. So if you ended on level four, it was probably because
@@ -241,7 +253,6 @@ transcript:
   sec: 798
   time: '13:18'
   who: Valerii
-- header: "System Design vs. ML Design & Focusing on Machine Learning"
 - line: 'I think this is what happened to me, but this is something that I prepared
     for later. So, you said that important interviews for detecting, or assessing
     your level are: behavioral interview, system design interview, and machine learning
@@ -250,6 +261,9 @@ transcript:
   sec: 816
   time: '13:36'
   who: Alexey
+- header: 'Contrast: Software System Design Versus ML System Design'
+- header: 'Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
+    Needs'
 - line: Okay, let's try to determine the disparity between those two. First of all,
     when you're asked to do a system design interview, you're usually asked about
     data structures, about different server-side components, like “What are the databases?
@@ -271,7 +285,6 @@ transcript:
   sec: 838
   time: '13:58'
   who: Valerii
-- header: "Fraud Detection Walkthrough: Loss Functions, Metrics, Modeling"
 - line: 'Now we can say that we know that we have to put not zero or one, but some
     score between zero and one, when we have a transaction. When we have a transaction
     now, that probably means we''d like to have the system in real time. Okay, let''s
@@ -295,6 +308,7 @@ transcript:
   sec: 838
   time: '13:58'
   who: Valerii
+- header: Labeling, Class Imbalance, and Feature Engineering Tradeoffs
 - line: Fortunately, the very basic log loss is good here. So we know that we might
     start from log loss. We also know that we might start from a very basic linear
     regression model. Why is that? Because we know that it has to be very fast – in
@@ -360,6 +374,7 @@ transcript:
   sec: 1003
   time: '16:43'
   who: Valerii
+- header: 'Interview Tactics: Stating Assumptions and Getting Alignment'
 - line: That's quite a lot of information. I was trying to process this. That's quite
     a lot of things. So this was an example of machine learning system design. The
     interview starts and then the person – the interviewer – asks you, "Let's design
@@ -368,7 +383,6 @@ transcript:
   sec: 1233
   time: '20:33'
   who: Alexey
-- header: "Interview Strategy: Making Assumptions & System vs. ML Design Examples"
 - line: The best way is not even to ask, but to say "My assumption is that. Do you
     agree with that or not?” You see, you asked the question, but actually, you’ve
     made an assumption. You say “Are you okay with that?” Because you've been given
@@ -381,6 +395,7 @@ transcript:
   sec: 1270
   time: '21:10'
   who: Valerii
+- header: 'Example: Points-of-Interest System vs Personalized Recommender'
 - line: Yeah, indeed. So, the original question I actually asked you is about the
     difference between system design and machine learning system design and I think
     it's very clear what machine learning system design is. It requires some domain
@@ -443,7 +458,7 @@ transcript:
   sec: 1467
   time: '24:27'
   who: Valerii
-- header: "ML System is the Whole Pipeline & Interview Failure: Too Much Heuristics"
+- header: 'End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
 - line: But where does system design actually come into the picture here? Because
     here, we talked about selecting the right metric, which was the important thing,
     as you said. You said it was log loss for this specific case. Or even before log
@@ -531,7 +546,6 @@ transcript:
   sec: 1690
   time: '28:10'
   who: Valerii
-- header: "Securing the Interview: Iterative Baseline Design & Technical Depth"
 - line: '[laughs] I might be wrong with using these words. I think the recruiter probably
     used different words. But the reason for me failing the process – the whole interview
     – was machine learning system design. Not the others. I was afraid about the others.
@@ -543,6 +557,7 @@ transcript:
   sec: 1708
   time: '28:28'
   who: Alexey
+- header: 'Securing the Interview: Iterative Baselines and Signposting Depth'
 - line: Let's be honest, the interviewer was a human, and humans are subjective. Maybe
     they had a bad day. However, to some extent, I'm surprised because it's hard to
     say the interview was nodding. Maybe, again, the way you remember it and the way
@@ -586,6 +601,7 @@ transcript:
   sec: 1869
   time: '31:09'
   who: Alexey
+- header: 'Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
 - line: Well, it's an interesting question for which there is no single answer. It
     depends. My opinion is that the interview has to be as close to the real job –
     the real work – as it can be. So, to be honest, in applied machine learning, you
@@ -623,7 +639,7 @@ transcript:
   sec: 1999
   time: '33:19'
   who: Valerii
-- header: "ML System Prep: Experience, Mock Interviews, Dealing with Unknown Domains"
+- header: 'Preparation Strategies: Mock Interviews, Resources, and Experience'
 - line: Okay. [laughs] So, how do I actually prepare for machine learning system design
     interviews? It feels as though just being a practitioner is not enough. Because,
     first, you never know what exactly is expected. I guess you need to ask that.
@@ -709,7 +725,7 @@ transcript:
   sec: 2248
   time: '37:28'
   who: Valerii
-- header: "Tool: ML Project Checklist & Defining Goal, Proxy Metrics, Long-Term Health"
+- header: 'Industry Checklist: Core ML Project Review Items and Patterns'
 - line: Speaking of this mock interview – a while ago, I had a mock interview with
     Valerii, where Valerii interviewed me. The question was about designing a fraud
     detection system.
@@ -755,7 +771,7 @@ transcript:
   sec: 2353
   time: '39:13'
   who: Valerii
-- header: "Post-Goal Steps: Features, Validation, A/B Testing, Monitoring, Fallbacks"
+- header: 'Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
 - line: So about this checklist – let's say we need to design a system, not necessarily
     for an interview, but just design a system. What is the first thing we need to
     do? Do you remember what is in this checklist?
@@ -839,6 +855,7 @@ transcript:
   sec: 2641
   time: '44:01'
   who: Alexey
+- header: Features, Labels, Model Selection, and Validation Workflow
 - line: Let's say we know what we would like to do. We know how we can try to optimize
     it in this way. What does that mean? That means that if my model improves, there
     is a high chance that my metric of interest will be better. Now, I need to think
@@ -868,6 +885,7 @@ transcript:
   sec: 2651
   time: '44:11'
   who: Valerii
+- header: 'Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
 - line: Perhaps if you cover all these parts during your system design interview,
     you're already in quite a good position. Right?
   sec: 2762
@@ -914,7 +932,7 @@ transcript:
   sec: 2868
   time: '47:48'
   who: Valerii
-- header: "ML System Components: Algorithms are 1-5% & Features are Paramount"
+- header: 'System Components: Why Features Matter More Than Model Architecture'
 - line: Okay. So let's go to the questions. We have quite a few of them. The first
     question we have is, “What are the typical components of a machine learning system?
     And what percentage of it are machine learning algorithms?”
@@ -968,6 +986,7 @@ transcript:
   sec: 2997
   time: '49:57'
   who: Valerii
+- header: 'Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
 - line: Thank you. Let's go to the next one, “How to make machine learning algorithms
     work with other parts of systems to solve real world problems?” I guess the question
     is more about, “Okay, we have this model that we just discussed. This model for
@@ -1000,7 +1019,7 @@ transcript:
   sec: 3134
   time: '52:14'
   who: Alexey
-- header: "Concept: Avoiding ML & Tool: Machine Learning Design Patterns Book"
+- header: When to Avoid ML and Useful Design Pattern References
 - line: Do we really need machine learning here exactly? Maybe we can be lucky and
     we can just avoid it.
   sec: 3145
@@ -1052,7 +1071,7 @@ transcript:
   sec: 3239
   time: '53:59'
   who: Valerii
-- header: "New Grad Interviews: No System Design & Focus on Coding (LeetCode)"
+- header: 'New Grad Expectations: Coding Focus and Limited System Design'
 - line: Yeah, so another question from Alvaro. Alvaro is graduating soon and he is
     a machine learning intern at a startup. He's starting a job hunt, hopefully [inaudible].
     So how much system design should he expect as a new grad?
@@ -1130,7 +1149,7 @@ transcript:
   sec: 3440
   time: '57:20'
   who: Valerii
-- header: "Validation in Production: A/B Tests, Human Labels, Practitioner Experience"
+- header: 'Validating in Production: A/B Tests, Causality, and Human Labels'
 - line: Okay. I don't think we have a lot of time for more questions. There is an
     interesting question from Vijay, which is about, “What is the best way to validate
     the model performance in production? Do we need humans for that or are there other
@@ -1177,6 +1196,7 @@ transcript:
   sec: 3527
   time: '58:47'
   who: Valerii
+- header: 'Career Path: Moving from Data Science Practice to System Design'
 - line: Yeah, so the question is, “With this profile, you're very good at doing data
     science stuff. How did you transition from data science to being good at system
     design?”
@@ -1203,6 +1223,7 @@ transcript:
   sec: 3583
   time: '59:43'
   who: Valerii
+- header: Closing Remarks and Contact Information
 - line: '[laughs] Okay, I think that''s all we have time for. So maybe last one –
     How can people find you?'
   sec: 3603
@@ -1260,7 +1281,6 @@ transcript:
   time: '1:00:51'
   who: Valerii
 ---
-
 Links:
 
 * [Valerii's telegram channel (in Russian)](https://t.me/cryptovalerii){:target="_blank"}