Commit 81b1a7a

Podcast Content Improvements: Script to generate timestamps using OpenAI API (Ready) (#72)
* Some scripts, topics for podcasts
* New timestamps for around 20 top podcasts (by topical popularity)
* Moved from folder
* Fixed some of the podcast issues
* Tests for intert_podcast_timestamps and fixes
1 parent 96ea021 commit 81b1a7a

51 files changed: 8124 additions, 5702 deletions

_podcast/s03e04-interviewing-300-data-scientists.md

Lines changed: 606 additions & 637 deletions
Large diffs are not rendered by default.

_podcast/s03e07-market-yourself.md

Lines changed: 559 additions & 593 deletions
Large diffs are not rendered by default.

_podcast/s04e08-freelancing.md

Lines changed: 153 additions & 160 deletions
Large diffs are not rendered by default.

_podcast/s05e02-data-engineering-acronyms.md

Lines changed: 596 additions & 616 deletions
Large diffs are not rendered by default.

_podcast/s06e01-solopreneur.md

Lines changed: 686 additions & 725 deletions
Large diffs are not rendered by default.

_podcast/s06e02-non-technical-interviews.md

Lines changed: 807 additions & 854 deletions
Large diffs are not rendered by default.

_podcast/s07e05-machine-learning-system-design-interview.md

Lines changed: 40 additions & 20 deletions
@@ -2,8 +2,17 @@
 episode: 5
 guests:
 - valeriybabushkin
-intro: "In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com and Kaggle Grandmaster—breaks down how to approach machine learning system design at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex, explaining how to move beyond algorithms to focus on end-to-end design, feature engineering, and evaluation. Valerii walks through a real-world fraud detection example, discusses how to structure interview answers, and outlines the core principles from his book Machine Learning System Design. You’ll learn how to think like a senior ML engineer and design robust, production-ready systems."
-description: "Master ML system design interviews with Valerii Babushkin, ex-Meta Head of Data Science. Learn fraud detection systems, feature engineering, metrics selection, and production ML best practices for FAANG interviews."
+intro: In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com
+and Kaggle Grandmaster—breaks down how to approach machine learning system design
+at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex,
+explaining how to move beyond algorithms to focus on end-to-end design, feature
+engineering, and evaluation. Valerii walks through a real-world fraud detection
+example, discusses how to structure interview answers, and outlines the core principles
+from his book Machine Learning System Design. You’ll learn how to think like a senior
+ML engineer and design robust, production-ready systems.
+description: Master ML system design interviews with Valerii Babushkin, ex-Meta Head
+of Data Science. Learn fraud detection systems, feature engineering, metrics selection,
+and production ML best practices for FAANG interviews.
 topics:
 - machine learning
 - career growth
@@ -18,8 +27,10 @@ links:
 youtube: https://www.youtube.com/watch?v=0RsmRjar66E
 season: 7
 short: Machine Learning System Design Interview
-title: "Machine Learning System Design & Interview Strategies for Senior ML Engineers"
+title: Machine Learning System Design & Interview Strategies for Senior ML Engineers
 transcript:
+- header: Podcast Introduction & Episode Overview
+- header: 'Valerii Background: Career Snapshot and Kaggle Achievements'
 - line: This week, we'll talk about machine learning system design interviews. We
 have a special guest today, Valerii. Valerii works at Blockchain.com as a head
 of data science. Before that, he worked in quite a few places. More recently at
@@ -47,15 +58,14 @@ transcript:
 sec: 182
 time: '3:02'
 who: Valerii
-- header: "Guest Introduction: Head of Data Science & Kaggle Grandmaster"
 - line: '[laughs] Okay, so I briefly already told everyone about your background.
 But before we go into our main topic of machine learning system design, maybe
 let''s talk a bit more about your career journey in detail. Can you tell us a
 bit about that?'
 sec: 186
 time: '3:06'
 who: Alexey
-- header: "Blockchain.com History & Broad Head of Data Science Role"
+- header: 'Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
 - line: Well, sure. Let's start from the current time. As you said, I'm head of data
 science at Blockchain. So a bit about blockchain, first. It's a very old crypto
 company. When I say very old – it is very, very old. It was founded in 2011. Try
@@ -94,7 +104,7 @@ transcript:
 sec: 342
 time: '5:42'
 who: Alexey
-- header: "Career Shift: Retail to Facebook Privacy & Large-Scale Systems"
+- header: 'Transition to Meta: User Privacy Work and Large-Scale ML Experience'
 - line: To some extent, yes, because it's everything related to data – from infrastructure
 to applications. From analytics to visualization. Before that, I was working in
 – well, I joined Facebook and left Meta. I will just rotate my screen a bit –
@@ -123,6 +133,7 @@ transcript:
 sec: 450
 time: '7:30'
 who: Alexey
+- header: 'Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
 - line: 'Live interview? Okay. I don''t think it''s about Blockchain’s mission. That''s
 it. What else? I was leading quite a big team in my time – the biggest team I
 was leading was almost 150 people: machine learning engineers, data analysts,
@@ -163,7 +174,7 @@ transcript:
 sec: 547
 time: '9:07'
 who: Valerii
-- header: "ML System Design: Target Audience (Level 5 Senior MEs)"
+- header: 'Candidate Targeting: Who Faces ML System Design Interviews'
 - line: Okay. Let's talk about machine learning system design. This is a part of the
 interview process and you said you did a lot of interviews as the interviewer.
 I imagine also, when you were joining Facebook before that, you also had to take
@@ -205,6 +216,7 @@ transcript:
 sec: 680
 time: '11:20'
 who: Alexey
+- header: 'Interview Structure: 45-Minute Narrative and Evaluation Goals'
 - line: Yeah, true. Good catch. Yes, level five is a Senior in terms of the level
 on Facebook, which means that, if you're on this level, it is an honorary thing
 to be on this level forever. So if you ended on level four, it was probably because
@@ -241,7 +253,6 @@ transcript:
 sec: 798
 time: '13:18'
 who: Valerii
-- header: "System Design vs. ML Design & Focusing on Machine Learning"
 - line: 'I think this is what happened to me, but this is something that I prepared
 for later. So, you said that important interviews for detecting, or assessing
 your level are: behavioral interview, system design interview, and machine learning
@@ -250,6 +261,9 @@ transcript:
 sec: 816
 time: '13:36'
 who: Alexey
+- header: 'Contrast: Software System Design Versus ML System Design'
+- header: 'Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
+Needs'
 - line: Okay, let's try to determine the disparity between those two. First of all,
 when you're asked to do a system design interview, you're usually asked about
 data structures, about different server-side components, like “What are the databases?
@@ -271,7 +285,6 @@ transcript:
 sec: 838
 time: '13:58'
 who: Valerii
-- header: "Fraud Detection Walkthrough: Loss Functions, Metrics, Modeling"
 - line: 'Now we can say that we know that we have to put not zero or one, but some
 score between zero and one, when we have a transaction. When we have a transaction
 now, that probably means we''d like to have the system in real time. Okay, let''s
@@ -295,6 +308,7 @@ transcript:
 sec: 838
 time: '13:58'
 who: Valerii
+- header: Labeling, Class Imbalance, and Feature Engineering Tradeoffs
 - line: Fortunately, the very basic log loss is good here. So we know that we might
 start from log loss. We also know that we might start from a very basic linear
 regression model. Why is that? Because we know that it has to be very fast – in
@@ -360,6 +374,7 @@ transcript:
 sec: 1003
 time: '16:43'
 who: Valerii
+- header: 'Interview Tactics: Stating Assumptions and Getting Alignment'
 - line: That's quite a lot of information. I was trying to process this. That's quite
 a lot of things. So this was an example of machine learning system design. The
 interview starts and then the person – the interviewer – asks you, "Let's design
@@ -368,7 +383,6 @@ transcript:
 sec: 1233
 time: '20:33'
 who: Alexey
-- header: "Interview Strategy: Making Assumptions & System vs. ML Design Examples"
 - line: The best way is not even to ask, but to say "My assumption is that. Do you
 agree with that or not?” You see, you asked the question, but actually, you’ve
 made an assumption. You say “Are you okay with that?” Because you've been given
@@ -381,6 +395,7 @@ transcript:
 sec: 1270
 time: '21:10'
 who: Valerii
+- header: 'Example: Points-of-Interest System vs Personalized Recommender'
 - line: Yeah, indeed. So, the original question I actually asked you is about the
 difference between system design and machine learning system design and I think
 it's very clear what machine learning system design is. It requires some domain
@@ -443,7 +458,7 @@ transcript:
 sec: 1467
 time: '24:27'
 who: Valerii
-- header: "ML System is the Whole Pipeline & Interview Failure: Too Much Heuristics"
+- header: 'End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
 - line: But where does system design actually come into the picture here? Because
 here, we talked about selecting the right metric, which was the important thing,
 as you said. You said it was log loss for this specific case. Or even before log
@@ -531,7 +546,6 @@ transcript:
 sec: 1690
 time: '28:10'
 who: Valerii
-- header: "Securing the Interview: Iterative Baseline Design & Technical Depth"
 - line: '[laughs] I might be wrong with using these words. I think the recruiter probably
 used different words. But the reason for me failing the process – the whole interview
 – was machine learning system design. Not the others. I was afraid about the others.
@@ -543,6 +557,7 @@ transcript:
 sec: 1708
 time: '28:28'
 who: Alexey
+- header: 'Securing the Interview: Iterative Baselines and Signposting Depth'
 - line: Let's be honest, the interviewer was a human, and humans are subjective. Maybe
 they had a bad day. However, to some extent, I'm surprised because it's hard to
 say the interview was nodding. Maybe, again, the way you remember it and the way
@@ -586,6 +601,7 @@ transcript:
 sec: 1869
 time: '31:09'
 who: Alexey
+- header: 'Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
 - line: Well, it's an interesting question for which there is no single answer. It
 depends. My opinion is that the interview has to be as close to the real job –
 the real work – as it can be. So, to be honest, in applied machine learning, you
@@ -623,7 +639,7 @@ transcript:
 sec: 1999
 time: '33:19'
 who: Valerii
-- header: "ML System Prep: Experience, Mock Interviews, Dealing with Unknown Domains"
+- header: 'Preparation Strategies: Mock Interviews, Resources, and Experience'
 - line: Okay. [laughs] So, how do I actually prepare for machine learning system design
 interviews? It feels as though just being a practitioner is not enough. Because,
 first, you never know what exactly is expected. I guess you need to ask that.
@@ -709,7 +725,7 @@ transcript:
 sec: 2248
 time: '37:28'
 who: Valerii
-- header: "Tool: ML Project Checklist & Defining Goal, Proxy Metrics, Long-Term Health"
+- header: 'Industry Checklist: Core ML Project Review Items and Patterns'
 - line: Speaking of this mock interview – a while ago, I had a mock interview with
 Valerii, where Valerii interviewed me. The question was about designing a fraud
 detection system.
@@ -755,7 +771,7 @@ transcript:
 sec: 2353
 time: '39:13'
 who: Valerii
-- header: "Post-Goal Steps: Features, Validation, A/B Testing, Monitoring, Fallbacks"
+- header: 'Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
 - line: So about this checklist – let's say we need to design a system, not necessarily
 for an interview, but just design a system. What is the first thing we need to
 do? Do you remember what is in this checklist?
@@ -839,6 +855,7 @@ transcript:
 sec: 2641
 time: '44:01'
 who: Alexey
+- header: Features, Labels, Model Selection, and Validation Workflow
 - line: Let's say we know what we would like to do. We know how we can try to optimize
 it in this way. What does that mean? That means that if my model improves, there
 is a high chance that my metric of interest will be better. Now, I need to think
@@ -868,6 +885,7 @@ transcript:
 sec: 2651
 time: '44:11'
 who: Valerii
+- header: 'Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
 - line: Perhaps if you cover all these parts during your system design interview,
 you're already in quite a good position. Right?
 sec: 2762
@@ -914,7 +932,7 @@ transcript:
 sec: 2868
 time: '47:48'
 who: Valerii
-- header: "ML System Components: Algorithms are 1-5% & Features are Paramount"
+- header: 'System Components: Why Features Matter More Than Model Architecture'
 - line: Okay. So let's go to the questions. We have quite a few of them. The first
 question we have is, “What are the typical components of a machine learning system?
 And what percentage of it are machine learning algorithms?”
@@ -968,6 +986,7 @@ transcript:
 sec: 2997
 time: '49:57'
 who: Valerii
+- header: 'Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
 - line: Thank you. Let's go to the next one, “How to make machine learning algorithms
 work with other parts of systems to solve real world problems?” I guess the question
 is more about, “Okay, we have this model that we just discussed. This model for
@@ -1000,7 +1019,7 @@ transcript:
 sec: 3134
 time: '52:14'
 who: Alexey
-- header: "Concept: Avoiding ML & Tool: Machine Learning Design Patterns Book"
+- header: When to Avoid ML and Useful Design Pattern References
 - line: Do we really need machine learning here exactly? Maybe we can be lucky and
 we can just avoid it.
 sec: 3145
@@ -1052,7 +1071,7 @@ transcript:
 sec: 3239
 time: '53:59'
 who: Valerii
-- header: "New Grad Interviews: No System Design & Focus on Coding (LeetCode)"
+- header: 'New Grad Expectations: Coding Focus and Limited System Design'
 - line: Yeah, so another question from Alvaro. Alvaro is graduating soon and he is
 a machine learning intern at a startup. He's starting a job hunt, hopefully [inaudible].
 So how much system design should he expect as a new grad?
@@ -1130,7 +1149,7 @@ transcript:
 sec: 3440
 time: '57:20'
 who: Valerii
-- header: "Validation in Production: A/B Tests, Human Labels, Practitioner Experience"
+- header: 'Validating in Production: A/B Tests, Causality, and Human Labels'
 - line: Okay. I don't think we have a lot of time for more questions. There is an
 interesting question from Vijay, which is about, “What is the best way to validate
 the model performance in production? Do we need humans for that or are there other
@@ -1177,6 +1196,7 @@ transcript:
 sec: 3527
 time: '58:47'
 who: Valerii
+- header: 'Career Path: Moving from Data Science Practice to System Design'
 - line: Yeah, so the question is, “With this profile, you're very good at doing data
 science stuff. How did you transition from data science to being good at system
 design?”
@@ -1203,6 +1223,7 @@ transcript:
 sec: 3583
 time: '59:43'
 who: Valerii
+- header: Closing Remarks and Contact Information
 - line: '[laughs] Okay, I think that''s all we have time for. So maybe last one –
 How can people find you?'
 sec: 3603
@@ -1260,7 +1281,6 @@ transcript:
 time: '1:00:51'
 who: Valerii
 ---
-
 Links:
 
 * [Valerii's telegram channel (in Russian)](https://t.me/cryptovalerii){:target="_blank"}
