
Commit 3499c1f

Script to generate podcast titles, intro text, meta descriptions with OpenAI API (#74)
* Some scripts, topics for podcasts
* New timestamps for around 20 top podcasts (by topical popularity)
* Moved from folder
* Fixed some of the podcast issues
* Tests for intert_podcast_timestamps and fixes
* Script to generate podcast titles
* A script to generate intro for podcasts
* updates to scripts
* Updated titles, intros
* script for meta descriptions and updated pages
* Added update date
* Updates
1 parent 4def4a0 commit 3499c1f

29 files changed: +5136 -263 lines changed

_podcast/s03e04-interviewing-300-data-scientists.md

Lines changed: 33 additions & 2 deletions
@@ -1,11 +1,14 @@
11
---
2-
title: What I Learned After Interviewing 300 Data Scientists
3-
short: What I Learned After Interviewing 300 Data Scientists
2+
title: 'Data Science Interview Guide: CV Optimization, Take-Home Projects, Mock Interviews
3+
& Negotiation'
4+
short: 'Data Science Interview Guide: CV Optimization, Take-Home Projects, Mock Interviews
5+
& Negotiation'
46
guests:
57
- olegnovikov
68
image: images/podcast/s03e04-interviewing-300-data-scientists.jpg
79
season: 3
810
episode: 4
11+
date: 2025-11-07
912
ids:
1013
youtube: AYi7b-8GPm4
1114
anchor: What-I-Learned-After-Interviewing-300-Data-Scientists---Oleg-Novikov-e10ctbs
@@ -16,6 +19,13 @@ links:
1619
apple: https://podcasts.apple.com/us/podcast/what-i-learned-after-interviewing-300-data-scientists/id1541710331?i=1000520681105
1720
transcript:
1821
- header: Introduction & Episode Overview
22+
- line: This week we will talk about the interview process, getting hired as a data
23+
scientist — and not only data scientists. We have a special guest today — Oleg.
24+
Oleg worked as a data science manager at Uber, where he built data science teams.
25+
He also has experience building several startups in Europe. Recently he created
26+
NextRound which is a free service for practicing interviews, receiving personalized
27+
feedback, and learning materials. Welcome!
28+
- header: Introduction & Episode Overview
1929
- line: This week we will talk about the interview process, getting hired as a data
2030
scientist — and not only data scientists. We have a special guest today — Oleg.
2131
Oleg worked as a data science manager at Uber, where he built data science teams.
@@ -917,6 +927,27 @@ transcript:
917927
sec: 4194
918928
time: '1:09:54'
919929
who: Alexey
930+
intro: How do you make your data science application stand out, ace take-home projects,
931+
and negotiate an offer without leaving money on the table? In this episode, Oleg
932+
Novikov — creator of NextRound and former data science manager at Uber with a background
933+
in data and software engineering — walks through a practical data science interview
934+
guide covering CV optimization, take-home projects, mock interviews, and negotiation.
935+
<br><br> We dig into career trajectory from engineering to product data science,
936+
building projects that differentiate your application, and concrete product work
937+
like forecasting and LTV. Oleg demonstrates NextRound's mock-interview chatbot and
938+
personalized feedback, explains common hiring funnels (recruiter screen → take-home
939+
→ interviews), and contrasts product data scientist vs. machine learning engineer
940+
expectations. You'll hear specific advice on treating your CV as a landing page,
941+
highlighting personal contributions, crafting case-study narratives from business
942+
goals to evaluation metrics, and preparing for technical assessments (ML fundamentals,
943+
SQL window functions, coding). We also cover handling rejection, replying graciously,
944+
evaluating offers, negotiation tactics when your current salary is low, and practical
945+
steps for PhDs breaking into industry. <br><br> Listen for actionable steps to refine
946+
your data science resume, prioritize take-home ROI, and use mock interviews to iterate
947+
faster.
948+
description: Master CV optimization, take-home projects and mock interviews to land
949+
data science offers—learn SQL/ML prep, negotiation tactics and measurable project
950+
impact.
920951
---
921952
Links:
922953

_podcast/s03e07-market-yourself.md

Lines changed: 552 additions & 3 deletions
Large diffs are not rendered by default.

_podcast/s04e08-freelancing.md

Lines changed: 171 additions & 0 deletions
Large diffs are not rendered by default.

_podcast/s05e02-data-engineering-acronyms.md

Lines changed: 612 additions & 2 deletions
Large diffs are not rendered by default.

_podcast/s06e01-solopreneur.md

Lines changed: 676 additions & 5 deletions
Large diffs are not rendered by default.

_podcast/s06e02-non-technical-interviews.md

Lines changed: 792 additions & 11 deletions
Large diffs are not rendered by default.

_podcast/s07e05-machine-learning-system-design-interview.md

Lines changed: 49 additions & 12 deletions
@@ -2,17 +2,25 @@
22
episode: 5
33
guests:
44
- valeriybabushkin
5-
intro: In this episode, Valerii Babushkin—then Head of Data Science at Blockchain.com
6-
and Kaggle Grandmaster—breaks down how to approach machine learning system design
7-
at scale. He shares insights from building ML systems at Meta, Alibaba, and Yandex,
8-
explaining how to move beyond algorithms to focus on end-to-end design, feature
9-
engineering, and evaluation. Valerii walks through a real-world fraud detection
10-
example, discusses how to structure interview answers, and outlines the core principles
11-
from his book Machine Learning System Design. You’ll learn how to think like a senior
12-
ML engineer and design robust, production-ready systems.
13-
description: Master ML system design interviews with Valerii Babushkin, ex-Meta Head
14-
of Data Science. Learn fraud detection systems, feature engineering, metrics selection,
15-
and production ML best practices for FAANG interviews.
5+
intro: 'How do you approach ML system design interviews that probe production constraints,
6+
fraud detection trade-offs, and MLOps realities? In this episode, Valerii Babushkin
7+
— Senior Director of Data, Analytics, and AI at BP, Kaggle Competitions Grandmaster,
8+
and author of Machine Learning System Design — walks through what interviewers look
9+
for and how candidates should structure answers for real-world ML problems. <br><br>
10+
We cover concrete topics you can use in interviews and on the job: distinguishing
11+
software vs. ML system design; a fraud detection case study (probabilities, loss
12+
functions, real-time requirements); label noise, class imbalance, and feature engineering
13+
trade-offs; end-to-end pipeline items like metrics, baselines, A/B testing, and
14+
validating in production; monitoring, distribution shift, fallbacks, and production
15+
robustness; serving models, embeddings, and MLOps roles; plus when to avoid ML and
16+
practical checklist items for core projects. Valerii also shares interview tactics
17+
— signposting depth, stating assumptions, iterative baselines — and guidance for
18+
new grads and career progression toward system design roles. <br><br> Listen to
19+
learn actionable frameworks, example trade-offs, and preparation strategies to improve
20+
your ML system design interviews and production ML decisions.'
21+
description: 'Master ML system design: fraud detection, feature engineering & A/B
22+
testing to ace interviews, build robust production models, monitoring and MLOps.'
23+
date: 2025-11-07
1624
topics:
1725
- machine learning
1826
- career growth
@@ -27,10 +35,13 @@ links:
2735
youtube: https://www.youtube.com/watch?v=0RsmRjar66E
2836
season: 7
2937
short: Machine Learning System Design Interview
30-
title: Machine Learning System Design & Interview Strategies for Senior ML Engineers
38+
title: 'ML System Design Interviews: Production ML, Fraud Detection, Features, A/B
39+
Testing & MLOps'
3140
transcript:
3241
- header: Podcast Introduction & Episode Overview
3342
- header: 'Valerii Background: Career Snapshot and Kaggle Achievements'
43+
- header: Podcast Introduction & Episode Overview
44+
- header: 'Valerii Background: Career Snapshot and Kaggle Achievements'
3445
- line: This week, we'll talk about machine learning system design interviews. We
3546
have a special guest today, Valerii. Valerii works at Blockchain.com as a head
3647
of data science. Before that, he worked in quite a few places. More recently at
@@ -66,6 +77,7 @@ transcript:
6677
time: '3:06'
6778
who: Alexey
6879
- header: 'Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
80+
- header: 'Blockchain.com Role: Scope, Responsibilities, and Data Ownership'
6981
- line: Well, sure. Let's start from the current time. As you said, I'm head of data
7082
science at Blockchain. So a bit about blockchain, first. It's a very old crypto
7183
company. When I say very old – it is very, very old. It was founded in 2011. Try
@@ -105,6 +117,7 @@ transcript:
105117
time: '5:42'
106118
who: Alexey
107119
- header: 'Transition to Meta: User Privacy Work and Large-Scale ML Experience'
120+
- header: 'Transition to Meta: User Privacy Work and Large-Scale ML Experience'
108121
- line: To some extent, yes, because it's everything related to data – from infrastructure
109122
to applications. From analytics to visualization. Before that, I was working in
110123
– well, I joined Facebook and left Meta. I will just rotate my screen a bit –
@@ -134,6 +147,7 @@ transcript:
134147
time: '7:30'
135148
who: Alexey
136149
- header: 'Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
150+
- header: 'Hiring Experience: Conducting High-Volume Interviews and Team Leadership'
137151
- line: 'Live interview? Okay. I don''t think it''s about Blockchain’s mission. That''s
138152
it. What else? I was leading quite a big team in my time – the biggest team I
139153
was leading was almost 150 people: machine learning engineers, data analysts,
@@ -175,6 +189,7 @@ transcript:
175189
time: '9:07'
176190
who: Valerii
177191
- header: 'Candidate Targeting: Who Faces ML System Design Interviews'
192+
- header: 'Candidate Targeting: Who Faces ML System Design Interviews'
178193
- line: Okay. Let's talk about machine learning system design. This is a part of the
179194
interview process and you said you did a lot of interviews as the interviewer.
180195
I imagine also, when you were joining Facebook before that, you also had to take
@@ -217,6 +232,7 @@ transcript:
217232
time: '11:20'
218233
who: Alexey
219234
- header: 'Interview Structure: 45-Minute Narrative and Evaluation Goals'
235+
- header: 'Interview Structure: 45-Minute Narrative and Evaluation Goals'
220236
- line: Yeah, true. Good catch. Yes, level five is a Senior in terms of the level
221237
on Facebook, which means that, if you're on this level, it is an honorary thing
222238
to be on this level forever. So if you ended on level four, it was probably because
@@ -262,6 +278,9 @@ transcript:
262278
time: '13:36'
263279
who: Alexey
264280
- header: 'Contrast: Software System Design Versus ML System Design'
281+
- header: 'Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
282+
Needs'
283+
- header: 'Contrast: Software System Design Versus ML System Design'
265284
- header: 'Fraud Detection Case Study: Probabilities, Loss Functions, and Real-Time
266285
Needs'
267286
- line: Okay, let's try to determine the disparity between those two. First of all,
@@ -309,6 +328,7 @@ transcript:
309328
time: '13:58'
310329
who: Valerii
311330
- header: Labeling, Class Imbalance, and Feature Engineering Tradeoffs
331+
- header: Labeling, Class Imbalance, and Feature Engineering Tradeoffs
312332
- line: Fortunately, the very basic log loss is good here. So we know that we might
313333
start from log loss. We also know that we might start from a very basic linear
314334
regression model. Why is that? Because we know that it has to be very fast – in
@@ -375,6 +395,7 @@ transcript:
375395
time: '16:43'
376396
who: Valerii
377397
- header: 'Interview Tactics: Stating Assumptions and Getting Alignment'
398+
- header: 'Interview Tactics: Stating Assumptions and Getting Alignment'
378399
- line: That's quite a lot of information. I was trying to process this. That's quite
379400
a lot of things. So this was an example of machine learning system design. The
380401
interview starts and then the person – the interviewer – asks you, "Let's design
@@ -396,6 +417,7 @@ transcript:
396417
time: '21:10'
397418
who: Valerii
398419
- header: 'Example: Points-of-Interest System vs Personalized Recommender'
420+
- header: 'Example: Points-of-Interest System vs Personalized Recommender'
399421
- line: Yeah, indeed. So, the original question I actually asked you is about the
400422
difference between system design and machine learning system design and I think
401423
it's very clear what machine learning system design is. It requires some domain
@@ -459,6 +481,7 @@ transcript:
459481
time: '24:27'
460482
who: Valerii
461483
- header: 'End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
484+
- header: 'End-to-End ML Pipeline: Metrics, Baselines, and A/B Testing'
462485
- line: But where does system design actually come into the picture here? Because
463486
here, we talked about selecting the right metric, which was the important thing,
464487
as you said. You said it was log loss for this specific case. Or even before log
@@ -558,6 +581,7 @@ transcript:
558581
time: '28:28'
559582
who: Alexey
560583
- header: 'Securing the Interview: Iterative Baselines and Signposting Depth'
584+
- header: 'Securing the Interview: Iterative Baselines and Signposting Depth'
561585
- line: Let's be honest, the interviewer was a human, and humans are subjective. Maybe
562586
they had a bad day. However, to some extent, I'm surprised because it's hard to
563587
say the interview was nodding. Maybe, again, the way you remember it and the way
@@ -602,6 +626,7 @@ transcript:
602626
time: '31:09'
603627
who: Alexey
604628
- header: 'Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
629+
- header: 'Appropriate Depth: Practical ML Decisions vs Research-Level Detail'
605630
- line: Well, it's an interesting question for which there is no single answer. It
606631
depends. My opinion is that the interview has to be as close to the real job –
607632
the real work – as it can be. So, to be honest, in applied machine learning, you
@@ -640,6 +665,7 @@ transcript:
640665
time: '33:19'
641666
who: Valerii
642667
- header: 'Preparation Strategies: Mock Interviews, Resources, and Experience'
668+
- header: 'Preparation Strategies: Mock Interviews, Resources, and Experience'
643669
- line: Okay. [laughs] So, how do I actually prepare for machine learning system design
644670
interviews? It feels as though just being a practitioner is not enough. Because,
645671
first, you never know what exactly is expected. I guess you need to ask that.
@@ -726,6 +752,7 @@ transcript:
726752
time: '37:28'
727753
who: Valerii
728754
- header: 'Industry Checklist: Core ML Project Review Items and Patterns'
755+
- header: 'Industry Checklist: Core ML Project Review Items and Patterns'
729756
- line: Speaking of this mock interview – a while ago, I had a mock interview with
730757
Valerii, where Valerii interviewed me. The question was about designing a fraud
731758
detection system.
@@ -772,6 +799,7 @@ transcript:
772799
time: '39:13'
773800
who: Valerii
774801
- header: 'Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
802+
- header: 'Defining Goals and Proxy Metrics: Business Alignment and Long-Term Health'
775803
- line: So about this checklist – let's say we need to design a system, not necessarily
776804
for an interview, but just design a system. What is the first thing we need to
777805
do? Do you remember what is in this checklist?
@@ -856,6 +884,7 @@ transcript:
856884
time: '44:01'
857885
who: Alexey
858886
- header: Features, Labels, Model Selection, and Validation Workflow
887+
- header: Features, Labels, Model Selection, and Validation Workflow
859888
- line: Let's say we know what we would like to do. We know how we can try to optimize
860889
it in this way. What does that mean? That means that if my model improves, there
861890
is a high chance that my metric of interest will be better. Now, I need to think
@@ -886,6 +915,7 @@ transcript:
886915
time: '44:11'
887916
who: Valerii
888917
- header: 'Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
918+
- header: 'Production Robustness: Monitoring, Distribution Shift, and Fallbacks'
889919
- line: Perhaps if you cover all these parts during your system design interview,
890920
you're already in quite a good position. Right?
891921
sec: 2762
@@ -933,6 +963,7 @@ transcript:
933963
time: '47:48'
934964
who: Valerii
935965
- header: 'System Components: Why Features Matter More Than Model Architecture'
966+
- header: 'System Components: Why Features Matter More Than Model Architecture'
936967
- line: Okay. So let's go to the questions. We have quite a few of them. The first
937968
question we have is, “What are the typical components of a machine learning system?
938969
And what percentage of it are machine learning algorithms?”
@@ -987,6 +1018,7 @@ transcript:
9871018
time: '49:57'
9881019
who: Valerii
9891020
- header: 'Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
1021+
- header: 'Engineering Integration: Serving Models, Embeddings, and MLOps Roles'
9901022
- line: Thank you. Let's go to the next one, “How to make machine learning algorithms
9911023
work with other parts of systems to solve real world problems?” I guess the question
9921024
is more about, “Okay, we have this model that we just discussed. This model for
@@ -1020,6 +1052,7 @@ transcript:
10201052
time: '52:14'
10211053
who: Alexey
10221054
- header: When to Avoid ML and Useful Design Pattern References
1055+
- header: When to Avoid ML and Useful Design Pattern References
10231056
- line: Do we really need machine learning here exactly? Maybe we can be lucky and
10241057
we can just avoid it.
10251058
sec: 3145
@@ -1072,6 +1105,7 @@ transcript:
10721105
time: '53:59'
10731106
who: Valerii
10741107
- header: 'New Grad Expectations: Coding Focus and Limited System Design'
1108+
- header: 'New Grad Expectations: Coding Focus and Limited System Design'
10751109
- line: Yeah, so another question from Alvaro. Alvaro is graduating soon and he is
10761110
a machine learning intern at a startup. He's starting a job hunt, hopefully [inaudible].
10771111
So how much system design should he expect as a new grad?
@@ -1150,6 +1184,7 @@ transcript:
11501184
time: '57:20'
11511185
who: Valerii
11521186
- header: 'Validating in Production: A/B Tests, Causality, and Human Labels'
1187+
- header: 'Validating in Production: A/B Tests, Causality, and Human Labels'
11531188
- line: Okay. I don't think we have a lot of time for more questions. There is an
11541189
interesting question from Vijay, which is about, “What is the best way to validate
11551190
the model performance in production? Do we need humans for that or are there other
@@ -1197,6 +1232,7 @@ transcript:
11971232
time: '58:47'
11981233
who: Valerii
11991234
- header: 'Career Path: Moving from Data Science Practice to System Design'
1235+
- header: 'Career Path: Moving from Data Science Practice to System Design'
12001236
- line: Yeah, so the question is, “With this profile, you're very good at doing data
12011237
science stuff. How did you transition from data science to being good at system
12021238
design?”
@@ -1224,6 +1260,7 @@ transcript:
12241260
time: '59:43'
12251261
who: Valerii
12261262
- header: Closing Remarks and Contact Information
1263+
- header: Closing Remarks and Contact Information
12271264
- line: '[laughs] Okay, I think that''s all we have time for. So maybe last one –
12281265
How can people find you?'
12291266
sec: 3603
