Thomas-Rauter/Educational_Resources_for_Applied_Machine_Learning

(Free) Educational Resources for Applied Machine Learning

To describe this repo, I will borrow a description from Cassie Kozyrkov (former Chief Decision Scientist at Google). You probably think there is just one type of machine learning (ML), but there are actually two: machine learning research and applied machine learning. Machine learning research is about generating ML knowledge and building the algorithms and other tools needed to learn from data. Applied machine learning, on the other hand, uses the available knowledge, algorithms, and tools to make predictions that provide value to a business or other organisation. To put this in culinary terms, machine learning research deals with how to build ovens, microwaves, stoves, etc., whereas applied machine learning is about how to use those cooking devices to process food at scale. The data are the ingredients and the cooked meals are the predictions.

Most ML books, resources, and university programs available are about machine learning research. According to Cassie Kozyrkov, a primary reason why ML projects fail is that most practitioners are very good at the machine learning research part but much weaker at the applied machine learning part. In real-world projects, the applied part matters most: so many excellent algorithms are already available that your building skills are not so important. Rather, your "cooking skills" are what make the difference in practice (if you care only about positions in academia or research in industry, then of course the machine learning research knowledge and skills are much more important).

This repo attempts to collect (mostly free) resources about applied machine learning, to fill this gap.

The list below includes textbooks, guides, cheatsheets, links to courses, and free ebooks that I have written (also available as .tex files).

When a resource is freely available, I linked its title directly to the PDF download. Otherwise, I linked to the website where it is available (for purchase).


📚 Resources

Andriy Burkov is a seasoned machine learning expert with over two decades of experience. His book is concise but broad and covers the end-to-end lifecycle of machine learning systems. It is not a deep dive into algorithms. Instead, it focuses on design decisions, system architecture, monitoring, and real-world constraints in ML projects. Ideal for engineers and practitioners who need to bridge the gap between experimentation and production.


Andrew Ng is a leading authority in AI and machine learning, known for co-founding Google Brain, teaching millions through his online courses, and bridging cutting-edge research with practical, scalable applications. His book provides a practical guide to structuring machine learning projects effectively. It emphasizes strategies for setting up training and test sets, selecting appropriate evaluation metrics, and conducting error analysis. Aimed at practitioners, it offers insights into diagnosing issues like bias and variance, and making informed decisions about model design and iteration.


This course teaches how to design and manage the full lifecycle of machine learning systems in production. It covers choosing deployment and monitoring strategies, improving model performance by focusing on critical data slices, and handling real-world data challenges — from inconsistent labels to working with both small and large, structured and unstructured datasets.


This course focuses on the strategic decision-making needed to lead successful ML projects. It covers diagnosing and reducing model errors, handling mismatches between training and test data, and deciding when to use techniques like transfer learning or end-to-end models. Geared toward aspiring ML leaders, it distills hard-earned industry experience into a practical framework for managing complex ML development challenges.


This course is from DeepLearning.ai and therefore indirectly also from Andrew Ng, although he is not the instructor. It offers practical insights by highlighting what worked in real-world projects and what did not. It covers real-world case studies related to public health, climate change, and disaster management.


Cassie Kozyrkov was Chief Decision Scientist at Google and is now an AI advisor. She therefore has hands-on practical experience of what works in machine learning projects and what doesn't. She shares this experience in the form of a YouTube playlist. About half of the videos in the playlist are pretty basic and explain things like what supervised learning is. In the other half, she shares her valuable experience about how to use machine learning successfully in a business.


This book provides a practical, end-to-end guide for developing machine learning applications, focusing on the entire lifecycle from concept to deployment. Ameisen emphasizes starting with simple, rule-based solutions to validate ideas before introducing ML components. The book covers key aspects such as data collection, model training, evaluation, deployment, and monitoring. It also discusses the importance of aligning ML solutions with business objectives and includes real-world examples and interviews with industry professionals to illustrate common challenges and best practices.


Chip Huyen is a computer scientist who graduated from Stanford and taught ML Systems there. I think the author's own description best summarizes what this book can bring you: "This book is for anyone who wants to leverage ML to solve real-world problems. This book is not an introduction to ML. There are many books, courses, and resources available for ML theories, and therefore, this book shies away from these concepts to focus on the practical aspects of ML."


This cheat sheet helps you choose the best machine learning algorithm for your predictive analytics solution. Your decision is driven by both the nature of your data and the goal you want to achieve with your data.


Similar to the one above, but from scikit-learn. It is essentially a decision tree that helps you decide which algorithm to use for your problem. I personally find that it ignores top algorithms, such as XGBoost and neural networks, which surprises me. But it is a well-known cheat sheet, and I still find it helpful.


Google made a webpage with 43 practical rules for machine learning. Those rules are short and quite dense in information. They are for advanced practitioners, and some of them are quite specific. They are not really necessary when you just work on private projects, but I am sure they are invaluable as soon as you work on large real-world projects and want to make the best decisions.


A seminal paper that introduced the idea of "ML technical debt." It explains how ML systems tend to accumulate complexity over time — through configuration, data dependencies, monitoring, etc. Useful for understanding why ML in production is hard, and why traditional software engineering intuition often fails in ML contexts.


My ebook 1: Navigating Machine Learning Projects

This book is similar to the book Machine Learning Engineering by Andriy Burkov. It is about how to make effective decisions in real-world machine learning projects. For example, you might have great knowledge about neural networks and know exactly how to technically implement them. You train one, but the prediction performance is not good enough. What do you do now? Do you train a bigger neural network? Or a smaller one? Or do you collect more data? Or do you fix the labels? In ML projects, you will face many such decisions, and you rarely have enough time to try out all options. This book helps you make smart decisions in all the stages of the ML project lifecycle, which includes scoping, data, modeling, insights, and deployment.

📄 Download Versions:

Note: I uploaded the .tex document of this ebook to this repo. However, this ebook contains several figures, many of which I generated with custom Python scripts. Therefore, this ebook has its own separate GitHub repo.


My ebook 2: Organising Machine Learning Projects

This book is about best practices and guidelines for organising ML projects. It covers environment and dependency management, best practices for Git commits, documentation of the different levels of a project (project, data, model, and code), experiment logging and tracking, dataset versioning, and model storage and versioning. While many things written down in this book can be considered common sense, I know from my own experience that one too easily throws them overboard or never starts doing them, because one "does not have the time for this", or "as long as I work on the project, I remember all the things". Simple private projects might not suffer from a lack of organisation, but when a project runs for months and multiple people are working on it, poor organisation can soon become a nightmare. This book attempts to address this.

📄 Download Versions:


💬 Feedback

I would love to hear your thoughts. If you think something in those books or resources is incorrect (especially in the ones I have written), you can open an issue and start a discussion. Thank you!
