---
authors:
- valeriiakuka
description: Learn to apply LLMs in real life in 10 weeks
image: images/posts/2024-11-11-llm-zoomcamp/cover.jpg
layout: post
subtitle: Learn to apply LLMs in real life in 10 weeks
tags:
- courses
- llm
title: LLM Zoomcamp
---

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image1.png" />
<figcaption>Cover of the LLM Zoomcamp course</figcaption>
</figure>

In this article, we will talk about [LLM Zoomcamp](https://github.com/DataTalksClub/llm-zoomcamp){:target="_blank"}, our free online course to get started with real-life applications of LLMs. In 10 weeks, you will learn how to build an AI system that answers questions about your knowledge base.

We will cover different aspects of this course so you can learn more about it:

- [Who is the Course For?](#who-is-the-course-for)
- [Course Curriculum](#course-curriculum)
- [Course Assignments and Scoring](#course-assignments-and-scoring)
- [Homework and getting feedback](#homework-and-getting-feedback)
- [Learning in Public Approach](#learning-in-public-approach)
- [Course Projects for Your Portfolio](#course-projects-for-your-portfolio)
- [DataTalks.Club Community](#datatalksclub-community)

> [Looking for a free course about LLMs? Join our LLM Zoomcamp](https://github.com/DataTalksClub/llm-zoomcamp/tree/main){:target="_blank"}

## Who is the course for?

Before we get into the details, it’s important to know what skills you should have to join the course comfortably.

Here are the main prerequisites for the course:

- Comfortable with programming and Python
- Comfortable with the command line
- Familiarity with Docker
- No previous exposure to AI or ML is required

## Course Curriculum

- Module 1: Introduction to LLMs and RAG
- Module 2: Open-source LLMs
- Module 3: Vector Databases
- Module 4: Evaluation and Monitoring
- Module 5: LLM Orchestration and Ingestion
- Module 6: Best Practices
- Module 7: Bonus: End-to-End Project Example

Let's quickly go over each module, focusing on the main points and the tech you'll use.

### Module 1: Introduction to LLMs and RAG

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image2.png" />
<figcaption>Screenshot of the <a href="https://youtu.be/Q75JgLEXMsM?si=O8DOJqARkOlzEhKH">lecture slides</a> from <a href="https://github.com/DataTalksClub/llm-zoomcamp/tree/main/01-intro">module 1</a></figcaption>
</figure>

We introduce the core ideas behind Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). You’ll set up your development environment, learn how retrieval works, and start experimenting with APIs and search tools. By the end of this module, you’ll have a basic RAG setup and be familiar with text search fundamentals.

You will learn to:

- Set up your environment for LLM and RAG experimentation
- Understand the basics of retrieval and search
- Use the OpenAI API to integrate LLM capabilities
- Build a simple RAG system
- Implement basic text search with Elasticsearch
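
To make the retrieve-then-generate idea concrete, here is a minimal Python sketch of the RAG flow: search a small knowledge base, put the results into a prompt, and hand the prompt to an LLM. It is an illustration under assumptions, not the course code: the toy documents, the word-overlap `search` (standing in for Elasticsearch), and the `llm` callable (standing in for an OpenAI API call) are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical knowledge base, for illustration only
DOCUMENTS = [
    {"question": "How do I reset my password?",
     "text": "Open the account settings page and click 'Reset password'."},
    {"question": "How can I contact support?",
     "text": "Write to the support team through the in-app chat."},
    {"question": "Can I change my subscription plan?",
     "text": "Yes, you can switch plans at any time from the billing page."},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, documents: list[dict], top_k: int = 2) -> list[dict]:
    """Toy keyword search: rank documents by the number of words shared with the query.
    A stand-in for Elasticsearch, which scores matches with BM25 instead."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc["question"] + " " + doc["text"])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, results: list[dict]) -> str:
    """Put the retrieved documents into the prompt as context."""
    context = "\n\n".join(f"Q: {doc['question']}\nA: {doc['text']}" for doc in results)
    return (
        "Answer the QUESTION using only the facts from the CONTEXT.\n\n"
        f"QUESTION: {query}\n\n"
        f"CONTEXT:\n{context}"
    )


def rag(query: str, llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, generate. `llm` is a hypothetical callable that
    sends a prompt to a model (e.g. through the OpenAI API) and returns its text."""
    results = search(query, DOCUMENTS)
    return llm(build_prompt(query, results))
```

Passing `llm=lambda prompt: prompt` gives a convenient dry run: `rag` then returns the prompt it would send, so you can check which context the model receives before spending any API calls.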
77+
78+
79+
80+
### Module 2: Open-source LLMs
81+
82+
We dive into the world of open-source LLMs, providing hands-on experience with popular, freely available models. You’ll learn how to configure a GPU environment, access models from the Hugging Face Hub, and even run LLMs on CPUs when GPUs aren’t available. This module ends with creating a simple UI to see your model in action.
83+
84+
You will learn to:
85+
86+
- Set up and optimize a GPU environment
87+
- Access and use open-source models from Hugging Face
88+
- Run models on a CPU using Ollama when GPUs aren’t available
89+
- Create a basic, interactive UI with Streamlit for testing your model
90+
91+
92+
93+
### Module 3: Vector Databases
94+
95+
<figure>
96+
<img src="/images/posts/2024-11-11-llm-zoomcamp/image7.png" />
97+
<figcaption>Screenshot of the <a href="https://youtu.be/C5AWdL3kg1Q?si=MB8ODE4Z-hphfvX1">lecture slides</a> from <a href="https://github.com/DataTalksClub/llm-zoomcamp/tree/main/03-vector-search">module 3</a></figcaption>
98+
</figure>
99+
100+
This module covers how to use vector databases for effective search and retrieval. You’ll learn to create embeddings (vector representations of text), index them, and use vector search to improve RAG performance.
101+
102+
You will learn to:
103+
104+
- Create and index embeddings for vector-based retrieval
105+
- Implement vector search using Elasticsearch
106+
- Conduct offline evaluations to assess your retrieval system
107+
- Work hands-on with dlt to practice embedding indexing and search
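
Under the hood, vector search boils down to comparing a query embedding with document embeddings, and offline evaluation checks whether the right document comes back near the top. The sketch below shows a brute-force version with cosine similarity, plus two common retrieval metrics, hit rate and MRR (mean reciprocal rank), in plain Python. The 3-dimensional vectors and document ids are made up; real embeddings come from an embedding model and have hundreds of dimensions, and Elasticsearch replaces the linear scan with an approximate nearest-neighbor index. Treat it as an illustration, not the course notebook.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec: list[float], doc_vecs: dict[str, list[float]],
                  top_k: int = 2) -> list[tuple[str, float]]:
    """Brute force: score every document and keep the top_k (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def hit_rate_and_mrr(results: list[tuple[str, list[str]]]) -> tuple[float, float]:
    """Offline retrieval metrics over (relevant_doc_id, ranked_doc_ids) pairs.
    Hit rate: share of queries whose relevant document was retrieved at all.
    MRR: mean of 1 / rank of the relevant document (0 when it is missing)."""
    hits, reciprocal_rank_sum = 0, 0.0
    for relevant_id, ranked_ids in results:
        if relevant_id in ranked_ids:
            hits += 1
            reciprocal_rank_sum += 1 / (ranked_ids.index(relevant_id) + 1)
    return hits / len(results), reciprocal_rank_sum / len(results)


# Made-up 3-dimensional "embeddings", for illustration only
doc_vectors = {
    "doc-gpu": [0.9, 0.1, 0.0],
    "doc-docker": [0.1, 0.9, 0.2],
    "doc-slack": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]
top = vector_search(query_vector, doc_vectors)
ranked_ids = [doc_id for doc_id, _ in top]
```

With these vectors, `doc-gpu` ranks first (cosine similarity of about 0.98) and `doc-docker` second. If the relevant document for this query were `doc-docker`, `hit_rate_and_mrr([("doc-docker", ranked_ids)])` would return `(1.0, 0.5)`: it was retrieved, but only at rank 2.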

### Module 4: Evaluation and Monitoring

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image9.png" />
<figcaption>Screenshot of the <a href="https://youtu.be/OWqinqemCmk?si=CJZDFiFu5H31Gr6x">lecture slides</a> from <a href="https://github.com/DataTalksClub/llm-zoomcamp/tree/main/04-monitoring">module 4</a></figcaption>
</figure>

We focus on evaluating your RAG system and setting up monitoring tools. You’ll explore different metrics to judge your system’s performance and set up a feedback loop for continuous improvement. Grafana dashboards will help you visualize insights and track system usage.

You will learn to:

- Perform offline evaluations of your RAG pipeline
- Use cosine similarity and LLM-as-a-Judge metrics to evaluate your RAG system
- Track chat history and collect user feedback for iterative improvement
- Build Grafana dashboards to monitor performance in real time
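
An LLM-as-a-Judge evaluation is mostly prompt plumbing: a second LLM call grades each answer, and you parse and aggregate its verdicts. The sketch below shows that plumbing in Python. The prompt wording, the three labels (`RELEVANT`, `PARTLY_RELEVANT`, `NON_RELEVANT`), and the JSON reply format are assumptions for illustration, and the judge replies are hard-coded instead of coming from a real model.

```python
import json
from collections import Counter

# Hypothetical judge prompt; the labels and the JSON reply format are assumptions
JUDGE_PROMPT = """You are evaluating answers produced by a RAG system.
Classify how relevant the ANSWER is to the QUESTION.

QUESTION: {question}
ANSWER: {answer}

Reply with JSON only: {{"relevance": "RELEVANT" | "PARTLY_RELEVANT" | "NON_RELEVANT", "explanation": "..."}}"""

VALID_LABELS = {"RELEVANT", "PARTLY_RELEVANT", "NON_RELEVANT"}


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the question and the RAG answer into the judge prompt."""
    return JUDGE_PROMPT.format(question=question, answer=answer)


def parse_verdict(raw_reply: str) -> str:
    """Extract the relevance label; count malformed replies in a separate bucket."""
    try:
        label = json.loads(raw_reply).get("relevance")
    except (json.JSONDecodeError, AttributeError):  # not JSON, or JSON but not an object
        return "PARSE_ERROR"
    return label if label in VALID_LABELS else "PARSE_ERROR"


def summarize(labels: list[str]) -> dict[str, float]:
    """Share of each label across all evaluated answers."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Hard-coded stand-ins for what a judge model might reply
replies = [
    '{"relevance": "RELEVANT", "explanation": "Answers the question directly."}',
    '{"relevance": "PARTLY_RELEVANT", "explanation": "Misses part of the question."}',
    "Sorry, I cannot help with that.",
]
labels = [parse_verdict(reply) for reply in replies]
```

Keeping a separate `PARSE_ERROR` bucket is a deliberate choice: judge models do not always follow the requested format, and silently dropping those replies would skew the label shares.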

### Module 5: LLM Orchestration and Ingestion

This module covers orchestration and data ingestion for LLM applications: how to get your data into a RAG system through a repeatable pipeline.

You will learn to:

- Ingest data into your RAG system
- Set up a data pipeline for LLM projects
- Prepare data for scalable, efficient processing in RAG systems
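
A step that usually sits inside an ingestion pipeline is splitting long documents into chunks before indexing them, so that each retrieved piece fits comfortably into the prompt. Here is a minimal sketch of word-based chunking with overlap in Python. The chunk size, the overlap, word-level splitting, and the record fields (`doc_id`, `chunk_id`, `text`) are illustrative assumptions, not settings taken from the course.

```python
def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of chunk_size words. Consecutive chunks share
    `overlap` words, so text near a boundary keeps some of its context."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def to_records(doc_id: str, text: str, **chunk_kwargs) -> list[dict]:
    """Turn one document into index-ready records, one per chunk (hypothetical schema)."""
    return [
        {"doc_id": doc_id, "chunk_id": f"{doc_id}-{i}", "text": chunk}
        for i, chunk in enumerate(chunk_words(text, **chunk_kwargs))
    ]


sample = " ".join(f"w{i}" for i in range(10))
records = to_records("faq", sample, chunk_size=4, overlap=1)
```

For the 10-word sample, this yields three chunks, `w0 w1 w2 w3`, `w3 w4 w5 w6`, and `w6 w7 w8 w9`, each starting with the last word of the previous one. Larger overlaps preserve more context at the cost of indexing more duplicated text.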

### Module 6: Best Practices

We dive into advanced techniques for refining your RAG pipeline, from improving retrieval quality to enhancing search relevance. You’ll practice hybrid search methods, document reranking, and explore using LangChain for more complex applications.

You will learn to:

- Apply best practices to optimize your RAG pipeline
- Use hybrid search techniques to increase retrieval accuracy
- Implement document reranking to enhance search results
- Set up hybrid search with LangChain for advanced retrieval tasks
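
Hybrid search has to merge two ranked lists, one from keyword search and one from vector search. One widely used way to do that is Reciprocal Rank Fusion (RRF): each document receives 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below is a plain-Python version. Whether the course fuses results this way or with another method is an assumption, as are the document ids; `k = 60` is the constant commonly used for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc ids (best first) into one ranking.
    Ranks are 1-based; a larger k flattens the gap between top and lower ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_results = ["doc-a", "doc-b", "doc-c"]  # e.g. from BM25 text search
vector_results = ["doc-c", "doc-a", "doc-d"]   # e.g. from kNN vector search
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that rank well in both lists, like `doc-a` here, rise to the top, while a document found by only one retriever (`doc-d`) still makes the fused list. RRF only uses ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale; a dedicated reranker can then reorder the fused top results.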

### Module 7: End-to-End Project

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image12.png" />
<figcaption>Screenshot of the <a href="https://youtu.be/E9O0Tg68PPg?si=hgbdVIE-uMH70cHQ">lecture slides</a> from <a href="https://github.com/DataTalksClub/llm-zoomcamp/tree/main/07-project-example">module 7</a></figcaption>
</figure>

This module brings everything together: you’ll apply all the skills you’ve learned to complete an end-to-end project, from data preprocessing to deploying your solution.

You will learn to:

- Build an end-to-end project using RAG techniques
- Practice preprocessing text data for specific use cases
- Apply learned techniques in a real-world project

The [course description](https://github.com/DataTalksClub/llm-zoomcamp){:target="_blank"} on GitHub provides a detailed overview of the topics covered each week. For each week, you can find the video lectures, slides, code, and community notes. By the end of the course, you will have acquired the fundamental skills needed to build real-world LLM applications such as RAG systems.

> If you’re ready to join the next cohort of the course, submit this [form](https://airtable.com/appPPxkgYLH06Mvbw/shr7WtxHEPXxaui0Q){:target="_blank"} to register and stay updated.

### Theory and practice

We make LLM theory accessible and engaging through real-world examples. We also demonstrate code directly in the lectures to show how the concepts are implemented, so you can easily apply them in your own projects.

For instance, in a lecture with a linear algebra refresher, the lecturer switches between screens: first, they explain the concept of the dot product of two vectors, and then they demonstrate its implementation in Python.
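
For readers who want that refresher in code form: the dot product multiplies two vectors element by element and sums the results. Here is a plain-Python version, written for this article rather than taken from the lecture:

```python
def dot(u: list[float], v: list[float]) -> float:
    """Sum of the element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


result = dot([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6 = 32
```

The same operation is what vector search relies on: for embeddings normalized to unit length, the dot product equals their cosine similarity.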
180+
181+
<figure>
182+
<img src="/images/posts/2024-11-11-llm-zoomcamp/image3.png" />
183+
<figcaption>Extract from the first lecture about Retrieval-Augmented Generation (RAG)</figcaption>
184+
</figure>
185+
186+
## Course assignments and scoring
187+
188+
### Homework and getting feedback
189+
190+
## <figure>
191+
<img src="/images/posts/2024-11-11-llm-zoomcamp/image6.png" />
192+
<figcaption>Examples of the homework assignments from 2024 cohort of the LLM Zoomcamp</figcaption>
193+
</figure>
194+
195+
To reinforce your learning, we offer regular homework assignments. Your scores are added to a [leaderboard](https://courses.datatalks.club/llm-zoomcamp-2024/leaderboard){:target="_blank"}, creating friendly competition among course members and motivating you to do your best.
196+
197+
<figure>
198+
<img src="/images/posts/2024-11-11-llm-zoomcamp/image5.png" />
199+
<figcaption>Example of the final leaderboard</figcaption>
200+
</figure>
201+
202+
For support, we have an [FAQ](https://docs.google.com/document/d/1m2KexowAXTmexfC5rVTCSnaShvdUQ8Ag2IEiwBDHxN0/edit?tab=t.0#heading=h.o29af0z8xx88){:target="_blank"} section with quick answers to common questions. If you need more help, [our Slack community](https://datatalks.club/slack.html){:target="_blank"} is always available for technical questions, clarifications, or guidance. Additionally, we host live Q&A sessions called "office hours" where you can interact with instructors and get immediate answers to your questions.
203+
204+
<figure>
205+
<img src="/images/posts/2024-11-11-llm-zoomcamp/image4.png" />
206+
<figcaption>A screenshot of a FAQ document</figcaption>
207+
</figure>
208+
209+
210+

### Learning in public approach

A unique feature is our "learning in public" approach, inspired by [Shawn @swyx Wang](https://www.youtube.com/watch?v=tkBCPqWKCL8&list=PL7NIGf5_PlM-Dk3lgPsZFT94Ng7PpRQEh&index=5&t=195s){:target="_blank"}'s [article](https://www.swyx.io/learn-in-public){:target="_blank"}. We believe that everyone has something valuable to contribute, regardless of their expertise level.

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image10.png" />
<figcaption>An extract from Shawn @swyx Wang's article about learning in public</figcaption>
</figure>

Throughout the course, we actively encourage and incentivize learning in public: by sharing your progress, insights, and projects online, you earn additional points for your homework and projects.

Sharing your work online also helps you get noticed by social media algorithms, reaching a broader audience and creating opportunities to connect with people and organizations you might not have encountered otherwise.

### Course projects for your portfolio

If you've ever gone through a job interview or researched how to get hired, you likely know how much personal projects matter. To receive a certificate, you’ll need to complete and submit an [end-to-end RAG application](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/project.md){:target="_blank"}. You choose a problem that interests you, find a suitable dataset, and build your application.

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image8.png" />
<figcaption><a href="https://github.com/Optimistix/medical_QandA_assistant">Example project</a> from <a href="https://courses.datatalks.club/llm-zoomcamp-2024/leaderboard/3623/">Rileen Sinha</a>, one of the students of the course</figcaption>
</figure>

## DataTalks.Club community

DataTalks.Club has a supportive community of like-minded individuals in [our Slack](https://datatalks.club/slack.html){:target="_blank"}. It is the perfect place to enhance your skills, deepen your knowledge, and connect with peers who share your passion. These connections can lead to lasting friendships, potential collaborations in future projects, and exciting career prospects.

<figure>
<img src="/images/posts/2024-11-11-llm-zoomcamp/image11.png" />
<figcaption>Course channel in <a href="https://datatalks.club/slack.html">our Slack community</a></figcaption>
</figure>

## Conclusion

LLM Zoomcamp is a structured and practical introduction to applying Large Language Models in real-world contexts. Over the course of 10 weeks, you gain hands-on experience, from setting up retrieval systems to building a complete RAG application.

Each module is crafted to build useful skills step-by-step, ensuring you can put what you learn into practice. If you’re interested in learning about and applying LLMs, joining the next cohort is a good way to start.

> Register for the next LLM Zoomcamp cohort and stay updated on start dates by filling out this [form](https://airtable.com/appPPxkgYLH06Mvbw/shr7WtxHEPXxaui0Q){:target="_blank"}.