Commit eee321e: Updated summary and ms.date fields
1 parent 92db102, commit eee321e

15 files changed, +198 -191 lines changed
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.introduction
-title: Introduction
-metadata:
-  title: Introduction
-  description: "Introduction"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 2
-content: |
-  [!include[](includes/1-introduction.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Introduction"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 2
+content: |
+  [!include[](includes/1-introduction.md)]
+
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.workflow
-title: Explore the main concepts of a RAG workflow
-metadata:
-  title: Explore the main concepts of a RAG workflow
-  description: "Explore the main concepts of a RAG workflow"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 8
-content: |
-  [!include[](includes/2-workflow.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.workflow
+title: Explore the main concepts of a RAG workflow
+metadata:
+  title: Explore the main concepts of a RAG workflow
+  description: "Explore the main concepts of a RAG workflow"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 8
+content: |
+  [!include[](includes/2-workflow.md)]
+
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.prepare-data
-title: Prepare your data for RAG
-metadata:
-  title: Prepare your data for RAG
-  description: "Prepare your data for RAG"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 7
-content: |
-  [!include[](includes/3-prepare-data.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.prepare-data
+title: Prepare your data for RAG
+metadata:
+  title: Prepare your data for RAG
+  description: "Prepare your data for RAG"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 7
+content: |
+  [!include[](includes/3-prepare-data.md)]
+
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.vector-search
-title: Find relevant data with vector search
-metadata:
-  title: Find relevant data with vector search
-  description: "Find relevant data with vector search"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 6
-content: |
-  [!include[](includes/4-vector-search.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.vector-search
+title: Find relevant data with vector search
+metadata:
+  title: Find relevant data with vector search
+  description: "Find relevant data with vector search"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 6
+content: |
+  [!include[](includes/4-vector-search.md)]
+
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.ranking
-title: Rerank your retrieved results
-metadata:
-  title: Rerank your retrieved results
-  description: "Rerank your retrieved results"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 5
-content: |
-  [!include[](includes/5-ranking.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.ranking
+title: Rerank your retrieved results
+metadata:
+  title: Rerank your retrieved results
+  description: "Rerank your retrieved results"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 5
+content: |
+  [!include[](includes/5-ranking.md)]
+
Lines changed: 15 additions & 15 deletions

@@ -1,15 +1,15 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.exercise
-title: Exercise - Set up RAG
-metadata:
-  title: Exercise - Set up RAG
-  description: "Exercise - Set up RAG"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 30
-content: |
-  [!include[](includes/6-exercise.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.exercise
+title: Exercise - Set up RAG
+metadata:
+  title: Exercise - Set up RAG
+  description: "Exercise - Set up RAG"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 30
+content: |
+  [!include[](includes/6-exercise.md)]
Lines changed: 49 additions & 49 deletions

@@ -1,49 +1,49 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.knowledge-check
-title: Module assessment
-metadata:
-  title: Module assessment
-  description: "Knowledge check"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-  module_assessment: true
-azureSandbox: false
-labModal: false
-durationInMinutes: 3
-quiz:
-  questions:
-  - content: "What is the primary advantage of using vector embeddings in Retrieval Augmented Generation (RAG) on Azure Databricks?"
-    choices:
-    - content: "Faster data ingestion."
-      isCorrect: false
-      explanation: "Incorrect. Faster data ingestion isn't the primary advantage of using vector embeddings in RAG on Azure Databricks."
-    - content: "More relevant search results."
-      isCorrect: true
-      explanation: "Correct. Vector embeddings allow for more accurate and contextually relevant search results by capturing the semantic meaning of data. Vector embeddings enable the RAG system to retrieve information based on meaning rather than just keywords, which significantly enhances the quality of the responses generated by the model."
-    - content: "Reduced storage requirements."
-      isCorrect: false
-      explanation: "Incorrect. Reduced storage requirements aren't the primary advantage of using vector embeddings in RAG on Azure Databricks."
-  - content: "Which component of a RAG workflow in Azure Databricks is responsible for transforming user queries into a format suitable for the retrieval process?"
-    choices:
-    - content: "Data ingestion"
-      isCorrect: false
-      explanation: "Incorrect. Data ingestion isn't responsible for transforming user queries into a format suitable for the retrieval process."
-    - content: "Query vectorization"
-      isCorrect: true
-      explanation: "Correct. Query vectorization is the process of converting user queries into vector embeddings using the same embedding model that was used to create the data vectors. This step ensures that the query can be compared semantically with the stored data vectors to retrieve the most relevant information."
-    - content: "Embedding storage"
-      isCorrect: false
-      explanation: "Incorrect. Embedding storage isn't responsible for transforming user queries into a format suitable for the retrieval process."
-  - content: "In the context of improving RAG application quality on Azure Databricks, what does retrieval quality refer to?"
-    choices:
-    - content: "The efficiency of data storage solutions."
-      isCorrect: false
-      explanation: "Incorrect. The efficiency of data storage solutions doesn't relate to retrieval quality."
-    - content: "The accuracy of the information retrieved for a given query."
-      isCorrect: true
-      explanation: "Correct. Retrieval quality pertains to how accurately the RAG system retrieves relevant information for a given query. High retrieval quality ensures that the context given to the LLM (Large Language Model) is relevant and complete, which is essential for accurate and coherent responses."
-    - content: "The speed at which the model generates responses."
-      isCorrect: false
-      explanation: "Incorrect. The speed at which the model generates responses doesn't relate to retrieval quality."
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: "Knowledge check"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+  module_assessment: true
+azureSandbox: false
+labModal: false
+durationInMinutes: 3
+quiz:
+  questions:
+  - content: "What is the primary advantage of using vector embeddings in Retrieval Augmented Generation (RAG) on Azure Databricks?"
+    choices:
+    - content: "Faster data ingestion."
+      isCorrect: false
+      explanation: "Incorrect. Faster data ingestion isn't the primary advantage of using vector embeddings in RAG on Azure Databricks."
+    - content: "More relevant search results."
+      isCorrect: true
+      explanation: "Correct. Vector embeddings allow for more accurate and contextually relevant search results by capturing the semantic meaning of data. Vector embeddings enable the RAG system to retrieve information based on meaning rather than just keywords, which significantly enhances the quality of the responses generated by the model."
+    - content: "Reduced storage requirements."
+      isCorrect: false
+      explanation: "Incorrect. Reduced storage requirements aren't the primary advantage of using vector embeddings in RAG on Azure Databricks."
+  - content: "Which component of a RAG workflow in Azure Databricks is responsible for transforming user queries into a format suitable for the retrieval process?"
+    choices:
+    - content: "Data ingestion"
+      isCorrect: false
+      explanation: "Incorrect. Data ingestion isn't responsible for transforming user queries into a format suitable for the retrieval process."
+    - content: "Query vectorization"
+      isCorrect: true
+      explanation: "Correct. Query vectorization is the process of converting user queries into vector embeddings using the same embedding model that was used to create the data vectors. This step ensures that the query can be compared semantically with the stored data vectors to retrieve the most relevant information."
+    - content: "Embedding storage"
+      isCorrect: false
+      explanation: "Incorrect. Embedding storage isn't responsible for transforming user queries into a format suitable for the retrieval process."
+  - content: "In the context of improving RAG application quality on Azure Databricks, what does retrieval quality refer to?"
+    choices:
+    - content: "The efficiency of data storage solutions."
+      isCorrect: false
+      explanation: "Incorrect. The efficiency of data storage solutions doesn't relate to retrieval quality."
+    - content: "The accuracy of the information retrieved for a given query."
+      isCorrect: true
+      explanation: "Correct. Retrieval quality pertains to how accurately the RAG system retrieves relevant information for a given query. High retrieval quality ensures that the context given to the LLM (Large Language Model) is relevant and complete, which is essential for accurate and coherent responses."
+    - content: "The speed at which the model generates responses."
+      isCorrect: false
+      explanation: "Incorrect. The speed at which the model generates responses doesn't relate to retrieval quality."
Lines changed: 16 additions & 16 deletions

@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.retrieval-augmented-generation-azure-databricks.summary
-title: Summary
-metadata:
-  title: Summary
-  description: "Summary"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 1
-content: |
-  [!include[](includes/8-summary.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.retrieval-augmented-generation-azure-databricks.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary"
+  ms.date: 07/07/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 1
+content: |
+  [!include[](includes/8-summary.md)]
+
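Every unit file above receives the same two metadata changes: `ms.date` becomes 07/07/2025 and `author` becomes theresa-i. A repetitive edit like this is often scripted; the sketch below is illustrative only and is not part of this commit. The new field values come from the diff, while the directory layout, glob pattern, and function names are assumptions.

```python
import re
from pathlib import Path

# Values taken from the diff above; everything else in this sketch
# (directory layout, glob pattern, function names) is hypothetical.
NEW_DATE = "07/07/2025"
NEW_AUTHOR = "theresa-i"

def update_metadata(text: str) -> str:
    """Rewrite the ms.date and author metadata fields in one unit's YAML."""
    text = re.sub(r"(?m)^(\s*ms\.date:\s*).*$", r"\g<1>" + NEW_DATE, text)
    # Matches only the bare 'author:' field; 'ms.author:' begins with 'ms.'
    # after the leading whitespace, so it is left untouched.
    text = re.sub(r"(?m)^(\s*author:\s*).*$", r"\g<1>" + NEW_AUTHOR, text)
    return text

def update_units(module_dir: str) -> None:
    """Apply the rewrite to every unit file in a (hypothetical) module folder."""
    for path in Path(module_dir).glob("*.yml"):
        path.write_text(update_metadata(path.read_text()))
```

A regex pass like this preserves each file's indentation, which matters for YAML; a YAML parser round-trip would also work but tends to reorder or requote fields.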

learn-pr/wwl-data-ai/retrieval-augmented-generation-azure-databricks/includes/1-introduction.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Retrieval Augmented Generation (RAG) is a technique in natural language processi
 
 Language models have become incredibly popular because they can generate impressive, well-structured answers to user questions. When people interact with these models through chat interfaces, it feels like a natural and intuitive way to get information.
 
-However, there's a major challenge: ensuring the AI's responses are actually accurate and factual. This challenge is called **groundedness** - which simply means whether the AI's answer is based on real, reliable information rather than made-up or incorrect details. Without proper groundedness, language models can "hallucinate" by confidently stating things that aren't true. Another challenge is that traditional models use only information they were trained on that can be outdated or incomplete.
+However, there's a major challenge: ensuring the AI's responses are accurate and factual. This challenge is called **groundedness** - which simply means whether the AI's answer is based on real, reliable information rather than made-up or incorrect details. Without proper groundedness, language models might confidently state things that aren't true. Another challenge is that traditional models use only information they were trained on, which can be outdated or incomplete.
 
 When you want a language model to have access to specific knowledge, you have three main options:
 
learn-pr/wwl-data-ai/retrieval-augmented-generation-azure-databricks/includes/2-workflow.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
1-
**Retrieval Augmented Generation (RAG)** is a technique that makes large language models more effective by connecting them to your own custom data. The RAG workflow follows a simple four-step process, as shown in the diagram below:
1+
**Retrieval Augmented Generation (RAG)** is a technique that makes large language models more effective by connecting them to your own custom data. The RAG workflow follows a simple four-step process, as shown in the diagram:
22

33
:::image type="content" source="../media/retrieval-augmented-retrieval.png" alt-text="Diagram of retrieval augmented generation workflow.":::
44

5-
1. **User Query**: A user asks a question that the base LLM alone cannot answer accurately because it doesn't have access to your specific documents, recent information, or proprietary data.
5+
1. **User Query**: A user asks a question that the base LLM alone can't answer accurately because it doesn't have access to your specific documents, recent information, or proprietary data.
66

77
2. **Search Your Database**: The system searches through your own document collection (company policies, reports, manuals, databases) - not the LLM's training data. Your documents were previously converted into embeddings and stored in a vector database. The system finds the most relevant information from your specific documents.
88

@@ -41,21 +41,21 @@ Before a RAG system can find relevant information, it needs to convert all text
4141

4242
An embedding model is a specialized AI tool that converts text into numerical vectors (lists of numbers) that represent the meaning of the text. Think of it as a translator that turns works and sentences into a mathematical language that computers can understand and compare.
4343

44-
Document embedding, as shown in the diagram below, is part of a preparation phase. This is done once to set up a knowledge base. Before your RAG system can work, you need to prepare your documents. An embedding model takes all your text documents and transforms them into mathematical vectors called embeddings, that capture their semantic meaning. This preprocessing step creates a searchable knowledge base.
44+
Document embedding, as shown in the diagram, is part of a preparation phase. This is done once to set up a knowledge base. Before your RAG system can work, you need to prepare your documents. An embedding model takes all your text documents and transforms them into mathematical vectors called embeddings, that capture their semantic meaning. This preprocessing step creates a searchable knowledge base.
4545

4646
:::image type="content" source="../media/document-embedding.png" alt-text="Diagram of embeddings model converting documents to vectors.":::
4747

48-
Query embedding, shown in the diagram below, happens each time a user asks a question. First, the user's question is converted into an embedding using the same embedding model. This real-time conversion prepares the query for comparison against your pre-processed document embeddings. Only after the query is embedded can the system begin searching for relevant documents.
48+
Query embedding, shown in the diagram, happens each time a user asks a question. First, the user's question is converted into an embedding using the same embedding model. This real-time conversion prepares the query for comparison against your preprocessed document embeddings. Only after the query is embedded can the system begin searching for relevant documents.
4949

5050
:::image type="content" source="../media/query-embedding.png" alt-text="Diagram of embeddings model.":::
5151

5252
Think of document embedding as building your searchable library, and query embedding as translating each question into the same format so you can find the right books in that library. The search only begins **after** the question has been translated.
5353

5454
### Store and search your embeddings with a vector store
5555

56-
Once you've converted your documents into embeddings, you need somewhere to store them that allows for fast semantic search. A regular database would struggle with this because it can't efficiently compare the mathematical similarity between vectors.
56+
Once you've converted your documents into embeddings, you need somewhere to store them that allows for fast semantic search. A regular database would struggle with this because it can't efficiently compare the mathematical similarity between vectors.
5757

58-
A vector store is a specialized database designed specifically for storing and searching through embeddings (those mathematical vectors created from your documents). Unlike traditional database that store text or numbers, vector stores are optimized for finding similar vectors quickly, even when dealing with millions of documents.
58+
A vector store is a specialized database designed specifically for storing and searching through embeddings (those mathematical vectors created from your documents). Unlike traditional databases that store text or numbers, vector stores are optimized for finding similar vectors quickly, even when dealing with millions of documents.
5959

6060
You can implement vector storage through **vector databases**, **vector libraries**, or **database plugins**.
6161

@@ -81,4 +81,4 @@ The complete RAG workflow combines all the components we've discussed into a uni
8181

8282
The key mechanism is **in-context learning** - instead of retraining the LLM, you provide relevant information as context in each prompt, allowing the LLM to generate informed responses without permanent modification.
8383

84-
Advanced implementations may include feedback loops to refine results when the initial response doesn't meet quality thresholds.
84+
Advanced implementations might include feedback loops to refine results when the initial response doesn't meet quality thresholds.
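The workflow this file describes (embed documents once, embed each query with the same model, retrieve by similarity, then augment the prompt) can be sketched end to end with toy vectors. This is a minimal, self-contained illustration, not Databricks Vector Search code: the three-number "embeddings" and document names are made up, standing in for a real embedding model and vector store.

```python
import math

# Toy "embeddings": hand-picked 3-dimensional vectors standing in for the
# output of a real embedding model (values are illustrative only).
DOC_EMBEDDINGS = {
    "company-policy": [0.9, 0.1, 0.0],
    "annual-report": [0.2, 0.8, 0.1],
    "user-manual": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Compare two vectors by the angle between them (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, k=1):
    """Search step: find the k stored documents most similar to the query."""
    ranked = sorted(
        DOC_EMBEDDINGS,
        key=lambda doc: cosine_similarity(query_embedding, DOC_EMBEDDINGS[doc]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, query_embedding):
    """Augmentation step: add retrieved context to the prompt (in-context learning)."""
    context = ", ".join(retrieve(query_embedding))
    return f"Context: {context}\nQuestion: {question}"

# A query whose (made-up) embedding points toward the policy document:
prompt = build_prompt("What is the vacation policy?", [0.85, 0.15, 0.05])
```

In a real system, the same embedding model produces both the stored document vectors (once, at preparation time) and each query vector (per request); that shared model is what makes the similarity comparison meaningful.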
