
Commit 8e2cab0

committed
Interim checkin
1 parent 67b45b7 commit 8e2cab0

File tree

5 files changed

+110
-66
lines changed


learn-pr/wwl-data-ai/introduction-language-models-databricks/includes/03-what-are-large-language-models.md

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -18,15 +18,13 @@ When you want to achieve Generative AI, you can use LLMs to generate new content
1818

1919
## Understand the LLM architecture
2020

21-
The architecture of LLMs typically involves **transformer networks**, which is a type of neural network introduced by the [*Attention is all you need* paper by Vaswani, et al. from 2017](https://arxiv.org/abs/1706.03762?azure-portal=true).
21+
To understand how LLMs work, we need to start with neural networks: computer systems inspired by how the human brain processes information, built from interconnected nodes that learn patterns from data. LLMs use a specific type of neural network called transformers, which have revolutionized how AI understands language.
2222

23-
Transformers use **self-attention mechanisms** to weigh the importance of different words in a sentence, allowing the model to understand context more effectively than previous models like **recurrent neural networks** (**RNNs**) or **long short-term memory** (**LSTM**) **networks**.
23+
**Transformers** are the architectural foundation of modern LLMs, designed specifically to process and understand text. Unlike older neural network approaches that had to read text word-by-word in sequence, transformers can analyze all words in a sentence simultaneously and determine how they relate to each other.
2424

25-
This architectural breakthrough greatly improved LLMs, making them better at handling long-range dependencies and understanding the overall structure of the text.
25+
The breakthrough innovation in LLM architecture is the **self-attention mechanism**, which allows the model to focus on the most relevant words when interpreting any part of the text. For example, in "The dog that was barking loudly woke up the neighbors," the transformer architecture enables the LLM to instantly connect "barking" and "loudly" to "dog," even though they're separated by other words.
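As a rough numerical sketch, self-attention weighs every token against every other and blends their embeddings by relevance. The 2-D token vectors below are made up for illustration, and the sketch omits the learned query/key/value projections a real transformer uses:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Toy scaled dot-product self-attention (identity projections)."""
    d = len(X[0])
    attn = []
    for q in X:  # one row of attention weights per token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        attn.append(softmax(scores))
    # each output embedding is a relevance-weighted mix of all token embeddings
    out = [[sum(w * v[j] for w, v in zip(row, X)) for j in range(d)] for row in attn]
    return out, attn

# toy 3-token "sentence": tokens 0 and 1 are similar, token 2 is unrelated
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out, attn = self_attention(X)
```

Inspecting `attn` shows token 0 attending more strongly to the similar token 1 than to the unrelated token 2, which is exactly the "connect related words regardless of distance" behavior described above.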
2626

27-
Training language models requires substantial computational resources and large-scale datasets. The datasets often include a diverse range of texts from books, websites, articles, and other written materials.
28-
29-
During training, the model learns to predict the next word in a sentence, given the preceding words, which help it understand context and develop language comprehension. The sheer size of these models, often consisting of billions of parameters, allows them to store a vast amount of linguistic knowledge. For instance, GPT-3, one of the most well-known LLMs, has 175 billion parameters, making it one of the largest AI models ever created.
27+
This **transformer architecture** is what makes LLMs so powerful at understanding context and generating coherent text. Its parallel processing capability lets LLMs handle long documents effectively and maintain understanding across entire conversations or articles, which was impossible with previous neural network designs. This architectural foundation, combined with massive training datasets and billions of parameters, creates the sophisticated language understanding we see in modern LLMs.
3028

3129
## Explore LLM applications
3230

learn-pr/wwl-data-ai/introduction-language-models-databricks/includes/04-key-components-llms.md

Lines changed: 43 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,21 @@
1-
**Large Language Models** (**LLMs**) are designed to understand and generate human language, and their effectiveness hinges on four key components:
1+
**Large Language Models** (**LLMs**) are sophisticated language processing systems designed to understand and generate human language. Think of them as having four essential parts that work together, similar to how a car needs an engine, fuel system, transmission, and steering wheel to function properly.
22

3-
- **Tasks**: The diverse language-related functions these models can perform, such as text classification, translation, and dialogue generation.
4-
- **Tokenizer**: Preprocess text by breaking it down into manageable units, allowing the model to handle language efficiently.
5-
- **Model**: Typically based on transformer architecture, utilizes self-attention mechanisms to process text and generate contextually relevant responses.
6-
- **Prompt**: The inputs provided to the model, guiding it to produce the desired output.
3+
- **Prompt**: Your instructions to the model. The prompt is how you communicate with the LLM. It's your question, request or instruction.
4+
- **Tokenizer**: Breaks down language. The tokenizer acts as a language translator that converts human text into a format the computer can understand.
5+
- **Model**: The 'brain' of the operation. The model is the actual 'brain' that processes information and generates responses. It's typically based on the transformer architecture and uses self-attention mechanisms to process text and generate contextually relevant responses.
6+
- **Tasks**: What LLMs can do. Tasks are the different language-related jobs that LLMs can perform, such as text classification, translation, and dialogue generation.
77

8-
Together, these components enable LLMs to perform a wide array of language tasks with high accuracy and fluency.
8+
These components create a powerful language processing system:
9+
1. **You provide a prompt** (your instruction)
10+
2. **The tokenizer breaks it down** (makes it computer-readable)
11+
3. **The model processes it** (using transformer architecture and self-attention)
12+
4. **The model performs the task** (generates the response you need)
13+
14+
This coordinated system is what enables LLMs to perform complex language tasks with remarkable accuracy and fluency, making them useful for everything from writing assistance to customer service to creative content generation.
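The four-step flow above can be sketched in Python. The vocabulary, `tokenize`, and `model` functions here are made-up stand-ins for illustration, not a real LLM API:

```python
# Hypothetical stand-ins for the four components; a real LLM replaces `model`
# with a transformer and `VOCAB` with a learned vocabulary of ~100k entries.
VOCAB = {"translate": 0, "hello": 1, "bonjour": 2, "<unk>": 3}
ID_TO_WORD = {v: k for k, v in VOCAB.items()}

def tokenize(text):
    # Step 2: the tokenizer turns the prompt into numeric token IDs
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def model(token_ids):
    # Step 3: a stand-in "brain"; a real transformer would predict next tokens here
    return [VOCAB["bonjour"]]

def run_llm(prompt):
    # Steps 1-4: prompt -> tokenizer -> model -> text response
    output_ids = model(tokenize(prompt))
    return " ".join(ID_TO_WORD[i] for i in output_ids)

print(run_llm("Translate hello"))  # -> bonjour
```

The point of the sketch is the shape of the pipeline, not the internals: your prompt is numbers by the time the model sees it, and the model's numeric output is decoded back into text for you.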
915

1016
## Understand the tasks LLMs perform
1117

12-
LLMs are designed to perform a wide range of language-related tasks. LLMs are ideal for **natural language processing**, or **NLP** (1), tasks, because of their deep understanding of text and context.
18+
LLMs are designed to perform a wide range of language-related tasks. **Natural language processing**, or **NLP** (1), is the field of artificial intelligence focused on enabling computers to understand, interpret, and generate human language in a way that is meaningful and useful. LLMs are ideal for NLP tasks because of their deep understanding of text and context.
1319

1420
:::image type="content" source="../media/natural-language-processing.png" alt-text="Diagram of natural language processing tasks.":::
1521

@@ -21,7 +27,13 @@ LLMs are also used in dialogue systems and **conversational** agents, where they
2127

2228
## Understand the importance of the tokenizer
2329

24-
**Tokenization** is a vital preprocessing step in LLMs, where text is broken down into manageable units called **tokens**. These tokens can be words, subwords, or even individual characters, depending on the tokenization strategy employed.
30+
**Tokenization** is a vital preprocessing step in LLMs. It converts human text into a format a computer can understand. The text is broken down into manageable units called **tokens**. These tokens can be words, subwords, or even individual characters, depending on the tokenization strategy employed.
31+
32+
The tokenization process can be summarized like this:
33+
1. **Break text into tokens:** "Hello world" might become ["Hello", "world"] or even ["Hel", "lo", "wor", "ld"]
34+
2. **Handle different languages:** Processes English, Spanish, Chinese, etc.
35+
3. **Make processing efficient:** Smaller pieces are easier for the model to work with
36+
4. **Convert to numbers:** Computers work with numbers, not letters, so "Hello world" becomes something like [7592, 1917]
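The steps above can be sketched with a toy greedy longest-match subword tokenizer (WordPiece-style). The vocabulary and IDs below are invented for illustration and don't come from any real tokenizer:

```python
# Invented toy vocabulary; real tokenizers learn tens of thousands of pieces
VOCAB = {"hello": 7592, "world": 1917, "un": 11, "break": 12, "able": 13, "<unk>": 0}

def tokenize_word(word):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces = []
    while word:
        for end in range(len(word), 0, -1):  # try the longest prefix first
            if word[:end] in VOCAB:
                pieces.append(word[:end])
                word = word[end:]
                break
        else:
            return ["<unk>"]  # no known piece: fall back to an unknown token
    return pieces

def encode(text):
    """Convert tokens to numeric IDs, the format the model actually consumes."""
    return [VOCAB[piece] for w in text.lower().split() for piece in tokenize_word(w)]

print(tokenize_word("unbreakable"))  # -> ['un', 'break', 'able']
print(encode("hello world"))         # -> [7592, 1917]
```

Note how a word the vocabulary has never seen whole ("unbreakable") still gets represented, by falling back to known subword pieces; that is the out-of-vocabulary handling described for BPE and WordPiece below.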
2537

2638
Modern tokenizers, such as **Byte Pair Encoding** (**BPE**) and **WordPiece**, split rare or unknown words into subword units, allowing the model to handle out-of-vocabulary terms more effectively.
2739

@@ -53,31 +65,45 @@ Tokenization also enables the model to convert text into numerical formats that
5365

5466
## Understand the underlying model architecture
5567

56-
The model architecture in LLMs is typically based on the **transformer** model, which utilizes **self-attention** mechanisms to process and understand text.
68+
Think of an LLM's architecture like the blueprint of a house: it shows how all the parts are organized and work together to create something functional.
69+
70+
LLMs are built using the **transformer architecture**. Imagine you're reading a book and need to understand how different sentences relate to each other. The traditional approach is to read word by word, left to right. With the transformer approach, you can look at the entire page at once and instantly see how all the words connect to each other.
71+
72+
**Self-attention** is a key innovation used in the transformer architecture. It's like having a super-smart highlighter that automatically marks the most important words for understanding each sentence.
73+
74+
For example: In the sentence "The dog chased the ball because it was excited," self-attention helps the model know that "it" refers to "the dog" (not the ball), even though "dog" appears earlier in the sentence.
5775

5876
Transformers consist of layers of **encoders** and **decoders** that work together to analyze input text and generate outputs. The self-attention mechanism allows the model to *weigh the importance of different words* in a sentence, enabling it to capture long-range dependencies and context effectively.
5977

6078
:::image type="content" source="../media/transformer-model.png" alt-text="Diagram of transformer model architecture with the encoder and decoder blocks.":::
6179

62-
1. The **model** is trained on a large volume of natural language text.
63-
2. The training data is broken down into **tokens** and the **encoder** block processes token sequences using **attention** to determine *relationships between tokens*.
64-
3. The output from the encoder is a collection of **vectors** (multi-valued numeric arrays) in which each element of the vector represents a semantic attribute of the tokens. These vectors are referred to as **embeddings**.
65-
4. The **decoder** block works on a new sequence of text tokens and uses the embeddings generated by the encoder to generate an appropriate natural language output.
66-
5. For example, given an input sequence like `When my dog was` the model can use the attention mechanism to analyze the input tokens and the semantic attributes encoded in the embeddings to predict an appropriate completion of the sentence, such as `a puppy`.
80+
Let's use this diagram as an example of how LLM processing works.
81+
82+
The **LLM** is trained on a large volume of natural language text.
83+
**Step 1: Input** Training documents and a prompt "When my dog was..." enter the system.
84+
**Step 2: Encoder (The analyzer)** Breaks text into **tokens** and analyzes its meaning. The **encoder** block processes token sequences using **self-attention** to determine the relationships between tokens or words.
85+
**Step 3: Embeddings are created** The output from the encoder is a collection of **vectors** (multi-valued numeric arrays) in which each element of the vector represents a semantic attribute of the tokens. These vectors are referred to as **embeddings**. They are numerical representations that capture meaning:
86+
87+
- **dog [10,3,2]** - animal, pet, subject
88+
- **cat [10,3,1]** - animal, pet, different species
89+
- **puppy [5,2,1]** - young animal, related to dog
90+
- **skateboard [-3,3,2]** - object, unrelated to animals
91+
92+
**Step 4: Decoder (The writer)** The decoder block works on a new sequence of text tokens and uses the embeddings generated by the encoder to generate an appropriate natural language output. It compares the options and chooses the most appropriate response.
93+
**Step 5: Output generated** Given an input sequence like `When my dog was`, the model can use the self-attention mechanism to analyze the input tokens and the semantic attributes encoded in the embeddings to predict an appropriate completion of the sentence, such as `a puppy`.
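To see how embeddings like the made-up vectors above capture meaning, a common measure is cosine similarity between vectors. This sketch uses those illustrative numbers, not real model embeddings:

```python
import math

# Toy embeddings from the walkthrough above (made-up numbers, for illustration)
emb = {
    "dog":        [10, 3, 2],
    "cat":        [10, 3, 1],
    "puppy":      [5, 2, 1],
    "skateboard": [-3, 3, 2],
}

def cosine(a, b):
    """Cosine similarity: near 1 means related meanings, near 0 or negative means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(emb["dog"], emb["cat"]))         # high: related meanings
print(cosine(emb["dog"], emb["skateboard"]))  # low: unrelated
```

This is the kind of comparison the decoder implicitly performs when choosing "a puppy" over an unrelated completion: candidates whose embeddings point in a similar direction to the context score higher.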
6794

68-
This architecture is highly parallelizable, making it efficient for training on large datasets. The size of the model, often defined by the number of parameters, determines its capacity to store linguistic knowledge and perform complex tasks. Large models, such as GPT-3 and GPT-4, contain billions of parameters, which contribute to their high performance and versatility.
95+
This architecture is highly parallelizable, making it efficient for training on large datasets. The size of the LLM, often defined by the number of parameters, determines its capacity to store linguistic knowledge and perform complex tasks. Think of parameters as millions or billions of tiny memory cells that store language rules and patterns. More memory cells mean the model can remember more about language and handle harder tasks. Large models, such as GPT-3 and GPT-4, contain billions of parameters, allowing them to store vast language knowledge.
6996

7097
## Understand the importance of the prompt
7198

72-
**Prompts** are the initial inputs given to LLMs to guide their responses.
99+
**Prompts** are the initial inputs given to LLMs to guide their responses. They act as the conductor that makes all four LLM components (prompt, tokenizer, model, tasks) work together effectively. The quality and clarity of the prompt significantly influence the model's performance, and a well-structured prompt can lead to more accurate and relevant responses.
73100

74101
Crafting effective prompts is crucial for obtaining the desired output from the model. Prompts can range from simple instructions to complex queries, and the model generates text based on the context and information provided in the prompt.
75102

76103
For example, a prompt can be:
77104

78105
`Translate the following English text to French: "Hello, how are you?"`
79106

80-
The quality and clarity of the prompt significantly influence the model’s performance, as a well-structured prompt can lead to more accurate and relevant responses.
81107

82108
In addition to standard prompts, techniques such as **prompt engineering** involve refining and optimizing prompts to enhance the model’s output for specific tasks or applications.
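A simple form of prompt engineering is wrapping user input in a reusable, structured template. The function name and wording below are illustrative assumptions, not a prescribed API:

```python
def make_translation_prompt(text, target_language="French"):
    """Turn a bare request into a structured prompt for more reliable output."""
    return (
        f"Translate the following English text to {target_language}. "
        f"Return only the translation.\n\n"
        f'Text: "{text}"'
    )

prompt = make_translation_prompt("Hello, how are you?")
print(prompt)
```

Compared with sending the raw sentence, the template states the task, constrains the output format, and clearly delimits the user's text, which are the kinds of refinements prompt engineering makes systematic.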
83109

learn-pr/wwl-data-ai/introduction-language-models-databricks/includes/05-use-llms.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -81,14 +81,14 @@ For each tweet, describe its sentiment.
8181
Tweet: I hate it when my phone battery dies
8282
Sentiment: Negative
8383

84-
Tweet: My has been great
84+
Tweet: My day has been great
8585
Sentiment: Positive
8686

8787
Tweet: This is the link to the article
8888
Sentiment: Neutral
8989

90-
Tweet: This new music video was incredible
91-
Sentiment:
90+
Tweet: This new music video is incredible
91+
Sentiment:
9292
```
9393

9494
The LLM uses the examples to understand what it needs to do and completes the prompt by returning the sentiment of the last tweet.
Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,19 @@
1-
Language models are growing in popularity as they create impressive coherent answers to a user’s questions. Especially when a user interacts with a language model through chat, it provides an intuitive way to get the information they need.
1+
Retrieval Augmented Generation (RAG) is a technique in natural language processing that makes LLMs more effective by giving them access to external information. Instead of relying only on their training data like traditional models do, RAG allows LLMs to search through databases, documents, or websites to find relevant facts before creating a response. This combination of searching for information and then generating text helps produce more accurate and up-to-date answers.
22

3-
One prevalent challenge when implementing language models through chat is the so-called **groundedness**, which refers to whether a response is rooted, connected, or anchored in reality or a specific context. In other words, groundedness refers to whether the response of a language model is based on *factual information* to avoid *hallucinations* or incorrect information.
3+
Language models have become incredibly popular because they can generate impressive, well-structured answers to user questions. When people interact with these models through chat interfaces, it feels like a natural and intuitive way to get information.
44

5-
When you want a language model to learn specific knowledge, there are three main approaches:
5+
However, there's a major challenge: ensuring the AI's responses are actually accurate and factual. This challenge is called **groundedness**, which simply means whether the AI's answer is based on real, reliable information rather than made-up or incorrect details. Without proper groundedness, language models can "hallucinate" by confidently stating things that aren't true. Another challenge is that traditional models rely only on the information they were trained on, which can be outdated or incomplete.
6+
7+
When you want a language model to have access to specific knowledge, you have three main options:
68

79
:::image type="content" source="../media/learn-knowledge.png" alt-text="Diagram of three approaches for language models to learn knowledge.":::
810

9-
1. **Model pretraining**: Train a language model from scratch, which requires large datasets consisting of billions to trillions of tokens.
10-
2. **Model fine-tuning**: Adapt a pretrained language model to specific datasets or domains, which requires thousands of domain-specific or instruction examples.
11-
3. **Passing contextual information**: Combine a language model with external knowledge retrieval, which requires an external knowledge base.
11+
1. **Model pretraining**: Build a language model from the ground up, which requires massive datasets with billions or trillions of tokens. This is extremely expensive and time-consuming.
12+
13+
2. **Model fine-tuning**: Take an existing language model and train it further on your specific data or industry, which requires thousands of specialized examples. This is moderately expensive and complex.
14+
15+
3. **Passing contextual information**: Connect a language model to external databases or documents so it can look up information in real time. This strategy, known as **Retrieval Augmented Generation** (**RAG**), requires setting up a knowledge base but is much simpler than the other options.
1216

13-
Passing contextual information can be achieved through a strategy known as **Retrieval Augmented Generation** (**RAG**). RAG is the least complex strategy, requires less compute, and ensures a language model is grounded on specific data to provide factually accurate responses.
17+
RAG is most practical when you need an AI with access to current, verifiable information. It's easier to implement and uses less computing power than retraining entire models.
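The RAG flow can be sketched end to end. The knowledge base and word-overlap retrieval below are deliberately simplified stand-ins; a real system would use vector search and an actual LLM call:

```python
# Hypothetical knowledge base; in practice this would be your documents or database
KNOWLEDGE_BASE = [
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "The premium plan costs 20 dollars per month.",
    "Refunds are processed within 14 days of a request.",
]

def retrieve(question, docs):
    """Return the document sharing the most words with the question (toy retrieval)."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question):
    """Ground the prompt in retrieved context before it reaches the model."""
    context = retrieve(question, KNOWLEDGE_BASE)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How much does the premium plan cost?"))
```

The resulting prompt is what gets sent to the language model, so the answer is anchored to your own up-to-date data rather than whatever the model happened to memorize during training.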
1418

15-
In this module, you learn where and how RAG can be used to improve the quality, reliability, and accuracy of language models. You explore how vectors are used to search and provide relevant context to language models.
19+
In this module, you'll learn when and how to use RAG to make language models more reliable and accurate. You'll also discover how vector search technology helps the AI quickly find the most relevant information to include in its responses.
