Over the last decades, multiple developments in the field of **natural language processing** (**NLP**) have resulted in today's **large language models** (**LLMs**). The development and availability of language models led to new ways to interact with applications and systems, such as through generative AI assistants and agents.

Let's take a look back at the historical developments of language models, which include:

- **Tokenization**: enabling machines to *read*.
- **Word embeddings**: enabling machines to capture the relationship between words.
- **Architectural developments**: changes in the design of language models that enable them to capture word context.

## Tokenization

As you may expect, machines have a hard time deciphering text as they mostly rely on numbers. To *read* text, we therefore need to convert the presented text to numbers.

One important development to allow machines to more easily work with text has been tokenization. **Tokens** are strings with a known meaning, usually representing a word. **Tokenization** is turning words into tokens, which are then converted to numbers. A statistical approach to tokenization is to use a pipeline:

:::image type="content" source="../media/tokenization-pipeline.png" alt-text="Diagram showing the pipeline of tokenization of a sentence.":::

1. Start with the text you want to **tokenize**.
1. **Split** the words in the text based on a rule. For example, split the words where there's a whitespace.
1. **Stop word removal**. Remove noisy words that have little meaning like `the` and `a`. A dictionary of these words is provided to structurally remove them from the text.
1. **Assign a number** to each unique token.
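
The pipeline above can be sketched in a few lines of Python. The stop-word list, splitting rule, and example sentence here are illustrative assumptions, not part of any particular library:

```python
# A minimal sketch of the statistical tokenization pipeline described above.
# The stop-word list and example sentence are illustrative assumptions.
stop_words = {"the", "a", "an"}

def tokenize(text):
    # Steps 1-2: split the text on whitespace (our splitting rule).
    words = text.lower().split()
    # Step 3: stop word removal, using a small dictionary of noisy words.
    tokens = [w for w in words if w not in stop_words]
    # Step 4: assign a number to each unique token.
    vocabulary = {}
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
    return [vocabulary[t] for t in tokens], vocabulary

ids, vocab = tokenize("The dog chased a cat around the garden")
print(ids)    # numeric representation of the sentence
print(vocab)  # mapping from token to number
```

Real tokenizers used by LLMs work on subword units rather than whole words, but the idea of mapping text to numbers is the same.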

Tokenization allowed for text to be labeled. As a result, statistical techniques could be used to let computers find patterns in the data instead of applying rule-based models.

## Word embeddings

One of the key concepts introduced by applying deep learning techniques to NLP is **word embeddings**. Word embeddings address the problem of not being able to define the **semantic relationship** between words.

Word embeddings are created during the deep learning model training process. During training, the model analyzes the co-occurrence patterns of words in sentences and learns to represent them as **vectors**. A vector represents a path from the origin to a point in n-dimensional space (in other words, a line). Semantic relationships are defined by how similar the angles of the lines are, that is, the direction of the path. Because word embeddings represent words in a vector space, the relationship between words can be easily described and calculated.

To create a vocabulary that encapsulates semantic relationships between the tokens, we define contextual vectors, known as embeddings, for them. Vectors are multi-valued numeric representations of information, for example [10, 3, 1], in which each numeric element represents a particular attribute of the information. For language tokens, each element of a token's vector represents some semantic attribute of the token. The specific categories for the elements of the vectors in a language model are determined during training, based on how commonly words are used together or in similar contexts.

Vectors represent lines in multidimensional space, describing direction and distance along multiple axes (you can impress your mathematician friends by calling these direction and magnitude). Overall, the vector describes the direction and distance of the path from origin to end.
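
As a small sketch of these two properties, the magnitude of the example vector [10, 3, 1] from the text is its Euclidean length, and its direction is the corresponding unit vector. The values are the illustrative numbers from above, not real embedding weights:

```python
import math

# Example embedding vector from the text; the values are illustrative.
vector = [10, 3, 1]

# Magnitude (length of the path from origin to end): the Euclidean norm.
magnitude = math.sqrt(sum(v * v for v in vector))

# Direction: the unit vector, each element divided by the magnitude.
direction = [v / magnitude for v in vector]

print(round(magnitude, 3))  # 10.488
```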

:::image type="content" source="../media/word-embeddings.png" alt-text="Diagram showing a simple example of word embeddings.":::

The elements of the tokens in the embeddings space each represent some semantic attribute of the token, so that semantically similar tokens should result in vectors that have a similar orientation; in other words, they point in the same direction. A technique called **cosine similarity** is used to determine if two vectors have similar directions (regardless of distance), and therefore represent semantically linked words. For example, the embedding vectors for "dog" and "puppy" describe a path along an almost identical direction, which is also fairly similar to the direction for "cat". The embedding vector for "skateboard", however, describes a journey in a very different direction.
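
Cosine similarity can be computed directly from two vectors: the dot product divided by the product of the magnitudes. The three-dimensional vectors below are made-up illustrations for "dog", "puppy", and "skateboard", not output from a real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product divided by
    # the product of their magnitudes. Close to 1 means similar direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding vectors; real embeddings have hundreds of dimensions.
dog = [0.9, 0.8, 0.1]
puppy = [0.8, 0.9, 0.2]
skateboard = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))       # close to 1: similar direction
print(cosine_similarity(dog, skateboard))  # much lower: different direction
```

Note that cosine similarity ignores magnitude entirely: scaling either vector by a positive constant leaves the result unchanged.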

## Architectural developments

The architecture, or design, of a machine learning model describes the structure and organization of its various components and processes. It defines how data is processed, how models are trained and evaluated, and how predictions are generated. One of the first breakthroughs in language model architecture was the **recurrent neural network** (**RNN**).

Understanding text isn't just about understanding individual words, presented in isolation. Words can differ in their meaning depending on the **context** they're presented in. In other words, the sentence around a word matters to the meaning of the word.

RNNs are able to take into account the context of words through multiple sequential steps. Each step takes an **input** and a **hidden state**. Imagine the input at each step to be a new word. Each step also produces an **output**. The hidden state can serve as a memory of the network, storing the output of the previous step and passing it as input to the next step.
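
The input, hidden state, and output flow can be sketched as a simple loop. The code below is a bare-bones illustration of the recurrence only, assuming tiny made-up scalar weights rather than the learned weight matrices of a trained network:

```python
# Bare-bones sketch of the RNN recurrence described above.
# Weights and inputs are made-up scalars; a real RNN uses learned matrices.
w_input, w_hidden = 0.5, 0.8

def rnn_step(x, hidden):
    # Each step combines the new input with the hidden state (the memory),
    # producing an output that is passed on as the next hidden state.
    return w_input * x + w_hidden * hidden

hidden = 0.0  # the network starts with an empty memory
for x in [1.0, 2.0, 3.0]:  # imagine each number encoding a new word
    hidden = rnn_step(x, hidden)
    print(hidden)
```

Because each step folds the previous output back in, the final hidden state depends on every word seen so far, which is how the network carries context forward.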

Imagine a sentence like: