Let's examine some general principles and common techniques used to perform text analysis and other natural language processing (NLP) tasks.

Some of the earliest techniques used to analyze text with computers involve statistical analysis of a body of text (a *corpus*) to infer some kind of semantic meaning. Put simply, if you can determine the most commonly used words in a given document, you can often get a good idea of what the document is about.
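As a rough sketch of this idea, the following snippet (Python standard library only; the sample text is ours, chosen for illustration) counts word frequencies in a short document:

```python
# A minimal sketch of frequency analysis: count word occurrences in a
# document and report the most common ones.
from collections import Counter

document = (
    "We choose to go to the moon. We choose to go to the moon in this "
    "decade and do the other things, not because they are easy, but "
    "because they are hard."
)

# Normalize to lowercase and strip surrounding punctuation.
words = [w.strip(".,!?;:").lower() for w in document.split()]
counts = Counter(w for w in words if w)

print(counts.most_common(3))
```

Even this crude count surfaces the dominant terms, which hints at the document's topic.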
## Tokenization
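In its simplest form, tokenization splits text into individual terms and assigns each distinct token a numeric identifier. A minimal sketch (the helper names and the resulting ids are illustrative, not from any particular library):

```python
# A minimal tokenizer sketch: lowercase the text, split it into word
# tokens, and map each distinct token to a small integer id.
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(tokens):
    """Assign each distinct token an id, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

tokens = tokenize("The cat sat on the mat. The dog barked.")
vocab = build_vocabulary(tokens)
print(tokens)  # ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'dog', 'barked']
print(vocab)   # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5, 'barked': 6}
```

Real tokenizers handle punctuation, casing, and word variants in more sophisticated ways, but the core job is the same: turn raw text into a sequence of discrete tokens.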
For example, consider the following restaurant reviews, which are already labeled:
With enough labeled reviews, you can train a classification model using the tokenized text as *features* and the sentiment (0 or 1) as a *label*. The model encapsulates a relationship between tokens and sentiment: for example, reviews containing tokens for words like `"great"`, `"tasty"`, or `"fun"` are more likely to return a sentiment of **1** (*positive*), while reviews with words like `"terrible"`, `"slow"`, and `"substandard"` are more likely to return **0** (*negative*).
## Semantic language models
As the state of the art for NLP has advanced, the ability to train models that encapsulate the semantic relationship between tokens has led to the emergence of powerful language models. At the heart of these models is the encoding of language tokens as vectors (multi-valued arrays of numbers) known as *embeddings*.

It can be useful to think of the elements in a token embedding vector as coordinates in multidimensional space, so that each token occupies a specific "location." The closer tokens are to one another along a particular dimension, the more semantically related they are. In other words, related words are grouped closer together. As a simple example, suppose the embeddings for our tokens consist of three-element vectors:

```
- 4 ("dog"): [10,3,2]
- 5 ("bark"): [10,2,2]
- 8 ("cat"): [10,3,1]
- 9 ("meow"): [10,2,1]
- 10 ("skateboard"): [3,3,1]
```
We can plot the location of tokens based on these vectors in three-dimensional space, like this:
The locations of the tokens in the embedding space include some information about how closely the tokens are related to one another. For example, the token for `"dog"` is close to `"cat"` and also to `"bark"`. The tokens for `"cat"` and `"bark"` are close to `"meow"`. The token for `"skateboard"` is farther away from the other tokens.
The language models we use in industry are based on these principles but have greater complexity. For example, the vectors used generally have many more dimensions. There are also multiple ways you can calculate appropriate embeddings for a given set of tokens. Different methods result in different predictions from natural language processing models.
73 |
| - |
74 |
| -A generalized view of most modern natural language processing solutions is shown in the following diagram. A large corpus of raw text is tokenized and used to train language models, which can support many different types of natural language processing task. |
75 |
| - |
76 |
| - |
77 |
| - |
78 |
| -Common NLP tasks supported by language models include: |
79 |
| -- Text analysis, such as extracting key terms or identifying named entities in text. |
80 |
| -- Sentiment analysis and opinion mining to categorize text as *positive* or *negative*. |
81 |
| -- Machine translation, in which text is automatically translated from one language to another. |
82 |
| -- Summarization, in which the main points of a large body of text are summarized. |
83 |
| -- Conversational AI solutions such as *bots* or *digital assistants* in which the language model can interpret natural language input and return an appropriate response. |
84 |
| - |
Next, let's learn more about the capabilities made possible by language models.