Commit d073a68

Merge pull request #48787 from ivorb/jan-bugfix

update embedding content

2 parents 93cc0d4 + 58cb0e1
File tree: 3 files changed (+0, -4 lines)


learn-pr/wwl-data-ai/build-copilot-ai-studio/includes/3-search-data.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -10,8 +10,6 @@ While a text-based index will improve search efficiency, you can usually achieve
 
 An embedding is a special format of data representation that a search engine can use to easily find the relevant information. More specifically, an embedding is a vector of floating-point numbers.
 
-> [!VIDEO https://play.vidyard.com/sq5CuXbmZzdpqWABdwVjug?loop=1]
-
 For example, imagine you have two documents with the following contents:
 
 - *"The children played joyfully in the park."*
```
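The idea in the changed passage, that a search engine compares a query's embedding vector against stored document vectors to find relevant content, can be sketched in plain Python. This is a minimal illustration, not a real vector index: the three-element vectors and document names are invented, and production systems (e.g. Azure AI Search) use vectors with hundreds or thousands of dimensions.

```python
import math

# Toy document embeddings: each document is represented by a short
# vector of floats. The values here are invented for illustration.
index = {
    "doc1": [0.9, 0.1, 0.2],   # "The children played joyfully in the park."
    "doc2": [0.1, 0.8, 0.7],   # some unrelated document
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# A query embedding that happens to point in roughly the same
# direction as doc1's vector:
query = [0.8, 0.2, 0.1]

best = max(index, key=lambda name: cosine(index[name], query))
print(best)  # doc1
```

The search step is just "return the stored vector with the highest cosine similarity to the query vector"; real engines use approximate nearest-neighbor structures to avoid scanning every document.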
(binary image file changed, 56 KB)

learn-pr/wwl-data-ai/fundamentals-generative-ai/includes/3-language-models.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -66,8 +66,6 @@ With a sufficiently large set of training text, a vocabulary of many thousands o
 
 While it may be convenient to represent tokens as simple IDs - essentially creating an index for all the words in the vocabulary, they don't tell us anything about the meaning of the words, or the relationships between them. To create a vocabulary that encapsulates semantic relationships between the tokens, we define contextual vectors, known as *embeddings*, for them. Vectors are multi-valued numeric representations of information, for example [10, 3, 1] in which each numeric element represents a particular attribute of the information. For language tokens, each element of a token's vector represents some semantic attribute of the token. The specific categories for the elements of the vectors in a language model are determined during training based on how commonly words are used together or in similar contexts.
 
-> [!VIDEO https://play.vidyard.com/sq5CuXbmZzdpqWABdwVjug?loop=1]
-
 Vectors represent lines in multidimensional space, describing *direction* and *distance* along multiple axes (you can impress your mathematician friends by calling these *amplitude* and *magnitude*). It can be useful to think of the elements in an embedding vector for a token as representing steps along a path in multidimensional space. For example, a vector with three elements represents a path in 3-dimensional space in which the element values indicate the units traveled forward/back, left/right, and up/down. Overall, the vector describes the direction and distance of the path from origin to end.
 
 The elements of the tokens in the embeddings space each represent some semantic attribute of the token, so that semantically similar tokens should result in vectors that have a similar orientation – in other words they point in the same direction. A technique called *cosine similarity* is used to determine if two vectors have similar directions (regardless of distance), and therefore represent semantically linked words. As a simple example, suppose the embeddings for our tokens consist of vectors with three elements, for example:
```
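The cosine-similarity technique named in that passage is straightforward to compute: the dot product of two vectors divided by the product of their magnitudes. A short sketch, using invented three-element token embeddings in the spirit of the [10, 3, 1] example above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.

    1.0 means identical direction, 0 means orthogonal,
    negative values mean the vectors point away from each other.
    """
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Toy three-element embeddings (values invented for illustration):
dog = [10.0, 3.0, 2.0]
puppy = [5.0, 2.0, 1.0]        # points roughly the same way as "dog"
skateboard = [-3.0, 3.0, 1.0]  # points in a different direction

print(cosine_similarity(dog, puppy))        # close to 1.0
print(cosine_similarity(dog, skateboard))   # negative: dissimilar
```

Note that cosine similarity ignores magnitude by construction, which is why `dog` and `puppy` score near 1.0 even though one vector is roughly twice as long as the other.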
