Commit d073a68

Merge pull request #48787 from ivorb/jan-bugfix

update embedding content

2 parents 93cc0d4 + 58cb0e1
File tree: 3 files changed (+0, -4 lines)


learn-pr/wwl-data-ai/build-copilot-ai-studio/includes/3-search-data.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -10,8 +10,6 @@ While a text-based index will improve search efficiency, you can usually achieve
 
 An embedding is a special format of data representation that a search engine can use to easily find the relevant information. More specifically, an embedding is a vector of floating-point numbers.
 
-> [!VIDEO https://play.vidyard.com/sq5CuXbmZzdpqWABdwVjug?loop=1]
-
 For example, imagine you have two documents with the following contents:
 
 - *"The children played joyfully in the park."*
```
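The idea in the changed passage, that a search engine compares a query's embedding vector against stored document vectors to find relevant content, can be sketched in plain Python. This is a minimal illustration, not a real vector index: the three-element vectors and document names are invented, and production systems (e.g. Azure AI Search) use vectors with hundreds or thousands of dimensions.

```python
import math

# Toy document embeddings: each document is represented by a short
# vector of floats. The values here are invented for illustration.
index = {
    "doc1": [0.9, 0.1, 0.2],   # "The children played joyfully in the park."
    "doc2": [0.1, 0.8, 0.7],   # some unrelated document
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# A query embedding that happens to point in roughly the same
# direction as doc1's vector:
query = [0.8, 0.2, 0.1]

best = max(index, key=lambda name: cosine(index[name], query))
print(best)  # doc1
```

The search step is just "return the stored vector with the highest cosine similarity to the query vector"; real engines use approximate nearest-neighbor structures to avoid scanning every document.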
(binary image file changed, 56 KB)

learn-pr/wwl-data-ai/fundamentals-generative-ai/includes/3-language-models.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -66,8 +66,6 @@ With a sufficiently large set of training text, a vocabulary of many thousands o
 
 While it may be convenient to represent tokens as simple IDs - essentially creating an index for all the words in the vocabulary, they don't tell us anything about the meaning of the words, or the relationships between them. To create a vocabulary that encapsulates semantic relationships between the tokens, we define contextual vectors, known as *embeddings*, for them. Vectors are multi-valued numeric representations of information, for example [10, 3, 1] in which each numeric element represents a particular attribute of the information. For language tokens, each element of a token's vector represents some semantic attribute of the token. The specific categories for the elements of the vectors in a language model are determined during training based on how commonly words are used together or in similar contexts.
 
-> [!VIDEO https://play.vidyard.com/sq5CuXbmZzdpqWABdwVjug?loop=1]
-
 Vectors represent lines in multidimensional space, describing *direction* and *distance* along multiple axes (you can impress your mathematician friends by calling these *amplitude* and *magnitude*). It can be useful to think of the elements in an embedding vector for a token as representing steps along a path in multidimensional space. For example, a vector with three elements represents a path in 3-dimensional space in which the element values indicate the units traveled forward/back, left/right, and up/down. Overall, the vector describes the direction and distance of the path from origin to end.
 
 The elements of the tokens in the embeddings space each represent some semantic attribute of the token, so that semantically similar tokens should result in vectors that have a similar orientation – in other words they point in the same direction. A technique called *cosine similarity* is used to determine if two vectors have similar directions (regardless of distance), and therefore represent semantically linked words. As a simple example, suppose the embeddings for our tokens consist of vectors with three elements, for example:
```
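The cosine-similarity technique named in that passage is straightforward to compute: the dot product of two vectors divided by the product of their magnitudes. A short sketch, using invented three-element token embeddings in the spirit of the [10, 3, 1] example above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.

    1.0 means identical direction, 0 means orthogonal,
    negative values mean the vectors point away from each other.
    """
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Toy three-element embeddings (values invented for illustration):
dog = [10.0, 3.0, 2.0]
puppy = [5.0, 2.0, 1.0]        # points roughly the same way as "dog"
skateboard = [-3.0, 3.0, 1.0]  # points in a different direction

print(cosine_similarity(dog, puppy))        # close to 1.0
print(cosine_similarity(dog, skateboard))   # negative: dissimilar
```

Note that cosine similarity ignores magnitude by construction, which is why `dog` and `puppy` score near 1.0 even though one vector is roughly twice as long as the other.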
