docs/docs/concepts/chat_history.mdx (1 addition, 1 deletion)
@@ -26,7 +26,7 @@ A full conversation often involves a combination of two patterns of alternating
Since chat models have a maximum limit on input size, it's important to manage chat history and trim it as needed to avoid exceeding the [context window](/docs/concepts/chat_models/#context-window).
-While processing chat history, it's essential to preserve a correct conversation structure.
+While processing chat history, it's essential to preserve a correct conversation structure.
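Both points invite a short illustration. Below is a minimal sketch (an editor's example, not part of this diff) using `trim_messages` from `langchain_core`; the history contents and the `max_tokens` budget are hypothetical, and `token_counter=len` counts messages rather than model tokens.

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages

# Hypothetical conversation history; contents are illustrative only.
history = [
    SystemMessage("You are a helpful assistant."),
    HumanMessage("What is an embedding?"),
    AIMessage("A fixed-length vector that captures the text's meaning."),
    HumanMessage("And how are two embeddings compared?"),
]

trimmed = trim_messages(
    history,
    strategy="last",      # keep the most recent messages
    token_counter=len,    # here: count messages rather than model tokens
    max_tokens=3,         # i.e. keep at most 3 messages under this counter
    include_system=True,  # never drop the system message
    start_on="human",     # preserve a correct structure: resume on a human turn
)
```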
docs/docs/concepts/embedding_models.mdx (12 additions, 12 deletions)
@@ -15,9 +15,9 @@ Embedding models can also be [multimodal](/docs/concepts/multimodality) though s
Imagine being able to capture the essence of any text - a tweet, document, or book - in a single, compact representation.
This is the power of embedding models, which lie at the heart of many retrieval systems.
-Embedding models transform human language into a format that machines can understand and compare with speed and accuracy.
+Embedding models transform human language into a format that machines can understand and compare with speed and accuracy.
These models take text as input and produce a fixed-length array of numbers, a numerical fingerprint of the text's semantic meaning.
-Embeddings allow search systems to find relevant documents not just based on keyword matches, but on semantic understanding.
+Embeddings allow search systems to find relevant documents not just based on keyword matches, but on semantic understanding.
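As a minimal sketch of that interface (illustrative, not part of this diff), assuming the `langchain-openai` package and an `OPENAI_API_KEY` in the environment:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Text in, fixed-length vector out: the "numerical fingerprint".
vector = embeddings.embed_query("Embeddings capture semantic meaning.")
print(len(vector))  # a fixed dimensionality, e.g. 1536 for this model
```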
## Key concepts
@@ -27,16 +27,16 @@ Embeddings allow search system to find relevant documents not just based on keyw
(2) **Measure similarity**: Embedding vectors can be compared using simple mathematical operations.
-## Embedding
+## Embedding
-### Historical context
+### Historical context
-The landscape of embedding models has evolved significantly over the years.
-A pivotal moment came in 2018 when Google introduced [BERT (Bidirectional Encoder Representations from Transformers)](https://www.nvidia.com/en-us/glossary/bert/).
+The landscape of embedding models has evolved significantly over the years.
+A pivotal moment came in 2018 when Google introduced [BERT (Bidirectional Encoder Representations from Transformers)](https://www.nvidia.com/en-us/glossary/bert/).
BERT applied transformer models to embed text as a simple vector representation, which led to unprecedented performance across various NLP tasks.
-However, BERT wasn't optimized for generating sentence embeddings efficiently.
+However, BERT wasn't optimized for generating sentence embeddings efficiently.
This limitation spurred the creation of [SBERT (Sentence-BERT)](https://www.sbert.net/examples/training/sts/README.html), which adapted the BERT architecture to generate semantically rich sentence embeddings that are easily comparable via similarity metrics like cosine similarity, dramatically reducing the computational overhead for tasks like finding similar sentences.
-Today, the embedding model ecosystem is diverse, with numerous providers offering their own implementations.
+Today, the embedding model ecosystem is diverse, with numerous providers offering their own implementations.
To navigate this variety, researchers and practitioners often turn to benchmarks like the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) for objective comparisons.
:::info[Further reading]
@@ -93,9 +93,9 @@ LangChain offers many embedding model integrations which you can find [on the em
## Measure similarity
-Each embedding is essentially a set of coordinates, often in a high-dimensional space.
+Each embedding is essentially a set of coordinates, often in a high-dimensional space.
In this space, the position of each point (embedding) reflects the meaning of its corresponding text.
-Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space.
+Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space.
This allows for intuitive comparisons between different pieces of text.
By reducing text to these numerical representations, we can use simple mathematical operations to quickly measure how alike two pieces of text are, regardless of their original length or structure.
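Those "simple mathematical operations" can be sketched as cosine similarity (illustrative, not part of this diff), reusing the hypothetical `embeddings` object from the sketch above and assuming `numpy` is available.

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: values near 1.0 mean very similar."""
    a_arr, b_arr = np.asarray(a), np.asarray(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

v1 = embeddings.embed_query("How do I compare two pieces of text?")
v2 = embeddings.embed_query("Measuring the similarity of two texts")
print(cosine_similarity(v1, v2))  # higher score = more semantically alike
```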