
Commit 4c7fe4d

Merge branch 'main' into update-ruby-on-rails-tutorial
2 parents 437df42 + ec6e57e commit 4c7fe4d

File tree: 7 files changed, +608 −22 lines changed

tutorial/markdown/nodejs/nodejs-langchain-pdf-chat/nodejs-langchain-pdf-chat.md

Lines changed: 22 additions & 22 deletions
@@ -2,11 +2,11 @@
 # frontmatter
 path: "/tutorial-nodejs-langchain-pdf-chat"
 # title and description do not need to be added to markdown, start with H2 (##)
-title: Build PDF Chat App With Couchbase Nodejs SDK and Langchain
+title: Build PDF Chat App With Couchbase Nodejs SDK and LangChain
 short_title: Build PDF Chat App
 description:
-- Construct a PDF Chat App with Langchain, Couchbase Node.js SDK, Couchbase Vector Search, and Next.js.
-- Learn to upload PDFs into Couchbase Vector Store with Langchain.
+- Construct a PDF Chat App with LangChain, Couchbase Node.js SDK, Couchbase Vector Search, and Next.js.
+- Learn to upload PDFs into Couchbase Vector Store with LangChain.
 - Discover how to use RAG’s for context-based Q&A’s from PDFs with LLMs.
 content_type: tutorial
 filter: sdk
@@ -15,7 +15,7 @@ technology:
 - kv
 tags:
 - Next.js
-- Langchain
+- LangChain
 - OpenAI
 sdk_language:
 - nodejs
@@ -29,7 +29,7 @@ Welcome to this comprehensive guide on constructing an AI-enhanced Chat Applicat
 This tutorial will demonstrate how to -
 
 - Construct a [Couchbase Search Index](https://www.couchbase.com/products/vector-search/) for doing Vector Search
-- Chunk PDFs into Vectors with [Langchain.js](https://js.langchain.com/) and use [Couchbase Vector Store](https://js.langchain.com/docs/integrations/vectorstores/couchbase) to store the vectors into couchbase
+- Chunk PDFs into Vectors with [LangChain.js](https://js.langchain.com/) and use [Couchbase Vector Store](https://js.langchain.com/docs/integrations/vectorstores/couchbase) to store the vectors into couchbase
 - Query large language models via the [RAG framework](https://aws.amazon.com/what-is/retrieval-augmented-generation/) for contextual insights. We will use [OpenAI](https://openai.com) for generating Embeddings and LLM
 - Craft an elegant UI with Next.js. All these components come together to create a seamless, AI-powered chat experience.
 
@@ -255,7 +255,7 @@ When a user asks a question or provides a prompt:
 - The search index facilitates fast and accurate retrieval, enabling the app to provide context-aware and relevant responses to the user's queries, even when the phrasing or terminology differs from the PDF content.
 - Couchbase's Vector Search integrates seamlessly with LangChain's [CouchbaseVectorStore](https://js.langchain.com/docs/integrations/vectorstores/couchbase#create-vector-store) class, abstracting away the complexities of vector similarity calculations.
 
-### Langchain.js
+### LangChain.js
 
 LangChain is a powerful library that simplifies the process of building applications with [large language models](https://en.wikipedia.org/wiki/Large_language_model) (LLMs) and vector stores like Couchbase.
 
@@ -265,7 +265,7 @@ In the PDF Chat app, LangChain is used for several tasks:
 - **Text splitting**: LangChain's [_RecursiveCharacterTextSplitter_](https://js.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter) is used to split the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
 - **Embedding generation**: LangChain integrates with [various embedding models](https://js.langchain.com/docs/integrations/text_embedding), such as OpenAI's embeddings, to convert the text chunks into embeddings.
 - **Vector store integration**: LangChain provides a [_CouchbaseVectorStore_](https://js.langchain.com/docs/integrations/vectorstores/couchbase#create-vector-store) class that seamlessly integrates with Couchbase's Vector Search, allowing the app to store and search through the embeddings and their corresponding text.
-- **Chains**: Langchain provides various [chains](https://js.langchain.com/docs/modules/chains/) for different requirements. For using RAG concept, we require _Retrieval Chain_ for Retrieval and _Question Answering Chain_ for Generation part. We also add _Prompts_ that guide the language model's behavior and output. These all are combined to form a single chain which gives output from user questions.
+- **Chains**: LangChain provides various [chains](https://js.langchain.com/docs/modules/chains/) for different requirements. For using RAG concept, we require _Retrieval Chain_ for Retrieval and _Question Answering Chain_ for Generation part. We also add _Prompts_ that guide the language model's behavior and output. These all are combined to form a single chain which gives output from user questions.
 - **Streaming Output**: LangChain integrates with the [_StreamingTextResponse_](https://js.langchain.com/docs/expression_language/streaming) class, allowing the app to stream the generated answer to the client in real-time.
 
 By combining Vector Search with Couchbase, RAG, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
@@ -394,12 +394,12 @@ const rawDocs = await loader.load();
 
 ### Split Documents
 
-This Langchain document array will contain huge individual files which defeats the purpose while retrieval as we want to send more relevant context to LLM. So we will split it into smaller chunks or passages using LangChain's [_RecursiveCharacterTextSplitter_](https://js.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter):
+This LangChain document array will contain huge individual files which defeats the purpose while retrieval as we want to send more relevant context to LLM. So we will split it into smaller chunks or passages using LangChain's [_RecursiveCharacterTextSplitter_](https://js.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter):
 
 - chunkSize: 1000: This parameter specifies that each chunk should contain approximately 1000 characters.
 - chunkOverlap: 200: This parameter ensures that there is an overlap of 200 characters between consecutive chunks. This overlap helps maintain context and prevent important information from being split across chunk boundaries.
 
-At the end _splitDocuments_ method splits the large document into smaller Langchain documents based on above defined parameters.
+At the end _splitDocuments_ method splits the large document into smaller LangChain documents based on above defined parameters.
 
 ```typescript
 const textSplitter = new RecursiveCharacterTextSplitter({
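The effect of the chunkSize and chunkOverlap parameters in this hunk can be illustrated with a simplified character-only splitter. This is a sketch, not LangChain's actual RecursiveCharacterTextSplitter (which additionally splits on separators such as paragraphs and sentences); the helper name chunkText is hypothetical:

```typescript
// Simplified illustration of fixed-size chunking with overlap; assumes
// chunkOverlap < chunkSize. LangChain's real splitter also respects
// separator boundaries so chunks end at natural breaks where possible.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // characters advanced per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk covers the tail
  }
  return chunks;
}

// With chunkSize 4 and chunkOverlap 1, adjacent chunks share one character.
const chunks = chunkText("abcdefghij", 4, 1);
```

With the tutorial's values (1000 and 200), consecutive passages share roughly 200 characters, which is what keeps a sentence spanning a chunk boundary retrievable from at least one chunk.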
@@ -449,7 +449,7 @@ const couchbaseConfig: CouchbaseVectorStoreArgs = {
 
 ### Create Vector Store From Documents
 
-With everything ready for initializing Vector store, we create it using [_CouchbaseVectorStore.fromDocuments_](https://js.langchain.com/docs/integrations/vectorstores/couchbase#create-vector-store) function in Langchain. This function requires the documents which user wants to upload, details of couchbase vector store and an embeddings client which will create text to vector (embeddings).
+With everything ready for initializing Vector store, we create it using [_CouchbaseVectorStore.fromDocuments_](https://js.langchain.com/docs/integrations/vectorstores/couchbase#create-vector-store) function in LangChain. This function requires the documents which user wants to upload, details of couchbase vector store and an embeddings client which will create text to vector (embeddings).
 
 ```typescript
 await CouchbaseVectorStore.fromDocuments(docs, embeddings, couchbaseConfig);
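For intuition about what the vector store does with these embeddings at query time — this is a conceptual sketch, not the Couchbase or LangChain implementation — similarity between a query embedding and each stored embedding is typically scored with a metric such as cosine similarity:

```typescript
// Conceptual sketch of embedding similarity. A real vector search service
// uses an approximate index rather than scanning every stored vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1; orthogonal vectors score 0.
const same = cosineSimilarity([1, 0], [1, 0]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

This is why retrieval works even when the user's wording differs from the PDF: semantically similar texts receive nearby embeddings and therefore high similarity scores.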
@@ -482,7 +482,7 @@ This API at `/app/api/chat/route.ts` is used when user asks a question from the
 
 The user will type a message and the message will be sent to the chat API. The current message and other history messages are segregated.
 
-All the message history is formatted using _formatVercelMessages_ Function. This function takes a _VercelChatMessage_ object and converts it into a [HumanMessage](https://api.js.langchain.com/classes/langchain_core_messages.HumanMessage.html) or [AIMessage](https://api.js.langchain.com/classes/langchain_core_messages.AIMessage.html) object, which are classes from the Langchain library used to represent [conversation messages](https://js.langchain.com/docs/expression_language/how_to/with_history).
+All the message history is formatted using _formatVercelMessages_ Function. This function takes a _VercelChatMessage_ object and converts it into a [HumanMessage](https://api.js.langchain.com/classes/langchain_core_messages.HumanMessage.html) or [AIMessage](https://api.js.langchain.com/classes/langchain_core_messages.AIMessage.html) object, which are classes from the LangChain library used to represent [conversation messages](https://js.langchain.com/docs/expression_language/how_to/with_history).
 
 At the end we get all the previous messages in chat formatted form and current question (message) from user
 
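The role mapping that _formatVercelMessages_ performs can be sketched with plain objects. This is a simplified stand-in: the actual function returns LangChain HumanMessage and AIMessage instances, and the type shapes below are illustrative rather than the real library types:

```typescript
// Simplified stand-ins for the Vercel AI SDK and LangChain message shapes.
type VercelChatMessage = { role: "user" | "assistant"; content: string };
type ChatMessage = { type: "human" | "ai"; content: string };

// Map a UI message to the conversational role the LLM expects:
// "user" turns become human messages, "assistant" turns become AI messages.
function formatMessage(message: VercelChatMessage): ChatMessage {
  return {
    type: message.role === "user" ? "human" : "ai",
    content: message.content,
  };
}

const history: VercelChatMessage[] = [
  { role: "user", content: "What does the PDF say about pricing?" },
  { role: "assistant", content: "It lists three tiers." },
];
const formatted = history.map(formatMessage);
```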
@@ -516,9 +516,9 @@ const currentMessageContent = messages[messages.length - 1].content;
 
 We require OpenAI's embedding model and [LLM model](https://js.langchain.com/docs/integrations/llms/openai).
 
-OpenAI Embeddings are vector representations of text generated by OpenAI's language models. In this API, the [_OpenAIEmbeddings_](https://js.langchain.com/docs/integrations/text_embedding/openai) class from the Langchain library is used to generate embeddings for the documents stored in the Couchbase Vector Store.
+OpenAI Embeddings are vector representations of text generated by OpenAI's language models. In this API, the [_OpenAIEmbeddings_](https://js.langchain.com/docs/integrations/text_embedding/openai) class from the LangChain library is used to generate embeddings for the documents stored in the Couchbase Vector Store.
 
-The [_ChatOpenAI_](https://js.langchain.com/docs/integrations/chat/openai) class from the Langchain library is used as the language model for generating responses. It is an interface to OpenAI's chat models, which are capable of understanding and generating human-like conversations.
+The [_ChatOpenAI_](https://js.langchain.com/docs/integrations/chat/openai) class from the LangChain library is used as the language model for generating responses. It is an interface to OpenAI's chat models, which are capable of understanding and generating human-like conversations.
 
 ```typescript
 const model = new ChatOpenAI({});
@@ -569,25 +569,25 @@ const retriever = couchbaseVectorStore.asRetriever({
 });
 ```
 
-### Langchain Expression Language (LCEL)
+### LangChain Expression Language (LCEL)
 
-We will now utilize the power of Langchain Chains using the [Langchain Expression Language](https://js.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.
+We will now utilize the power of LangChain Chains using the [LangChain Expression Language](https://js.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.
 
-LCEL is a domain-specific language that provides several key advantages when working with Langchain:
+LCEL is a domain-specific language that provides several key advantages when working with LangChain:
 
-- Composability: It allows you to easily combine different Langchain components like retrievers, language models, and output parsers into complex workflows.
+- Composability: It allows you to easily combine different LangChain components like retrievers, language models, and output parsers into complex workflows.
 - Readability: The syntax is concise and expressive, making it easy to understand the flow of operations within a chain or sequence.
 - Reusability: You can define reusable sub-chains or components that can be incorporated into larger chains, promoting code reuse and modularity.
 
-In summary, LCEL streamlines the process of building sophisticated natural language processing applications by providing a composable, readable, reusable, extensible, type-safe, and abstracted way to define and orchestrate Langchain components into complex workflows.
+In summary, LCEL streamlines the process of building sophisticated natural language processing applications by providing a composable, readable, reusable, extensible, type-safe, and abstracted way to define and orchestrate LangChain components into complex workflows.
 
 We will be using LCEL chains in next few sections and will see how LCEL optimizes our whole workflow.
 
 ### History Aware Prompt
 
 The _historyAwarePrompt_ is used to generate a query for the vector store based on the conversation history and the current user message.
 
-The _historyAwarePrompt_ is a [_ChatPromptTemplate_](https://js.langchain.com/docs/modules/model_io/concepts#chatprompttemplate) from the Langchain library.
+The _historyAwarePrompt_ is a [_ChatPromptTemplate_](https://js.langchain.com/docs/modules/model_io/concepts#chatprompttemplate) from the LangChain library.
 It is defined to include the conversation history, the current user message, and a prompt asking for a concise vector store search query.
 
 ```typescript
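Conceptually, a _ChatPromptTemplate_ holds text with named placeholders and fills them when invoked. The sketch below mimics only the variable interpolation; fillTemplate is a hypothetical helper, not a LangChain API:

```typescript
// Hypothetical stand-in for prompt-template interpolation: replace {name}
// placeholders with supplied values, leaving unknown keys empty.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => vars[key] ?? "");
}

// Filling a history-aware prompt: the history and the latest message are
// substituted before the text is sent to the model.
const searchQueryPrompt = fillTemplate(
  "Given the conversation so far: {chat_history}\nRewrite this as a standalone search query: {input}",
  { chat_history: "User asked about vector search.", input: "How do I create the index?" }
);
```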
@@ -603,7 +603,7 @@ const historyAwarePrompt = ChatPromptTemplate.fromMessages([
 
 ### History Aware Chain
 
-The _historyAwareRetrieverChain_ is created using the [_createHistoryAwareRetriever_](https://api.js.langchain.com/functions/langchain_chains_history_aware_retriever.createHistoryAwareRetriever.html) function from the Langchain library. It takes the _historyAwarePrompt_, the language model (ChatOpenAI instance), and the vector store retriever as input.
+The _historyAwareRetrieverChain_ is created using the [_createHistoryAwareRetriever_](https://api.js.langchain.com/functions/langchain_chains_history_aware_retriever.createHistoryAwareRetriever.html) function from the LangChain library. It takes the _historyAwarePrompt_, the language model (ChatOpenAI instance), and the vector store retriever as input.
 
 The _historyAwareRetrieverChain_ is responsible for generating a query based on the conversation history and retrieving relevant documents from the vector store.
 
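The data flow of this history-aware step can be sketched as a function from history plus the latest question to a standalone query that is then handed to a retriever. Both helpers below are hypothetical: the real createHistoryAwareRetriever asks the language model to perform the rewrite rather than concatenating strings:

```typescript
// Hypothetical sketch: condense chat history and the new question into one
// standalone search query. The real chain delegates this rewrite to the LLM.
function makeStandaloneQuery(history: string[], question: string): string {
  return history.length === 0 ? question : `${history.join(" ")} ${question}`;
}

// The rewritten query, not the raw question, is what hits the vector store.
function retrieveWithHistory(
  history: string[],
  question: string,
  retriever: (query: string) => string[]
): string[] {
  return retriever(makeStandaloneQuery(history, question));
}
```

The point of the rewrite is that follow-up questions like "How does it scale?" only retrieve well once the referent from earlier turns is folded into the query.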
@@ -617,7 +617,7 @@ const historyAwareRetrieverChain = await createHistoryAwareRetriever({
 
 ### Document Chain
 
-The documentChain is created using the [_createStuffDocumentsChain_](https://api.js.langchain.com/functions/langchain_chains_combine_documents.createStuffDocumentsChain.html) function from the Langchain library. It takes the language model (ChatOpenAI instance) and a prompt (answerPrompt) as input.
+The documentChain is created using the [_createStuffDocumentsChain_](https://api.js.langchain.com/functions/langchain_chains_combine_documents.createStuffDocumentsChain.html) function from the LangChain library. It takes the language model (ChatOpenAI instance) and a prompt (answerPrompt) as input.
 
 The answerPrompt is a _ChatPromptTemplate_ that includes instructions for the language model to generate an answer based on the provided context (retrieved documents) and the user's question.
 
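The "stuff" strategy named by _createStuffDocumentsChain_ simply concatenates every retrieved document into the prompt's context slot. A minimal sketch, with stuffDocuments as a hypothetical helper (a real chain must also respect the model's context-window limit):

```typescript
type RetrievedDoc = { pageContent: string };

// "Stuffing": join all retrieved passages into one context string that is
// substituted into the answer prompt. This breaks for very large result
// sets, which is why alternative strategies (map-reduce, refine) exist.
function stuffDocuments(docs: RetrievedDoc[]): string {
  return docs.map((doc) => doc.pageContent).join("\n\n");
}

const context = stuffDocuments([
  { pageContent: "Chunk overlap is 200 characters." },
  { pageContent: "Chunk size is 1000 characters." },
]);
```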
@@ -648,7 +648,7 @@ const documentChain = await createStuffDocumentsChain({
 
 The _conversationalRetrievalChain_ combines the _historyAwareRetrieverChain_ and the _documentChain_.
 
-- The _conversationalRetrievalChain_ is created using the [_createRetrievalChain_](https://api.js.langchain.com/functions/langchain_chains_retrieval.createRetrievalChain.html) function from the Langchain library.
+- The _conversationalRetrievalChain_ is created using the [_createRetrievalChain_](https://api.js.langchain.com/functions/langchain_chains_retrieval.createRetrievalChain.html) function from the LangChain library.
 - It takes the _historyAwareRetrieverChain_ and the _documentChain_ as input.
 - This chain combines the retrieval and question-answering steps into a single workflow.
 
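The composition that _createRetrievalChain_ performs can be sketched as two plugged-together functions, where the retrieve and answer parameters are hypothetical stand-ins for the two chains it combines:

```typescript
type Doc = { pageContent: string };

// Sketch of the combined workflow: the retrieval step feeds its documents
// to the question-answering step, mirroring how the retrieval chain wires
// historyAwareRetrieverChain into documentChain.
function runRetrievalChain(
  question: string,
  retrieve: (q: string) => Doc[],
  answer: (q: string, context: Doc[]) => string
): string {
  const context = retrieve(question); // retrieval step
  return answer(question, context);   // generation step
}

const reply = runRetrievalChain(
  "What is chunk overlap?",
  (_q) => [{ pageContent: "Overlap keeps boundary sentences intact." }],
  (q, docs) => `Based on ${docs.length} passage(s): ${docs[0].pageContent}`
);
```

Because each step is just a function from inputs to outputs, this is also the shape LCEL exploits when piping components together.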
