
Commit 7c03677

Update python-langchain-pdf-chat Tutorial as per 8.0 Docs Terminology (#78)
* Added changes for naming conventions and also updated outdated links
* changes for failing CI
* addressed comments
1 parent dc16134 commit 7c03677

File tree

1 file changed: +26 −23 lines changed


tutorial/markdown/python/python-langchain-pdf-chat/python-langchain-pdf-chat.md

Lines changed: 26 additions & 23 deletions
@@ -1,31 +1,34 @@
 ---
 # frontmatter
-path: "/tutorial-python-langchain-pdf-chat"
+path: "/tutorial-python-langchain-pdf-chat-with-search-vector-index"
 # title and description do not need to be added to markdown, start with H2 (##)
-title: Build PDF Chat App With Couchbase Python SDK and LangChain
+title: Build PDF Chat App with LangChain and Couchbase Search Vector Index
 short_title: Build PDF Chat App
 description:
-  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search, and Streamlit.
-  - Learn to upload PDFs into Couchbase Vector Store with LangChain.
-  - Discover how to use RAG’s for context-based Q&A’s from PDFs with LLMs.
+  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Search Vector Index, and Streamlit.
+  - Learn to upload PDFs into Couchbase Search Vector Store with LangChain.
+  - Discover how to use RAG for context-based Q&A from PDFs with LLMs.
 content_type: tutorial
 filter: sdk
 technology:
-  - fts
+  - vector search
   - kv
 tags:
   - Streamlit
   - LangChain
   - OpenAI
   - Artificial Intelligence
+  - Search Vector Index
 sdk_language:
   - python
 length: 45 Mins
 ---

 ## Introduction

-Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you’ll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.
+Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you'll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.
+
+**This tutorial uses Search Vector Index** with Couchbase's Search service (formerly known as Full Text Search). If you are looking for Vector Search using Query Service with Hyperscale/Composite Vector Indexes, refer to [this tutorial](https://developer.couchbase.com/tutorial-python-langchain-pdf-chat-with-hyperscale-or-composite-vector-index/) instead.

 This tutorial will demonstrate how to -

@@ -80,9 +83,9 @@ Specifically, you need to do the following:
 - For the purpose of this tutorial, we will be using specific bucket, scope and collection. However, you may use any name of your choice but make sure to update names in all the steps.
 - Create a bucket named `pdf-chat`. We will use the `_default` scope and `_default` collection of this bucket.

-### Create the Search Index on Full Text Service
+### Create the Search Vector Index

-We need to create the Search Index on the Full Text Service in Couchbase. For this demo, you can import the following index using the instructions.
+We need to create the Search Vector Index in Couchbase. For this demo, you can import the following index using the instructions.

 - [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)

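For orientation, a Search Vector Index definition is a JSON document mapped to the bucket, scope, and collection. The fragment below is an illustrative shape only — the index name, field name, `dims`, and `similarity` values are assumptions, and you should import the exact index definition provided by the tutorial rather than hand-writing one. `dims` must match your embedding model (OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors):

```json
{
  "name": "pdf_search",
  "type": "fulltext-index",
  "sourceName": "pdf-chat",
  "params": {
    "mapping": {
      "types": {
        "_default._default": {
          "dynamic": true,
          "properties": {
            "embedding": {
              "fields": [
                {
                  "name": "embedding",
                  "type": "vector",
                  "dims": 1536,
                  "similarity": "dot_product"
                }
              ]
            }
          }
        }
      }
    }
  }
}
```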
@@ -207,7 +210,7 @@ LOGIN_PASSWORD = "<password to access the streamlit app>"

 ### Running the Application

-After starting Couchbase server, adding vector index and installing dependencies. Our Application is ready to run.
+After starting Couchbase server, adding search vector index and installing dependencies. Our Application is ready to run.

 In the projects root directory, run the following command

@@ -271,14 +274,14 @@ LangChain is a powerful library that simplifies the process of building applicat

 In the PDF Chat app, LangChain is used for several tasks:

-- **Loading and processing PDF documents**: LangChain's [_PDFLoader_](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) is used to load the PDF files and convert them into text documents.
-- **Text splitting**: LangChain's [_RecursiveCharacterTextSplitter_](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter/) is used to split the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
-- **Embedding generation**: LangChain integrates with [various embedding models](https://python.langchain.com/docs/modules/data_connection/text_embedding/), such as OpenAI's embeddings, to convert the text chunks into embeddings.
-- **Vector store integration**: LangChain provides a [_CouchbaseSearchVectorStore_](https://python.langchain.com/docs/integrations/vectorstores/couchbase/) class that seamlessly integrates with Couchbase's Vector Search, allowing the app to store and search through the embeddings and their corresponding text.
-- **Chains**: LangChain provides various [chains](https://python.langchain.com/docs/modules/chains/) for different requirements. For using RAG concept, we require _Retrieval Chain_ for Retrieval and _Question Answering Chain_ for Generation part. We also add _Prompts_ that guide the language model's behavior and output. These all are combined to form a single chain which gives output from user questions.
-- **Streaming Output**: LangChain supports [streaming](https://python.langchain.com/docs/expression_language/streaming/), allowing the app to stream the generated answer to the client in real-time.
+- **Loading and processing PDF documents**: LangChain's [_PDFLoader_](https://docs.langchain.com/oss/python/integrations/document_loaders) is used to load the PDF files and convert them into text documents.
+- **Text splitting**: LangChain's [_RecursiveCharacterTextSplitter_](https://docs.langchain.com/oss/python/integrations/splitters) is used to split the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
+- **Embedding generation**: LangChain integrates with [various embedding models](https://docs.langchain.com/oss/python/integrations/text_embedding), such as OpenAI's embeddings, to convert the text chunks into embeddings.
+- **Vector store integration**: LangChain provides a [_CouchbaseSearchVectorStore_](https://couchbase-ecosystem.github.io/langchain-couchbase/langchain_couchbase.html#couchbase-search-vector-store) class that seamlessly integrates with Couchbase's Search Vector Index, allowing the app to store and search through the embeddings and their corresponding text.
+- **Chains**: LangChain provides various [chains](https://api.python.langchain.com/en/latest/langchain/chains.html) for different requirements. For using RAG concept, we require _Retrieval Chain_ for Retrieval and _Question Answering Chain_ for Generation part. We also add _Prompts_ that guide the language model's behavior and output. These all are combined to form a single chain which gives output from user questions.
+- **Streaming Output**: LangChain supports [streaming](https://docs.langchain.com/oss/python/langchain/streaming), allowing the app to stream the generated answer to the client in real-time.

-By combining Vector Search with Couchbase, RAG, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
+By combining Search Vector Index with Couchbase, RAG, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.

 ## Let us Understand the Flow

@@ -390,7 +393,7 @@ with st.form("upload pdf"):

 This function ensures that the uploaded PDF file is properly handled, loaded, and prepared for storage or processing in the vector store. It first checks if file was actually uploaded. Then the uploaded file is saved to a temporary file in `binary` format.

-From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
+From the temporary file, PDF is loaded in [PyPDFLoader](https://reference.langchain.com/python/langchain_core/document_loaders/) from the LangChain library which loads the PDF into [LangChain Document](https://reference.langchain.com/python/langchain_core/document_loaders/) Format

 ```python
 def save_to_vector_store(uploaded_file, vector_store):
@@ -407,7 +410,7 @@ def save_to_vector_store(uploaded_file, vector_store):

 ### Split Documents

-This LangChain document array will contain huge individual files which defeats the purpose while retrieval as we want to send more relevant context to LLM. So we will split it into smaller chunks or passages using LangChain's [_RecursiveCharacterTextSplitter_](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter/):
+This LangChain document array will contain huge individual files which defeats the purpose while retrieval as we want to send more relevant context to LLM. So we will split it into smaller chunks or passages using LangChain's [_RecursiveCharacterTextSplitter_](https://docs.langchain.com/oss/python/integrations/splitters):

 - chunk_size: 1500: This parameter specifies that each chunk should contain approximately 1500 characters.
 - chunk_overlap: 150: This parameter ensures that there is an overlap of 150 characters between consecutive chunks. This overlap helps maintain context and prevent important information from being split across chunk boundaries.
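To make the chunk_size/chunk_overlap interplay concrete, here is a plain-Python sketch of fixed-size splitting with overlap. It is illustrative only: the app itself relies on LangChain's _RecursiveCharacterTextSplitter_, which additionally tries to split on natural boundaries (paragraphs, sentences) before falling back to character counts.

```python
# Illustrative only: a stdlib approximation of overlapping character chunking,
# mirroring the chunk_size/chunk_overlap parameters described above.

def split_with_overlap(text, chunk_size=1500, chunk_overlap=150):
    """Split text into ~chunk_size-char chunks, each overlapping the next by chunk_overlap chars."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already covered the end of the text
    return chunks

chunks = split_with_overlap("a" * 4000, chunk_size=1500, chunk_overlap=150)
```

The 150-character tail of each chunk is repeated at the head of the next one, which is exactly how the overlap preserves context across chunk boundaries.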
@@ -436,7 +439,7 @@ After uploading the PDF into Couchbase, we are now ready to utilize the power of

 ### LangChain Expression Language (LCEL)

-We will now utilize the power of LangChain Chains using the [LangChain Expression Language](https://python.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.
+We will now utilize the power of LangChain Chains using the LangChain Expression Language (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.

 LCEL is a domain-specific language that provides several key advantages when working with LangChain:

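The piping idea behind LCEL can be illustrated without LangChain at all. The sketch below is an analogy, not LangChain code: it mimics how LCEL's `|` operator composes stages so that each stage's output feeds the next.

```python
# Analogy only: LCEL composes runnables with `|`, e.g. `chain = retriever | prompt | llm`.
# This tiny Pipeable wrapper mimics that composition with plain functions.

class Pipeable:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Chain two stages: the output of self becomes the input of other.
        return Pipeable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Hypothetical stages standing in for the real retriever and prompt template.
retrieve = Pipeable(lambda q: {"question": q, "context": "retrieved passages"})
prompt = Pipeable(lambda d: f"Answer {d['question']!r} using {d['context']}")

chain = retrieve | prompt
result = chain.invoke("What is RAG?")
```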
@@ -450,15 +453,15 @@ We will be using LCEL chains in next few sections and will see how LCEL optimize

 ### Create Retriever Chain

-We also create the [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore) of the couchbase vector store. This retriever will be used to retrieve the previously added documents which are similar to current query.
+We also create a [retriever](https://docs.langchain.com/oss/python/integrations/retrievers) for the Couchbase vector store. This retriever is used to retrieve the previously added documents that are similar to the current query.

 ```python
 retriever = vector_store.as_retriever()
 ```
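Conceptually, the retriever performs a top-k nearest-neighbour search over the stored embeddings. The stdlib sketch below (toy 3-dimensional vectors and hypothetical document ids) illustrates that idea only; in the app, the lookup is executed by Couchbase's Search service against the Search Vector Index.

```python
# Conceptual sketch: top-k retrieval by cosine similarity over toy embeddings.
# Real embeddings (e.g. OpenAI's) have hundreds or thousands of dimensions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical store: document id -> embedding.
store = {
    "doc-about-cats": (1.0, 0.1, 0.0),
    "doc-about-dogs": (0.9, 0.2, 0.1),
    "doc-about-tax":  (0.0, 0.1, 1.0),
}
top = retrieve_top_k((1.0, 0.0, 0.0), store, k=2)
```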

 ### Prompt Chain

-A prompt for a language model is a set of instructions or input provided by a user to guide the model's response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation. We will use a template and create a [prompt chain](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/) using [_ChatPromptTemplate_](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/#chatprompttemplate) Class of LangChain
+A prompt for a language model is a set of instructions or input provided by a user to guide the model's response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation. We will use a template and create a prompt chain using [_ChatPromptTemplate_](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/#chatprompttemplate) Class of LangChain

 ```python
 template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
@@ -525,7 +528,7 @@ This section creates an interactive chat interface where users can ask questions
 - Add the user's question to the chat history.
 - Create a placeholder for streaming the assistant's response.
 - Use the chain.stream(question) method to generate the response from the RAG chain.
-- [Stream](https://python.langchain.com/docs/use_cases/question_answering/streaming/) the response in real-time by updating the placeholder with each response chunk.
+- [Stream](https://docs.langchain.com/oss/python/langchain/streaming) the response in real-time by updating the placeholder with each response chunk.
 - Add the final assistant's response to the chat history.

 This setup allows users to have a conversational experience, asking questions related to the uploaded PDF, with responses generated by the RAG chain and streamed in real-time. Both the user's questions and the assistant's responses are displayed in the chat interface, along with their respective roles and avatars.
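The streaming loop described above can be sketched in plain Python. Everything below is a stand-in: `fake_chain_stream` fakes `chain.stream(question)` with a generator, and the `render` callback stands in for re-rendering the Streamlit placeholder with the accumulated text.

```python
# Conceptual sketch: chain.stream(question) yields the answer in chunks, and
# the placeholder is redrawn with the accumulated text after each chunk.

def fake_chain_stream(question):
    # Stand-in for chain.stream(question): yields the answer piece by piece.
    for token in ["Couchbase ", "stores ", "the ", "embeddings."]:
        yield token

def render_streaming_answer(question, render):
    answer = ""
    for chunk in fake_chain_stream(question):
        answer += chunk
        render(answer)  # in Streamlit this would be placeholder.markdown(answer)
    return answer

final = render_streaming_answer("Where are embeddings stored?", render=lambda s: None)
```

Accumulating into `answer` before each redraw is what makes the response appear to grow in place rather than print as disconnected fragments.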
