Commit 5123741

LangChain updates
1 parent a36ab60 commit 5123741

4 files changed, with 443 additions and 234 deletions

10_LangChain/README.md

Lines changed: 124 additions & 51 deletions
@@ -2,11 +2,11 @@
22

33
[LangChain](https://www.langchain.com/) is an open-source framework designed to simplify the creation of applications that use large language models (LLMs). LangChain has a vibrant community of developers and contributors and is used by many companies and organizations. LangChain utilizes proven Prompt Engineering patterns and techniques to optimize LLMs, ensuring successful and accurate results through verified and tested best practices.
44

5-
Part of the appeal of LangChain syntax is the capability of breaking down large complex interactions with LLMs into smaller, more manageable steps by composing a reusable [chain](https://python.langchain.com/docs/modules/chains/) process. LangChain provides a syntax for chains([LCEL](https://python.langchain.com/docs/modules/chains/#lcel)), the ability to integrate with external systems through [tools](https://python.langchain.com/docs/integrations/tools/), and end-to-end [agents](https://python.langchain.com/docs/modules/agents/) for common applications.
5+
Part of the appeal of LangChain syntax is the capability of breaking down large, complex interactions with LLMs into smaller, more manageable steps by composing a reusable chain process. LangChain provides a syntax for chains ([LCEL](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel)), the ability to integrate with external systems through [tools](https://python.langchain.com/docs/concepts/#tools), and end-to-end [agents](https://python.langchain.com/docs/concepts/#agents) for common applications.
66

77
The concept of an agent is quite similar to that of a chain in LangChain but with one fundamental difference. A chain in LangChain is a hard-coded sequence of steps executed in a specific order. Conversely, an agent leverages the LLM to assess the incoming request with the current context to decide what steps or actions need to be executed and in what order.
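
The difference between a chain and an agent can be sketched in plain Python, without LangChain. This is only an illustration: `run_chain`, `run_agent`, `fake_llm_router`, and the two tools are hypothetical names, and the rule-based router stands in for the decision an LLM would make at runtime.

```python
from typing import Callable, Dict, List


def run_chain(steps: List[Callable[[str], str]], text: str) -> str:
    """A chain: every step runs, always in the same hard-coded order."""
    for step in steps:
        text = step(text)
    return text


def run_agent(
    choose_tool: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    request: str,
) -> str:
    """An agent: a decision function picks which tool to run for this request."""
    tool_name = choose_tool(request)
    return tools[tool_name](request)


# Hypothetical tools, for illustration only.
def lookup_sku(request: str) -> str:
    return f"sku lookup: {request}"


def semantic_search(request: str) -> str:
    return f"vector search: {request}"


def fake_llm_router(request: str) -> str:
    """Rule-based stand-in for the LLM's choice of tool."""
    return "lookup_sku" if request.upper().startswith("SKU-") else "semantic_search"


TOOLS = {"lookup_sku": lookup_sku, "semantic_search": semantic_search}

# The chain always strips, then lowercases.
chain_output = run_chain([str.strip, str.lower], "  Mountain BIKE  ")

# The agent routes each request to a different tool.
agent_output_a = run_agent(fake_llm_router, TOOLS, "SKU-1234")
agent_output_b = run_agent(fake_llm_router, TOOLS, "light road bikes")
```

In a real agent the router is the LLM itself, which reads each tool's name and description to decide what to call.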
88

9-
LangChain agents can leverage tools and toolkits. A tool can be an integration into an external system, custom code, or even another chain. A toolkit is a collection of tools that can be used to solve a specific problem.
9+
LangChain agents can leverage tools and toolkits. A tool can be an integration into an external system, custom code, a retriever, or even another chain. A toolkit is a collection of tools that can be used to solve a specific problem.
1010

1111
## LangChain RAG pattern
1212

@@ -16,47 +16,117 @@ Earlier in this guide, the RAG (Retrieval Augmented Generation) pattern was intr
1616

1717
When an incoming message is received, the retriever will vectorize the message and perform a vector search to find the most relevant documents for the given query. The retriever returns a list of documents that are then used to augment the prompt. The augmented prompt is then passed to the LLM (generator) to reason over the prompt and context. The output from the LLM is then parsed and returned as the final message.
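
The retrieve, augment, and generate steps described above can be sketched with a toy in-memory index. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-written rather than produced by an embedding model, `DOCS` replaces the Azure Cosmos DB vector index, and `generate` is a placeholder for the LLM call.

```python
import math
from typing import Dict, List

# Toy "vector index": document text mapped to a hand-made vector.
DOCS: Dict[str, List[float]] = {
    "Touring bike with steel frame": [0.9, 0.1, 0.0],
    "Waterproof cycling jacket": [0.1, 0.9, 0.0],
    "Carbon road bike": [0.8, 0.0, 0.2],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: List[float], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query vector."""
    ranked = sorted(
        DOCS,
        key=lambda doc: cosine_similarity(query_vector, DOCS[doc]),
        reverse=True,
    )
    return ranked[:k]


def augment(question: str, documents: List[str]) -> str:
    """Build the prompt by placing the retrieved documents ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only these products:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Placeholder for the LLM call; a real generator would reason over the prompt."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"


question = "Which bikes do you sell?"
query_vector = [1.0, 0.0, 0.0]  # pretend output of an embedding model

top_docs = retrieve(query_vector)
prompt = augment(question, top_docs)
answer = generate(prompt)
```

Only the two bike documents reach the prompt; the jacket is ranked last and dropped, which is the point of retrieving before generating.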
1818

19-
> **Note**: A vector store retriever is only one type of retriever that can be used in the RAG pattern. Learn more about retrievers in the [LangChain documentation](https://python.langchain.com/docs/modules/data_connection/retrievers/).
19+
> **Note**: A vector store retriever is only one type of retriever that can be used in the RAG pattern. Learn more about retrievers in the [LangChain documentation](https://python.langchain.com/docs/concepts/#retrievers).
2020
2121
## Lab - Vector search and RAG using LangChain
2222

2323
This lab uses LangChain to re-implement the RAG pattern introduced in the previous lab. Take note of the readability of the code and how easy it is to compose a reusable RAG chain using LangChain that queries the products vector index in Azure Cosmos DB for NoSQL. The lab concludes with the creation of an agent with various tools for the LLM to leverage to fulfill the incoming request.
2424

25-
This lab also requires the data provided in the previous lab titled [Load data into Azure Cosmos DB API for NoSQL containers](../08_Load_Data/README.md#lab---load-data-into-azure-cosmos-db-api-for-mongodb-collections) as well as the populated vector index created in the lab titled [Vector Search using Azure Cosmos DB for NoSQL](../09_Vector_Search_Cosmos_DB/README.md#lab---use-vector-search-on-embeddings-in-vcore-based-azure-cosmos-db-for-mongodb). Run all cells in both notebooks to prepare the data for use in this lab.
25+
This lab also requires the data provided in the previous lab titled [Load data into Azure Cosmos DB for NoSQL containers](../08_Load_Data/README.md#lab---load-data-into-azure-cosmos-db-api-for-nosql-containers) as well as the populated vector index created in the lab titled [Vector Search using Azure Cosmos DB for NoSQL](../09_Vector_Search_Cosmos_DB/README.md#lab---use-vector-search-on-embeddings-in-azure-cosmos-db-for-nosql). Run all cells in both notebooks to prepare the data for use in this lab.
2626

2727
> **Note**: It is highly recommended to use a [virtual environment](https://python.land/virtual-environments/virtualenv) for all labs.
2828
2929
Please visit the lab repository to complete [this lab](../Labs/lab_4_langchain.ipynb).
3030

3131
Some highlights of the lab include:
3232

33-
### Instantiating a vector store reference
33+
### Creating a custom LangChain retriever for Azure Cosmos DB for NoSQL
3434

3535
```python
36-
vector_store = AzureCosmosDBVectorSearch.from_connection_string(
37-
connection_string = CONNECTION_STRING,
38-
namespace = "cosmic_works.products",
39-
embedding = embedding_model,
40-
index_name = "VectorSearchIndex",
41-
embedding_key = "contentVector",
42-
text_key = "_id"
43-
)
36+
class AzureCosmosDBNoSQLRetriever(BaseRetriever):
37+
"""
38+
A custom LangChain retriever that uses Azure Cosmos DB NoSQL database for vector search.
39+
"""
40+
embedding_model: AzureOpenAIEmbeddings
41+
container: ContainerProxy
42+
model: Type[T]
43+
vector_field_name: str
44+
num_results: int=5
45+
46+
def __get_embeddings(self, text: str) -> List[float]:
47+
"""
48+
Returns embeddings vector for a given text.
49+
"""
50+
embedding = self.embedding_model.embed_query(text)
51+
time.sleep(0.5) # rest period to avoid rate limiting on AOAI
52+
return embedding
53+
54+
def __get_item_by_id(self, id) -> T:
55+
"""
56+
Retrieves a single item from the Azure Cosmos DB NoSQL database by its ID.
57+
"""
58+
query = "SELECT * FROM itm WHERE itm.id = @id"
59+
parameters = [
60+
{"name": "@id", "value": id}
61+
]
62+
item = list(self.container.query_items(
63+
query=query,
64+
parameters=parameters,
65+
enable_cross_partition_query=True
66+
))[0]
67+
return self.model(**item)
68+
69+
def __delete_attribute_by_alias(self, instance: BaseModel, alias):
70+
for model_field in instance.model_fields:
71+
field = instance.model_fields[model_field]
72+
if field.alias == alias:
73+
delattr(instance, model_field)
74+
return
75+
76+
def _get_relevant_documents(
77+
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
78+
) -> List[Document]:
79+
"""
80+
Performs a synchronous vector search on the Azure Cosmos DB NoSQL database.
81+
"""
82+
embedding = self.__get_embeddings(query)
83+
items = self.container.query_items(
84+
query=f"""SELECT TOP @num_results itm.id, VectorDistance(itm.{self.vector_field_name}, @embedding) AS SimilarityScore
85+
FROM itm
86+
ORDER BY VectorDistance(itm.{self.vector_field_name}, @embedding)
87+
""",
88+
parameters = [
89+
{ "name": "@num_results", "value": self.num_results },
90+
{ "name": "@embedding", "value": embedding }
91+
],
92+
enable_cross_partition_query=True
93+
)
94+
returned_docs = []
95+
for item in items:
96+
itm = self.__get_item_by_id(item["id"])
97+
# Remove the vector field from the returned item so it doesn't fill the context window
98+
self.__delete_attribute_by_alias(itm, self.vector_field_name)
99+
returned_docs.append(Document(page_content=json.dumps(itm, indent=4, default=str), metadata={"similarity_score": item["SimilarityScore"]}))
100+
return returned_docs
101+
102+
async def _aget_relevant_documents(
103+
self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun
104+
) -> List[Document]:
105+
"""
106+
Performs an asynchronous vector search on the Azure Cosmos DB NoSQL database.
107+
"""
108+
raise NotImplementedError("Asynchronous search not implemented.")
44109
```
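
The retriever above removes the vector field before serializing each item so that the embedding does not fill the context window. A minimal sketch of that idea follows, assuming plain dictionaries instead of the Pydantic models used in the lab; `to_page_content` and the sample item are hypothetical.

```python
import json
from datetime import date
from typing import Any, Dict


def to_page_content(item: Dict[str, Any], vector_field_name: str) -> str:
    """Serialize an item to JSON without its embedding vector."""
    # Build a copy without the vector field; the original item is left untouched.
    trimmed = {key: value for key, value in item.items() if key != vector_field_name}
    # default=str converts values that json cannot encode (the date here) to strings.
    return json.dumps(trimmed, indent=4, default=str)


# Hypothetical product item with a 1536-element embedding.
item = {
    "id": "p-1",
    "name": "Touring bike",
    "added": date(2024, 1, 15),
    "contentVector": [0.1] * 1536,
}

page_content = to_page_content(item, "contentVector")
```

The serialized text keeps only the fields the LLM can use, and it stays a few lines long regardless of the embedding size.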
45110

46111
### Composing a reusable RAG chain
47112

48113
```python
49-
# Create a retriever from the vector store
50-
retriever = vector_store.as_retriever()
114+
# Create an instance of the AzureCosmosDBNoSQLRetriever
115+
products_retriever = AzureCosmosDBNoSQLRetriever(
116+
embedding_model = embedding_model,
117+
container = product_v_container,
118+
model = Product,
119+
vector_field_name = "contentVector",
120+
num_results = 5
121+
)
51122

52123
# Create the prompt template from the system_prompt text
53124
llm_prompt = PromptTemplate.from_template(system_prompt)
54125

55126
rag_chain = (
56-
# populate the tokens/placeholders in the llm_prompt
57-
# products takes the results of the vector store and formats the documents
127+
# populate the tokens/placeholders in the llm_prompt
58128
# question is a passthrough that takes the incoming question
59-
{ "products": retriever | format_docs, "question": RunnablePassthrough()}
129+
{ "products": products_retriever, "question": RunnablePassthrough()}
60130
| llm_prompt
61131
# pass the populated prompt to the language model
62132
| llm
@@ -70,44 +140,42 @@ rag_chain = (
70140
Tools are selected by the large language model at runtime. In this case, depending on the incoming user request, the LLM will decide which container in the database to query. The following code shows how to create a tool for the LLM to use to query the products container in the database.
71141

72142
```python
73-
# create a chain on the retriever to format the documents as JSON
74-
products_retriever_chain = products_retriever | format_docs
75-
76-
tools = [
77-
Tool(
78-
name = "vector_search_products",
79-
func = products_retriever_chain.invoke,
80-
description = "Searches Cosmic Works product information for similar products based on the question. Returns the product information in JSON format."
81-
)
82-
]
143+
# Create a tool that will use the product vector search in Azure Cosmos DB for NoSQL
144+
products_retriever_tool = create_retriever_tool(
145+
retriever = products_retriever,
146+
name = "vector_search_products",
147+
description = "Searches Cosmic Works product information for similar products based on the question. Returns the product information in JSON format."
148+
)
149+
tools = [products_retriever_tool]
83150
```
84151

85152
### Creating tools that call Python functions
86153

87154
Users may query for information that does not have a semantic meaning, such as an ID GUID value or a SKU number. Providing agents with tools to call Python functions to retrieve documents based on these fields is a common practice. The following is an example of adding tools that call out to Python functions for the products collection.
88155

89156
```python
90-
db = pymongo.MongoClient(CONNECTION_STRING).cosmic_works
91-
92157
def get_product_by_id(product_id: str) -> str:
93158
"""
94159
Retrieves a product by its ID.
95160
"""
96-
doc = db.products.find_one({"_id": product_id})
97-
if "contentVector" in doc:
98-
del doc["contentVector"]
99-
return json.dumps(doc)
161+
item = get_single_item_by_field_name(product_v_container, "id", product_id, Product)
162+
delete_attribute_by_alias(item, "contentVector")
163+
return json.dumps(item, indent=4, default=str)
100164

101165
def get_product_by_sku(sku: str) -> str:
102166
"""
103167
Retrieves a product by its sku.
104168
"""
105-
doc = db.products.find_one({"sku": sku})
106-
if "contentVector" in doc:
107-
del doc["contentVector"]
108-
return json.dumps(doc, default=str)
109-
110-
from langchain.tools import StructuredTool
169+
item = get_single_item_by_field_name(product_v_container, "sku", sku, Product)
170+
delete_attribute_by_alias(item, "contentVector")
171+
return json.dumps(item, indent=4, default=str)
172+
173+
def get_sales_by_id(sales_id: str) -> str:
174+
"""
175+
Retrieves a sales order by its ID.
176+
"""
177+
item = get_single_item_by_field_name(sales_order_container, "id", sales_id, SalesOrder)
178+
return json.dumps(item, indent=4, default=str)
111179

112180
tools.extend([
113181
StructuredTool.from_function(get_product_by_id),
@@ -119,21 +187,26 @@ tools.extend([
119187
### Creating an agent armed with tools for vector search and Python function calling
120188

121189
```python
122-
system_message = SystemMessage(
123-
content = """
190+
agent_instructions = """
124191
You are a helpful, fun and friendly sales assistant for Cosmic Works, a bicycle and bicycle accessories store.
125-
126192
Your name is Cosmo.
127-
128193
You are designed to answer questions about the products that Cosmic Works sells, the customers that buy them, and the sales orders that are placed by customers.
129-
130-
If you don't know the answer to a question, respond with "I don't know."
131-
194+
If you don't know the answer to a question, respond with "I don't know."
132195
Only answer questions related to Cosmic Works products, customers, and sales orders.
133-
134196
If a question is not related to Cosmic Works products, customers, or sales orders,
135197
respond with "I only answer questions about Cosmic Works"
136-
"""
137-
)
138-
agent_executor = create_conversational_retrieval_agent(llm, tools, system_message = system_message, verbose=True)
198+
"""
199+
200+
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
201+
202+
prompt = ChatPromptTemplate.from_messages(
203+
[
204+
("system", agent_instructions),
205+
MessagesPlaceholder("chat_history", optional=True),
206+
("human", "{input}"),
207+
MessagesPlaceholder("agent_scratchpad"),
208+
]
209+
)
210+
agent = create_openai_functions_agent(llm, tools, prompt)
211+
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
139212
```
