Replies: 7 comments 19 replies
-
I found a similar unsolved discussion that might be relevant to your issue: create_pandas_dataframe_agent [1]. To handle case-insensitive filtering in a chatbot created with create_pandas_dataframe_agent, you can lowercase both the column values and the search term before comparing them:

import pandas as pd

# Sample DataFrame
data = {'Employee Name': ['Alice', 'Bob', 'Charlie', 'alice'], 'Salary': [50000, 60000, 70000, 55000]}
df = pd.DataFrame(data)

# Case-insensitive filtering
search_name = 'bob'
filtered_df = df[df['Employee Name'].str.lower() == search_name.lower()]
print(filtered_df)

Regarding Retrieval-Augmented Generation (RAG): it is primarily designed for unstructured data and may not be the best fit for structured data like DataFrames, where precise computations and aggregations are required. For accurate results on aggregation questions, sticking with structured data processing in pandas or similar libraries is recommended over converting the DataFrame to an unstructured format for RAG [2].

If you do use RAG on large datasets (e.g., 70K to 90K rows), note that LangChain's RAG framework involves several components, such as document loaders, text splitters, embedding models, and vector stores. Properly configuring these components can help manage large datasets effectively [3][2].
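As a rough illustration of why plain pandas handles aggregation questions reliably, here is a minimal sketch that builds on the sample data above. The `name_key` column and the `salaries_for` helper are hypothetical names introduced for this example; they are not part of LangChain or pandas.

```python
import pandas as pd

# Sample data mirroring the example above, with inconsistent casing.
df = pd.DataFrame({
    "Employee Name": ["Alice", "Bob", "Charlie", "alice"],
    "Salary": [50000, 60000, 70000, 55000],
})

# Hypothetical helper column: a trimmed, case-folded copy of the name,
# computed once so that every later filter or group-by is case-insensitive.
df["name_key"] = df["Employee Name"].str.strip().str.casefold()


def salaries_for(name: str) -> pd.DataFrame:
    """Return all rows whose name matches `name`, ignoring case and outer spaces."""
    key = name.strip().casefold()
    return df.loc[df["name_key"] == key, ["Employee Name", "Salary"]]


# Aggregation on the normalized key: "Alice" and "alice" fall into one group.
total_by_name = df.groupby("name_key")["Salary"].sum()

print(salaries_for("BOB"))
print(total_by_name)
```

Normalizing once into a separate column keeps the original spelling available for display while making the comparison column predictable. Whether "Alice" and "alice" should really be treated as the same employee is a data question this sketch does not answer.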
-
@dosu I am getting this error
-
@dosu I would like to fix this error
-
@dosu, how do I replace this code with my pandas DataFrame agent invoke call?
-
@dosu this is not a working example, sad to say. Do you have a complete code snippet of any working example? Picking up code from one place and dumping it as an answer to the user is unfortunately not working :-(
-
@dosu I am looking for an example for this context...
-
@dosu... I am reading this article https://python.langchain.com/v0.2/docs/how_to/query_high_cardinality/#query-analysis
-
Checked other resources
Commit to Help
Example Code
Description
I created a chatbot using the create_pandas_dataframe_agent function. Under the hood, I provided a pandas DataFrame, and the pandas agent performs well as long as users ask general questions without requiring filters. For example, a filter question might be: "What is the salary of Bob?" Here, "Bob" is a value in the 'Employee Name' column.
The issue arises because my end users may not know the exact name format. They may type "bob", "Bob", or "BOB", and for the variants that do not match the stored value exactly the agent responds with "Not found in the DataFrame."
My question is: Can we convert the pandas DataFrame to an unstructured format and use Retrieval-Augmented Generation (RAG) on top of it? Would this provide accurate results? If I convert it to RAG, will it handle computations well if users ask questions based on aggregations?
This is my sample dataset. Is converting it to an unstructured format a viable solution? I tried converting it to JSON format but didn't have much success. I am open to suggestions. Some of my pandas DataFrames have 70K to 90K rows and 10 columns. I would appreciate any suggestions or sample code for a proof of concept (POC).
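One possible direction for the name-format problem described above, sketched with the Python standard library only: resolve whatever the user typed to the spelling stored in the DataFrame before the question reaches the agent. The function `resolve_name`, the `known` list, and the 0.8 similarity cutoff are all assumptions made for illustration. In practice the known names might come from something like `df['Employee Name'].unique()`.

```python
from difflib import get_close_matches


def resolve_name(user_input, known_names, cutoff=0.8):
    """Map a user-typed name to its stored spelling, or return None.

    Tries an exact case-insensitive match first, then falls back to the
    closest fuzzy match at or above `cutoff` (an assumed threshold).
    """
    # Case-folded key -> first stored spelling seen for that key.
    lookup = {}
    for name in known_names:
        lookup.setdefault(name.strip().casefold(), name)

    key = user_input.strip().casefold()
    if key in lookup:
        return lookup[key]

    close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# Hypothetical list of stored values from the 'Employee Name' column.
known = ["Alice", "Bob", "Charlie"]

print(resolve_name("BOB", known))     # exact match after case folding
print(resolve_name("Charli", known))  # small typo, resolved by fuzzy match
print(resolve_name("Zed", known))     # no plausible match
```

The resolved value can then be substituted into the user's question, or the caller can ask the user to clarify when the result is None. The cutoff is a trade-off: lower values tolerate more typos but risk mapping a name to the wrong employee.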
System Info
Name: langchain
Version: 0.2.15