FastRAG is a simple Retrieval-Augmented Generation (RAG) application optimized for fast performance on consumer-grade PCs. It provides a chatbot interface that combines vector search with large language models (LLMs) to answer questions over document-based data.
To get started with FastRAG locally, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/bibekyess/FastRAG.git
  ```

- Navigate to the project directory:

  ```bash
  cd FastRAG
  ```

- Build and launch the containers:

  ```bash
  docker compose up --build
  ```
This will start the FastRAG API and demo with all necessary services.
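Once the services are running, you can sanity-check that the API is reachable. A minimal sketch using Python's `requests` library, assuming the API is published on `http://localhost:8000` (confirm the actual port mapping in `docker-compose.yml`):

```python
import requests

# Assumed base URL; confirm the published port in docker-compose.yml.
try:
    resp = requests.get("http://localhost:8000", timeout=5)
    print(f"Server responded with HTTP {resp.status_code}")  # any response means it's up
except requests.ConnectionError:
    print("Server not reachable yet; check `docker compose logs`")
```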
The FastRAG application exposes several API endpoints, each serving a different purpose:
- **Get Conversation History**
  - Method: `GET`
  - Endpoint: `/conversation-history`
  - Parameters:
    - `collection_name` (str): Name of the collection to fetch history from.
    - `limit` (int): Number of history entries to return. Default is 10.
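  For example, a minimal sketch of calling this endpoint with Python's `requests` (the base URL and collection name are assumptions, not documented values):

  ```python
  import requests

  # Assumed base URL and collection name; adjust to your deployment.
  resp = requests.get(
      "http://localhost:8000/conversation-history",
      params={"collection_name": "files", "limit": 10},
  )
  resp.raise_for_status()
  print(resp.json())
  ```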
- **Add to Conversation History**
  - Method: `POST`
  - Endpoint: `/conversation-history`
  - Body:
    - `collection_name` (str): Name of the collection to store the history in.
    - `query` (str): User input query.
    - `response_text` (str): AI response.
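  A corresponding sketch, assuming the endpoint accepts a JSON body (the field values are illustrative):

  ```python
  import requests

  # Illustrative values; the field names follow the body description above.
  payload = {
      "collection_name": "files",
      "query": "What does FastRAG use for vector storage?",
      "response_text": "FastRAG stores embeddings and chat history in QdrantDB.",
  }
  resp = requests.post("http://localhost:8000/conversation-history", json=payload)
  resp.raise_for_status()
  ```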
- **Parse Document**
  - Method: `POST`
  - Endpoint: `/parse`
  - Parameters:
    - `file` (UploadFile): The document to be parsed.
    - `index_id` (str): Index name for the document. Default is `files`.
    - `splitting_type` (Literal['raw', 'md']): Splitting type for the document. Default is `raw` (based on chunk settings).
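  Since `UploadFile` suggests a multipart upload, a sketch might look like this (the file name, port, and the split between query parameters and form fields are assumptions):

  ```python
  import requests

  # "example.pdf" is a placeholder; index_id and splitting_type mirror the documented options.
  with open("example.pdf", "rb") as f:
      resp = requests.post(
          "http://localhost:8000/parse",
          files={"file": f},
          params={"index_id": "files", "splitting_type": "md"},
      )
  resp.raise_for_status()
  print(resp.json())
  ```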
- **Chat with the Bot**
  - Method: `POST`
  - Endpoint: `/chat`
  - Body:
    - `user_input` (str): The user's query.
    - `index_id` (str): The index to search. Default is `"files"`.
    - `llm_text` (str): The LLM model to use. Default is `"local"`.
    - `dense_top_k` (int): The number of top results to return from the vector search. Default is 5.
    - `upgrade_user_input` (bool): Whether to upgrade the user input using conversation history. Default is `False`.
    - `stream` (bool): Whether to stream the results. Default is `True`.
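  A streaming sketch with `requests` (the base URL is assumed; the payload mirrors the documented defaults):

  ```python
  import requests

  # Payload fields follow the body description above; values are illustrative.
  payload = {
      "user_input": "Summarize the uploaded document.",
      "index_id": "files",
      "llm_text": "local",
      "dense_top_k": 5,
      "upgrade_user_input": False,
      "stream": True,
  }
  with requests.post("http://localhost:8000/chat", json=payload, stream=True) as resp:
      resp.raise_for_status()
      for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
          print(chunk, end="", flush=True)
  ```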
- Gradio UI: FastRAG features a simple Gradio-based user interface for interacting with the chatbot.
- Real-time Chat: Users can upload a document and ask questions in real time, with previous conversations stored and used for context-based improvements. [Providing the option to upload documents is in progress]
- QdrantDB: The vector embeddings and chatbot conversation history are stored in QdrantDB. This allows the chatbot to utilize previous conversation context for improved responses.
- UI Display: Latency of the chatbot's response is displayed in the Gradio interface.
- Logging: Detailed logs of latency and other events are saved for debugging and performance monitoring.
FastRAG offers multiple options for segmenting documents into chunks:
- Raw Format: This option allows experimenting with various chunk-size, stride, and overlap settings for raw text parsing (see the sketch after this list).
- Markdown Format: This method segments the document based on semantic information, creating more context-aware chunks.
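To make the raw option concrete, here is a minimal sketch of fixed-size chunking with a stride, where a stride smaller than the chunk size produces overlapping chunks; the function and its defaults are illustrative, not FastRAG's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 512, stride: int = 384) -> list[str]:
    """Split text into fixed-size chunks. A stride smaller than
    chunk_size makes consecutive chunks overlap."""
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# With chunk_size=512 and stride=384, consecutive chunks overlap by 128 characters.
```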