This is a RAG implementation built on an open-source stack. BioMistral-7B is the LLM, with PubMedBERT as the embedding model, Qdrant as a self-hosted vector database, and LangChain and llama.cpp as the orchestration frameworks.
This demo is intended only for exploring new LLM use cases at the edge; it is not recommended as a production-grade medical chatbot.
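At a high level, the pieces above fit together as a single retrieval chain: PubMedBERT embeds the query, Qdrant returns the most similar chunks, and BioMistral-7B (served through llama.cpp) answers from that context. A minimal sketch of the wiring, assuming illustrative names — the collection name `vector_db`, the prompt wording, and the chain parameters are assumptions, not the app's exact code. Imports are kept inside the function so the sketch reads without the heavy dependencies installed:

```python
# Prompt that grounds BioMistral's answer in the retrieved context.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)

def build_chain(model_path: str, qdrant_url: str = "http://localhost:6333"):
    """Wire up embeddings, vector store, LLM, and the retrieval chain.

    All names and parameters here are illustrative assumptions.
    """
    from langchain.chains import RetrievalQA
    from langchain.prompts import PromptTemplate
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.llms import LlamaCpp
    from langchain_community.vectorstores import Qdrant
    from qdrant_client import QdrantClient

    embeddings = HuggingFaceEmbeddings(
        model_name="NeuML/pubmedbert-base-embeddings"
    )
    client = QdrantClient(url=qdrant_url)
    store = Qdrant(
        client=client,
        collection_name="vector_db",  # assumed collection name
        embeddings=embeddings,
    )
    llm = LlamaCpp(model_path=model_path, temperature=0.1, max_tokens=512)
    prompt = PromptTemplate.from_template(PROMPT_TEMPLATE)
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=store.as_retriever(search_kwargs={"k": 2}),
        chain_type_kwargs={"prompt": prompt},
    )
```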
- Processor: Intel® Core™ Ultra 7 165H
- OS: Windows 11 Pro 23H2
- RAM: 64 GB
- Python: 3.11.9
- Install the Microsoft Visual C++ compiler toolset, which is required to build llama-cpp-python
- Clone this repository
- Create a Python virtual environment and install the dependencies
```shell
python -m venv biomistral_rag
biomistral_rag\Scripts\activate
python -m pip install --upgrade pip
cd <folder_name>
pip install -r requirements.txt
pip install --upgrade qdrant-client
```
- Download the INT4 version of BioMistral-7B model in GGUF format
- Download the embedding model
```shell
git lfs install
git clone https://huggingface.co/NeuML/pubmedbert-base-embeddings
```
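PubMedBERT maps each text to a fixed-length vector (768 dimensions for this BERT-base model), and retrieval compares vectors by cosine similarity. The sketch below shows both halves; the `embed` wrapper and the choice of the Hub id over the local clone path are assumptions, and the import is kept local so the pure helper can be read and run on its own:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed(texts):
    """Embed texts with PubMedBERT (hypothetical helper; the local clone
    path ./pubmedbert-base-embeddings would also work as model_name)."""
    from langchain_community.embeddings import HuggingFaceEmbeddings
    emb = HuggingFaceEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")
    return emb.embed_documents(texts)
```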
- Install Docker Desktop for Windows with optional proxy settings
- Create the Qdrant container
```shell
docker pull qdrant/qdrant
docker run -p 6333:6333 -v .\qdrant_db\:/qdrant/storage qdrant/qdrant
```
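For a declarative alternative, the same container can be described with Docker Compose — a sketch whose port and volume mappings mirror the `docker run` command above:

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_db:/qdrant/storage
```

Save it as `docker-compose.yml` and start the service with `docker compose up -d`.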
Access the Dashboard using http://localhost:6333/dashboard

- Create embeddings for the new documents in the `data` folder

```shell
python ingest.py
```

Check the new Collection on the Qdrant Dashboard
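Conceptually, the ingestion step loads the documents from `data`, splits them into overlapping chunks, embeds each chunk with PubMedBERT, and upserts the vectors into Qdrant. A minimal sketch of that flow, with assumed names throughout (`chunk_text`, the chunk sizes, the PDF loader, and the `vector_db` collection are all illustrative, not necessarily what ingest.py does):

```python
def chunk_text(text, size=700, overlap=70):
    """Fixed-size character chunking with overlap, so context is not
    lost at chunk boundaries (a real splitter may be smarter)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def ingest(data_dir="data", collection="vector_db"):
    """Embed documents from data_dir and upsert them into Qdrant
    (illustrative; imports kept local to the function)."""
    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Qdrant

    docs = PyPDFDirectoryLoader(data_dir).load()
    texts = [c for d in docs for c in chunk_text(d.page_content)]
    emb = HuggingFaceEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")
    Qdrant.from_texts(
        texts, emb,
        url="http://localhost:6333",
        collection_name=collection,
    )
```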
- Run the application
```shell
uvicorn app:app
```
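The `app:app` argument tells uvicorn to import the module `app.py` and serve the FastAPI instance named `app` inside it. A minimal sketch of that shape — `create_app`, `make_response`, and the `/query` route are hypothetical names, not necessarily what the real app.py defines; the FastAPI import is kept local so the pure helper reads on its own:

```python
def make_response(question, answer, sources=None):
    """Shape the JSON payload returned to the caller (illustrative schema)."""
    return {"question": question, "answer": answer, "sources": sources or []}

def create_app(chain):
    """Build a FastAPI app around a retrieval chain object."""
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/query")
    def query(q: str):
        # Run the retrieval chain on the user's question.
        result = chain.invoke(q)
        return make_response(q, result)

    return app
```

In a real `app.py`, the chain would be built once at import time and the module-level instance exposed as `app = create_app(my_chain)`, which is what `uvicorn app:app` picks up.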