Lightweight multimodal RAG for a single PDF document with the local LLM llava. Chat with your PDF! This project is for educational purposes. For now it is only a command-line app.
Requirements
It is a Poetry Python project for which you need some system dependencies, specifically:
- Tesseract OCR
- OpenCV
- poppler
- ollama
Look at the Dockerfile for inspiration on how to install them.
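As a rough sketch only (the package names and install script below are assumptions, not taken from the Dockerfile, which remains the authoritative reference), on a Debian/Ubuntu system the installation could look roughly like this:
sudo apt-get update
sudo apt-get install -y tesseract-ocr poppler-utils libopencv-dev
curl -fsSL https://ollama.com/install.sh | sh
pip install poetry && poetry install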
Installation
Because this project runs a Large Language Model locally, you will want GPU support. At the moment this project supports only NVIDIA GPUs. Make sure you have installed the NVIDIA Container Toolkit.
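A common smoke test to confirm that Docker can see the GPU (the base image here is just an example; the toolkit mounts nvidia-smi from the host):
docker run --rm --gpus all ubuntu nvidia-smi
If this prints your GPU, the container runtime is set up correctly.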
- Clone the repo
git clone https://github.com/adam-osusky/multi-modal-search.git
cd multi-modal-search
- Build the image. This can take a while.
docker build -t mulmod .
- Start a shell in the container (see the GPU note after these steps).
docker run -it mulmod:latest /bin/bash
- Download the LLM model from ollama.
nohup bash -c "ollama serve &" && sleep 7 && ollama pull llava
- Now you can download the PDF that you want to chat with and run the main script.
curl -o llm.pdf https://arxiv.org/pdf/2307.06435.pdf
python3 src/mulmod/main.py llm.pdf 1
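A note on the container shell step above: the docker run command does not explicitly request the GPU, so unless the NVIDIA runtime is set as Docker's default, you will likely need to pass it through with the --gpus flag (assuming the NVIDIA Container Toolkit is installed):
docker run --gpus all -it mulmod:latest /bin/bash
You can also verify that the model was pulled before starting the app:
ollama list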
To run the application:
python src/mulmod/main.py <filepath> <mode>
where <filepath> is the path to the PDF file that you want to use and <mode> sets whether you want to only retrieve relevant parts of the document or also to chat:
- <mode> = 0 : retrieval only
- <mode> != 0 : RAG, i.e. you also get an answer from the LLM
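For example, with the PDF downloaded earlier:
python3 src/mulmod/main.py llm.pdf 0    # retrieval only
python3 src/mulmod/main.py llm.pdf 1    # retrieval plus an answer from the LLM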