LocAI - Personal AI Assistant

A locally hosted AI assistant with RAG (Retrieval-Augmented Generation) capabilities, running entirely offline on your CPU.

Requirements

This is a relatively large program. Here are the minimum spec requirements:

  1. 16 GB of RAM (8 GB works, but the program will be extremely slow)
  2. Intel Core i5 or AMD Ryzen 5 (or equivalent)
  3. 9 GB of free storage

No GPU required! Models were selected specifically to run on CPUs.

Quick Start

Download from Source

The HuggingFace models can't be uploaded to GitHub, so releases aren't available just yet. You'll need to build the prototype from source before running it. Future versions will allow downloading the models straight from the executable.

Setup Instructions:

To set up, run these commands:

Clone the repo:

git clone https://github.com/VedanshMannem/LocAI.git

Then, install the required packages:

pip install huggingface_hub
pip install sentence-transformers

To download the embedder (all-MiniLM-L6-v2) and AI (mistral-7b-instruct-v0.1.Q4_K_M.gguf) models, run these commands:

huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir ./models/all-MiniLM-L6-v2 --local-dir-use-symlinks False
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir ./models/mistral-7b-instruct-v0.1 --local-dir-use-symlinks False
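If you'd rather script the downloads, the same two fetches can be done from Python with the `huggingface_hub` package installed above. This is a sketch mirroring the CLI commands, not part of the project itself; the `download_models` helper and `base_dir` parameter are ours:

```python
from huggingface_hub import snapshot_download, hf_hub_download

def download_models(base_dir="./models"):
    """Fetch both models (roughly 4.5 GB total) into base_dir,
    mirroring the huggingface-cli commands above."""
    # Embedder: grab the whole repository snapshot.
    snapshot_download(
        repo_id="sentence-transformers/all-MiniLM-L6-v2",
        local_dir=f"{base_dir}/all-MiniLM-L6-v2",
    )
    # LLM: grab only the single quantized GGUF file.
    hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        local_dir=f"{base_dir}/mistral-7b-instruct-v0.1",
    )

# download_models()  # uncomment to start the (large) download
```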

Once the models are in place, you need to point the app at your documents folder.

In gui_app.py, edit this line to be the directory you want the model to fetch context/files from:

build_embeddings(r"") # Change to your downloads directory

Once you do that, you can run the program:

python gui_app.py

Make sure to navigate to the Settings tab and update the RAG context, or else the model won't work.

Instructions for use

Here are the main features of the app to explore once you get it running:

  1. RAG: Retrieve/update your files by going to settings and clicking "Retrieve New Data". This will re-run the embeddings model to create new vector embeddings based on any new files you uploaded.
  2. Use RAG context: This toggle trades accuracy for run time. If you're asking the model a question that doesn't need context from your files, turn it off to get a faster answer. With RAG context enabled, the model takes longer to run and is more computationally expensive, but its answers are grounded in your documents.
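Under the hood, the retrieval step in a RAG pipeline boils down to comparing the query's embedding against the stored file embeddings. A minimal stdlib-only sketch of that cosine-similarity ranking, with toy 3-dimensional vectors standing in for the real 384-dimensional all-MiniLM-L6-v2 embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy embeddings: docs 0 and 2 point roughly the same way as the query.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs))
```

The retrieved chunks are then pasted into the LLM prompt as context, which is why enabling RAG context makes each query slower.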

Customizability

You can customize the app based on your preferences. Here are a few suggestions to get you started:

  1. Update the model: Find a different model on HuggingFace that your computer can support and download its ".gguf" file. Add it under the models folder and replace the model name in response.py. You will need to build the project from source.
  2. Conversation Memory: Change the amount of history fed to the model. Lower values mean the model remembers less of the conversation you're having; higher values give the model more context, but are more expensive to run.
  3. Max Response Length: The number of tokens the model can respond with. Use higher values to get longer responses.
  4. CPU threads: Use half the threads you have available (16 threads = 8, 8 threads = 4), so the app doesn't take up all the compute you have.
  5. Color Theme: Change the theme to whatever you prefer.
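The CPU-thread rule of thumb above can be computed rather than hard-coded. A small sketch (the `recommended_threads` helper is ours; pass its result to whatever thread-count parameter your model loader in response.py expects):

```python
import os

def recommended_threads():
    """Half the available hardware threads (16 -> 8, 8 -> 4),
    so the app doesn't monopolize the CPU."""
    total = os.cpu_count() or 2  # os.cpu_count() can return None
    return max(1, total // 2)

print(recommended_threads())
```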

Credits:

We use these open source models from HuggingFace:

  1. Phi-3-mini-4k
  2. Mistral-7B-instruct
  3. all-MiniLM-L6-v2 sentence embedder

Coming Soon ...

  1. PNG support with Tesseract OCR
  2. Easier download + AI model installation with 1 click
  3. Better optimized model to run on CPU with lower RAM requirements
