LocalChat is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% local, no data leaves your execution environment at any point.
The project provides an API offering all the primitives required to build local, context-aware AI applications. It follows and extends the OpenAI API standard, and supports both normal and streaming responses.
The API is divided into two logical blocks:
High-level API, which abstracts all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
- Ingestion of documents: internally managing document parsing, splitting, metadata extraction, embedding generation and storage.
- Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt engineering and the response generation.
Low-level API, which allows advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested documents. A client-side usage sketch of both API blocks is shown below.
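For illustration only, here is a minimal client sketch of how the two API blocks could be exercised. The base URL, endpoint paths and field names (including `use_context`) are assumptions based on the OpenAI-style API described above, not confirmed routes; refer to the documentation for the actual API.

```python
# Illustrative client sketch - endpoint paths, port and field names are
# assumptions, not LocalChat's confirmed API; check the docs for real routes.
import requests

BASE = "http://localhost:8001"  # assumed local server address

# High-level API: ingest a document, then chat using context from it.
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/v1/ingest/file", files={"file": f})

chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize the ingested report."}],
        "use_context": True,  # assumed flag: ground the answer in ingested documents
        "stream": False,
    },
)
print(chat.json()["choices"][0]["message"]["content"])

# Low-level API: embeddings and contextual chunk retrieval (assumed routes).
requests.post(f"{BASE}/v1/embeddings", json={"input": "a piece of text"})
requests.post(f"{BASE}/v1/chunks", json={"text": "What does the report conclude?"})
```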
In addition to this, a working Gradio UI client is provided to test the API, together with a set of useful tools such as a bulk model download script, an ingestion script, a documents folder watch, and more.
Warning
This README is not updated as frequently as the documentation. Please check it out for the latest updates!
Generative AI is a game changer for our society, but adoption in companies of all sizes and data-sensitive domains like healthcare or legal is limited by a clear concern: privacy. Not being able to ensure that your data is fully under your control when using third-party AI tools is a risk those industries cannot take.
Full documentation on installation, dependencies, configuration, running the server, deployment options, ingesting local documents, API details and UI features can be found here: https://docs.localchat.dev/
Conceptually, LocalChat is an API that wraps a RAG pipeline and exposes its primitives.
- The API is built using FastAPI and follows OpenAI's API scheme.
- The RAG pipeline is based on LlamaIndex (a minimal illustrative snippet follows below).
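As a rough mental model of the pipeline being wrapped (not the project's actual code), a bare-bones LlamaIndex RAG flow looks like this. It assumes a recent `llama-index` release with an LLM and embedding model already configured (e.g. via `Settings`):

```python
# Bare-bones LlamaIndex RAG flow, for illustration only.
# Assumes an LLM and embedding model are already configured (e.g. via Settings).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("docs/").load_data()  # document parsing
index = VectorStoreIndex.from_documents(documents)      # splitting, embeddings, storage
query_engine = index.as_query_engine()                  # retrieval + prompt + generation

print(query_engine.query("What does the contract say about termination?"))
```

LocalChat's API exposes each of these stages (ingestion, retrieval, generation) as separate primitives instead of a single monolithic call.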
The design of LocalChat makes it easy to extend and adapt both the API and the RAG implementation. Some key architectural decisions are:
- Dependency Injection, decoupling the different components and layers.
- Usage of LlamaIndex abstractions such as `LLM`, `BaseEmbedding` or `VectorStore`, making it immediate to change the actual implementations of those abstractions (a sketch of this idea follows the list).
- Simplicity, adding as few layers and new abstractions as possible.
- Ready to use, providing a full implementation of the API and RAG pipeline.
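To illustrate the dependency-injection and abstraction points above (class names here are hypothetical, not LocalChat's actual code), a service can depend only on LlamaIndex base abstractions and receive concrete implementations from the outside:

```python
# Hypothetical sketch of constructor injection against LlamaIndex abstractions.
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.llms import LLM


class ChatService:
    """Knows only the abstract interfaces, never a concrete backend."""

    def __init__(self, llm: LLM, embedding: BaseEmbedding) -> None:
        self.llm = llm
        self.embedding = embedding

    def complete(self, prompt: str) -> str:
        return self.llm.complete(prompt).text
```

Swapping one `LLM` implementation for another then requires no change to the service itself.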
Main building blocks:
- APIs are defined in `local_chat:server:<api>`. Each package contains an `<api>_router.py` (FastAPI layer) and an `<api>_service.py` (the service implementation). Each Service uses LlamaIndex base abstractions instead of specific implementations, decoupling the actual implementation from its usage.
- Components are placed in `local_chat:components:<component>`. Each Component is in charge of providing actual implementations to the base abstractions used in the Services - for example `LLMComponent` is in charge of providing an actual implementation of an `LLM` (for example `LlamaCPP` or `OpenAI`). A hypothetical sketch of this pattern follows the list.
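The sketch below illustrates the Component idea only; it is not the actual `LLMComponent`, and the `mode` setting is invented. A component resolves configuration into a concrete LlamaIndex implementation, which the Services then consume through the base abstraction:

```python
# Hypothetical LLM component: picks a concrete LlamaIndex LLM implementation.
# Requires the llama-index-llms-llama-cpp / llama-index-llms-openai packages.
from llama_index.core.llms import LLM
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.openai import OpenAI


class LLMComponent:
    def __init__(self, mode: str, model_path: str | None = None) -> None:
        if mode == "local":
            self.llm: LLM = LlamaCPP(model_path=model_path)  # fully offline
        else:
            self.llm = OpenAI(model="gpt-4o-mini")           # hosted backend
```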
If you use LocalChat in a paper, check out the Citation file for the correct citation.
You can also use the "Cite this repository" button in this repo to get the citation in different formats.
Here are a couple of examples:
```bibtex
@software{SkillPedia_LocalChat_2025,
  author = {Local Chat by SkillPedia},
  license = {Apache-2.0},
  month = may,
  title = {{LocalChat}},
  url = {https://github.com/Sangwan70/local-chat},
  year = {2023}
}
```

Local Chat by SkillPedia (2023). LocalChat [Computer software]. https://github.com/Sangwan70/local-chat
LocalChat is actively supported by the teams behind:
- Qdrant, providing the default vector database
- Fern, providing Documentation and SDKs
- LlamaIndex, providing the base RAG framework and abstractions
This project has been strongly influenced and supported by other amazing projects like LangChain, GPT4All, LlamaCpp, Chroma and SentenceTransformers.
