A minimal RAG backend using Python, FastAPI, and vector similarity search. The application supports user authentication, document ingestion, querying, and response feedback.
- User registration and login (JWT-based auth)
- Ask questions and get answers based on ingested documents
- Provide feedback on helpfulness of responses
- Ingest documents for RAG
- Choose between Ollama or Hugging Face embedding models
- Built with FastAPI, PostgreSQL, pgvector, and LangChain
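In the app itself the similarity search is delegated to pgvector/FAISS, but the core ranking idea behind RAG retrieval can be sketched in a few lines of plain Python (illustrative only; these function names are not from this repo):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over product of norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    # Rank ingested document vectors by similarity to the query vector
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda iv: cosine_similarity(query_vec, iv[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

A vector store does exactly this ranking, just with indexing structures that avoid the full linear scan.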
- Clone the repository

  ```bash
  git clone <your-repo-url>
  cd <repo-folder>
  ```

- Set up a virtual environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Set environment variables

  Create a `.env` file in the root directory:

  ```env
  DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/yourdb
  JWT_SECRET=your_jwt_secret
  MODEL_FLOW=ollama  # or hf for Hugging Face
  ```

- Run migrations (if using Alembic)

  ```bash
  alembic upgrade head
  ```

- Start the application

  ```bash
  uvicorn app.main:app --reload
  ```
You can switch between embedding models by setting `MODEL_FLOW` in the `.env` file (see `.env.sample` for reference):
Uses `langchain_ollama`:

```python
from langchain_ollama import OllamaEmbeddings

def get_ollama_embeddings(model_name: str = "mxbai-embed-large"):
    return OllamaEmbeddings(model=model_name)
```

Supported models:

- `mxbai-embed-large`
- `nomic-embed-text`
Uses `langchain_huggingface`:

```python
from langchain_huggingface import HuggingFaceEmbeddings

def get_hf_embedding_model():
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-roberta-large-v1")
```

Supported models:

- `sentence-transformers/all-roberta-large-v1` (1024 dims)
- `sentence-transformers/all-MiniLM-L6-v2` (384 dims)
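A small factory can dispatch between the two loaders based on `MODEL_FLOW`. This is a hedged sketch of that switch, not the repo's actual wiring (the function name and defaults here are illustrative):

```python
import os

def get_embedding_model():
    """Pick an embedding backend from MODEL_FLOW (sketch; actual wiring may differ)."""
    flow = os.getenv("MODEL_FLOW", "ollama").lower()
    if flow == "ollama":
        # Imported lazily so the unused backend's package need not be installed
        from langchain_ollama import OllamaEmbeddings
        return OllamaEmbeddings(model="mxbai-embed-large")
    if flow == "hf":
        from langchain_huggingface import HuggingFaceEmbeddings
        return HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-roberta-large-v1"
        )
    raise ValueError(f"Unsupported MODEL_FLOW: {flow!r}")
```

Note that the two backends emit different vector dimensions, so the pgvector column width must match whichever model you select.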
Register:

```bash
curl --location 'http://127.0.0.1:8000/api/v1/auth/register' \
--header 'Content-Type: application/json' \
--data-raw '{
  "email": "test2@example.com",
  "password": "exX$ampd1sfgsdfle",
  "name": "asdf"
}'
```

Login:

```bash
curl --location 'http://127.0.0.1:8000/api/v1/auth/login' \
--header 'Content-Type: application/json' \
--data-raw '{
  "email": "test2@example.com",
  "password": "exX$ampd1sfgsdfle"
}'
```

Ask a question:

```bash
curl --location --request GET 'http://localhost:8000/api/v1/ask' \
--header 'accept: application/json' \
--header 'Authorization: Bearer <JWT_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "what do you mean by Gen AI?"
}'
```

Mark a response as helpful:

```bash
curl --location --request GET 'http://localhost:8000/api/v1/mark_response' \
--header 'accept: application/json' \
--header 'Authorization: Bearer <JWT_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
  "is_helpful": true,
  "id": "28cc8a02-7e74-4af9-882f-3a363bd9580e"
}'
```

Ingest documents:

```bash
curl --location 'http://localhost:8000/api/v1/ingest' \
--header 'accept: application/json' \
--header 'Authorization: Bearer <JWT_TOKEN>'
```

Notes:

- Replace `<JWT_TOKEN>` with the token received from the login API.
- You can switch embedding models using `MODEL_FLOW=ollama` or `MODEL_FLOW=hf`.
- Ensure PostgreSQL and the `pgvector` extension are installed and configured.
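The same calls can be issued from Python with only the standard library. The sketch below builds the `/ask` request to mirror the curl example; `BASE_URL` and `build_ask_request` are illustrative names, not part of the repo:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"

def build_ask_request(token: str, prompt: str) -> urllib.request.Request:
    # Mirrors the curl call: JSON body plus Bearer token on GET /ask
    body = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body,
        method="GET",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(build_ask_request(token, "what do you mean by Gen AI?")) as r:
#     print(json.load(r))
```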
- Backend: FastAPI
- Auth: JWT
- Database: PostgreSQL + pgvector
- Embeddings: Ollama / Hugging Face via LangChain
- Vector Store: FAISS / pgvector
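The JWT auth in the stack is normally handled by a library such as PyJWT or python-jose; purely to illustrate what an HS256 token signed with `JWT_SECRET` looks like, here is a stdlib-only sketch (all names are hypothetical, not code from this repo):

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    expected = _b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

A real implementation also validates registered claims such as `exp`; this sketch checks only the signature.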
This project is licensed under the MIT License.