RAG System Template with Google Gemini & MongoDB Atlas

This project is a Retrieval-Augmented Generation (RAG) system template. It leverages Google Gemini for generation and embeddings, and MongoDB Atlas as the vector store to manage and retrieve context for answering user queries.

Slides for the GHW stream can be found here

Features

- Vector Store: Uses MongoDB Atlas Vector Search to store and retrieve document embeddings.
- Embeddings: Powered by Google's models/embedding-001 model.
- LLM: Uses Google's gemini-2.5-flash model for generating responses.
- Framework: Built using LangChain and Streamlit.

Prerequisites

Before running the application, ensure you have the following:

- Python 3.8+ installed.
- A MongoDB Atlas cluster with a vector search index configured.
- A Google Cloud Project with the Generative AI API enabled and an API key.

Installation

Clone the repository:

```
git clone <your-repo-url>
cd <your-repo-directory>
```

Create and activate a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install dependencies:
```
pip install -r requirements.txt
```

Configuration

This project uses streamlit.secrets for managing sensitive configuration. You typically need to configure two main components: MongoDB and Google AI.

Create a secrets file: Create a folder named .streamlit in your project root and add a file named secrets.toml:

```
# .streamlit/secrets.toml
MONGO_URI = "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority"
```

Environment Variables: The LangChain Google integration typically requires the GOOGLE_API_KEY environment variable. You can set this in your terminal or add it to a .env file (if you extend the setup to load it):
```
export GOOGLE_API_KEY="your-google-api-key"
```
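
The two settings above can be checked once at startup so that a missing value fails with a clear message. The following is a minimal sketch, not code from this repository: `load_settings` is a hypothetical helper, and it assumes that `MONGO_URI` comes from a mapping such as `st.secrets` while `GOOGLE_API_KEY` comes from the environment.

```python
import os
from typing import Mapping, Optional


def load_settings(secrets: Mapping[str, str],
                  environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect both credentials and fail early if either is missing.

    `secrets` is any mapping with a MONGO_URI key (st.secrets behaves like one).
    `environ` defaults to os.environ and is expected to hold GOOGLE_API_KEY.
    This helper is hypothetical and not part of backend.py.
    """
    environ = os.environ if environ is None else environ
    mongo_uri = secrets.get("MONGO_URI")
    api_key = environ.get("GOOGLE_API_KEY")

    # Report every missing name at once instead of stopping at the first.
    missing = [
        name
        for name, value in (("MONGO_URI", mongo_uri), ("GOOGLE_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return {"mongo_uri": mongo_uri, "google_api_key": api_key}
```

Inside a Streamlit app this could be called as `load_settings(st.secrets)`; in a plain script, any dict with a `MONGO_URI` key works the same way.
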
MongoDB Atlas Setup: Ensure your MongoDB collection (vector_store_database.embeddings_stream) has a Vector Search Index named vector_index_ghw.
- Database Name: vector_store_database
- Collection Name: embeddings_stream
- Index Name: vector_index_ghw
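
Example Index Definition (numDimensions must equal the embedding model's output size, and path must name the document field that holds the embedding):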
```
{
  "fields": [
    {
      "numDimensions": 768,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
```
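
The same index can also be created from Python instead of the Atlas UI. This is a hedged sketch rather than the project's own setup script: the helper names are hypothetical, it assumes a recent PyMongo (4.7 or later, where `SearchIndexModel` accepts `type`), and it assumes the collection already exists on the cluster.

```python
def build_vector_index_definition(num_dimensions: int = 768,
                                  path: str = "embedding",
                                  similarity: str = "cosine") -> dict:
    """Return the Atlas Vector Search definition shown above as a Python dict."""
    return {
        "fields": [
            {
                "numDimensions": num_dimensions,
                "path": path,
                "similarity": similarity,
                "type": "vector",
            }
        ]
    }


def create_vector_index(mongo_uri: str) -> str:
    """Create vector_index_ghw on vector_store_database.embeddings_stream."""
    # Imported lazily so the definition helper works without pymongo installed.
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient(mongo_uri)["vector_store_database"]["embeddings_stream"]
    model = SearchIndexModel(
        definition=build_vector_index_definition(),
        name="vector_index_ghw",
        type="vectorSearch",
    )
    # Returns the name of the index that was requested.
    return collection.create_search_index(model=model)
```

Index creation on Atlas is asynchronous, so the index may take a short while to become queryable after `create_vector_index` returns.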

Usage

The core logic is contained in backend.py. You can import the functions listed below into a Streamlit frontend or another Python script.

Key Functions

- ingest_text(text_content): Takes a string of text, wraps it in a document, computes its embedding, and stores it in MongoDB.
- get_rag_response(query): Runs a similarity search for the 3 most relevant documents in MongoDB and passes them as context to the Gemini LLM, which answers the query based on that context.
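
The two functions described above might look roughly like the following. This is an illustrative sketch, not the contents of backend.py: the extra `mongo_uri` parameter, the `build_prompt` and `get_vector_store` helpers, and the prompt wording are all assumptions, and `models/embedding-001` is assumed to be the embedding model the Features list refers to. It relies on the `langchain-google-genai`, `langchain-mongodb`, and `pymongo` packages and on `GOOGLE_API_KEY` being set.

```python
from typing import List

DB_NAME = "vector_store_database"
COLLECTION_NAME = "embeddings_stream"
INDEX_NAME = "vector_index_ghw"


def build_prompt(query: str, contexts: List[str]) -> str:
    """Join retrieved passages and the question into one prompt (hypothetical wording)."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


def get_vector_store(mongo_uri: str):
    """Connect LangChain's Atlas vector store to the collection and index named above."""
    # Lazy imports: these packages are only needed when talking to Atlas and Gemini.
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from pymongo import MongoClient

    collection = MongoClient(mongo_uri)[DB_NAME][COLLECTION_NAME]
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return MongoDBAtlasVectorSearch(
        collection=collection, embedding=embeddings, index_name=INDEX_NAME
    )


def ingest_text(text_content: str, mongo_uri: str) -> None:
    """Embed one piece of text and store it in MongoDB."""
    from langchain_core.documents import Document

    get_vector_store(mongo_uri).add_documents([Document(page_content=text_content)])


def get_rag_response(query: str, mongo_uri: str) -> str:
    """Retrieve the top 3 matching documents and ask Gemini to answer from them."""
    from langchain_google_genai import ChatGoogleGenerativeAI

    docs = get_vector_store(mongo_uri).similarity_search(query, k=3)
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
    prompt = build_prompt(query, [doc.page_content for doc in docs])
    return llm.invoke(prompt).content
```

Keeping prompt construction in its own function makes it possible to inspect or adjust the prompt without calling Atlas or Gemini.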


Folders and files

Latest commit

History

Repository files navigation

RAG System Template with Google Gemini & MongoDB Atlas

Features

Prerequisites

Installation

Configuration

Usage

Key Functions

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages