An intelligent, secure, and high-performance AI Assistant for HR queries. It utilizes a Retrieval Augmented Generation (RAG) architecture to answer employee questions about company policies, leave, salary, and SOPs using your internal documents as the source of truth.
- Smart Answers: Uses RAG to answer questions based strictly on your private documents, reducing hallucinations.
- Secure Authentication: Built-in user registration and login system using Argon2 password hashing and JWT session management.
- High Performance: Built in Rust for millisecond-latency responses and highly concurrent traffic handling.
- Privacy-First: Runs locally on your infrastructure (except for the final LLM inference call).
- Persistent History: Saves chat sessions per user, allowing for context-aware conversations.
Most RAG applications are currently built in Python. While Python is excellent for data science and prototyping, migrating the backend to Rust offers significant advantages for a production deployment:
- Performance and Latency: Rust compiles to native machine code and has no garbage collector. This eliminates the random "GC pauses" common in Python applications, ensuring that API response times are consistently low (often under 10ms for the server overhead).
- Concurrency: Python is limited by the Global Interpreter Lock (GIL), which prevents true parallelism across CPU cores for CPU-bound work within a single process. Rust's Tokio runtime handles thousands of concurrent connections efficiently, making it ideal for a chat server serving many employees simultaneously.
- Reliability: Rust's strict type system and ownership model catch entire classes of bugs (such as null dereferences and data races) at compile time. The result is a much more stable application that is far less likely to crash in production.
- Deployment: The entire application compiles down to a single binary file. There is no need to manage complex Python virtual environments, dependency conflicts, or Docker containers just to run the application.
Before starting, ensure the following are installed:
- Rust: Install via rustup (required for the backend).
- PostgreSQL: A running Postgres database instance (for storing user data and chat history).
- Python 3.x (Optional): Only required if you intend to run the data ingestion script to upload new documents.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/hr-assistant-rag
  cd hr-assistant-rag
  ```

- Set your API key:

  ```bash
  export GROQ_API_KEY=your_groq_api_key_here
  ```

- Start everything:

  ```bash
  docker-compose up
  ```

That's it! Visit http://localhost:3000 to use the application.
Create a file named .env in the project root directory and define the following variables:

```
DATABASE_URL=postgres://postgres:password@localhost:5432/your_database_name
GROQ_API_KEY=your_groq_api_key_here
SERPAPI_KEY=your_serpapi_key_here
```
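At startup, the backend needs these values in its environment. As a minimal sketch, assuming the dotenvy crate (a common Rust choice for loading .env files; the actual project may read its configuration differently):

```rust
// Assumed dependency: dotenvy = "0.15" in Cargo.toml.
use std::env;

fn load_config() -> Result<(String, String), env::VarError> {
    // Load variables from the .env file into the process environment.
    // Ignore the error if the file is absent (e.g., in production,
    // where variables are set by the orchestrator instead).
    let _ = dotenvy::dotenv();

    let database_url = env::var("DATABASE_URL")?;
    let groq_api_key = env::var("GROQ_API_KEY")?;
    Ok((database_url, groq_api_key))
}
```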
Run the build script to compile the React frontend:

```bash
# On Windows
build-frontend.bat

# On Linux/Mac
cd frontend/RAGFRONTEND
npm install
npm run build
```

Open a terminal in the project directory and run:

```bash
cargo run --release
```

The server will compile (the first time) and then start listening at http://localhost:3000.
For development with hot reload:

```bash
# On Windows
dev-start.bat

# On Linux/Mac - Terminal 1 (Backend)
cargo run

# Terminal 2 (Frontend)
cd frontend/RAGFRONTEND
npm run dev

# Visit http://localhost:5173 for development
```

- Open your web browser and navigate to http://localhost:3000 (production) or http://localhost:5173 (development).
- Use the Registration form to create a new user account.
- Log in with your credentials to access the chat interface.
- Ask questions about HR policies, leave, salary, and company SOPs.
To ingest your HR documents (PDFs, Excel, Word, Text) into the system:
- Place your document files in the Data for Rag/ folder.
- Install the necessary Python dependencies:
  ```bash
  pip install psycopg2-binary langchain-community langchain-huggingface pandas openpyxl pdfplumber python-dotenv
  ```
- Run the ingestion script:

  ```bash
  python "Data for Rag/data.py"
  ```

This script will parse the documents, chunk the text, generate embeddings, and store them in the database.
This application represents a modern, high-performance implementation of the RAG pattern. Below is a detailed breakdown of how the components interact.
Retrieval Augmented Generation works by fetching relevant information before asking the AI to answer. Here is the exact flow in this application:
- Ingestion Phase: When documents are uploaded, they are split into small "chunks" (e.g., paragraphs). We use a Transformer model (all-MiniLM-L6-v2) to convert each chunk into a "vector embedding": a list of numbers representing the semantic meaning of that text. These vectors are stored in the database.
- Query Phase: When a user asks a question (e.g., "How much leave do I get?"), the Rust backend immediately converts the question into a vector using the same model (via the fastembed-rs library).
- Retrieval: The system searches the vector database for the text chunks that are mathematically closest to the question vector. This is done using USearch, a specialized Approximate Nearest Neighbor (ANN) search engine (similar to FAISS, but faster).
- Generation: The most relevant text chunks are combined into a prompt along with the user's question. This context is sent to the LLM (Large Language Model), which uses the information to generate a precise answer.
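To make the Generation step concrete, here is a minimal sketch of how the retrieved chunks and the question could be combined into a single prompt. The template and function name are illustrative, not the application's actual prompt:

```rust
/// Hypothetical prompt builder: joins the retrieved chunks into a
/// context block and instructs the model to answer only from it.
fn build_prompt(question: &str, chunks: &[String]) -> String {
    let context = chunks.join("\n---\n");
    format!(
        "You are an HR assistant. Answer strictly from the context below.\n\
         If the answer is not in the context, say you don't know.\n\n\
         Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
}

fn main() {
    let chunks = vec![
        "Employees accrue 1.5 days of paid leave per month.".to_string(),
        "Unused leave may be carried over up to 10 days per year.".to_string(),
    ];
    let prompt = build_prompt("How much leave do I get?", &chunks);
    println!("{prompt}"); // This string is what gets sent to the LLM.
}
```

Grounding the model in retrieved text this way is what keeps answers tied to the internal documents rather than the model's general training data.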
We utilize USearch for the vector similarity search. Unlike standard database queries that match exact text, USearch creates a Hierarchical Navigable Small World (HNSW) graph. This allows the application to find the "nearest" concepts in high-dimensional space in microseconds, even with millions of documents. By running this in-memory within the Rust application, we achieve significantly lower latency than calling an external vector database service.
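Conceptually, the index answers one question: which stored vectors have the highest cosine similarity to the query vector? The brute-force version below is illustrative only (it is not the USearch API); USearch's HNSW graph computes an approximation of the same result without scanning every vector:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Brute-force top-k search: O(n) per query. An HNSW index like
/// USearch's reaches near-identical results in roughly O(log n).
fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
    let corpus = vec![vec![1.0, 0.0, 0.0], vec![0.7, 0.7, 0.0], vec![0.0, 1.0, 0.0]];
    let query = vec![0.9, 0.1, 0.0];
    for (idx, score) in top_k(&query, &corpus, 2) {
        println!("chunk {idx}: similarity {score:.3}");
    }
}
```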
The web server is built on Axum, a web framework built on the Tokio async ecosystem. A minimal route sketch follows the list below.
- Asynchronous: All I/O operations (database queries, LLM API calls) are non-blocking. This means one thread can handle other user requests while waiting for a database response.
- Type-Safe Routing: API endpoints are strictly typed, ensuring that invalid requests are rejected instantly before they reach the business logic.
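As a sketch of such a typed, async endpoint (assuming Axum 0.7-style APIs; the /api/chat path and the request/response types are hypothetical, not necessarily the application's real endpoints):

```rust
// Assumed dependencies: axum = "0.7", tokio = { version = "1", features = ["full"] },
// serde = { version = "1", features = ["derive"] }
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct ChatRequest {
    message: String,
}

#[derive(Serialize)]
struct ChatResponse {
    answer: String,
}

// Typed extractor: a request whose body does not deserialize into
// ChatRequest is rejected before this handler ever runs.
async fn chat(Json(req): Json<ChatRequest>) -> Json<ChatResponse> {
    // In the real application, retrieval and LLM generation happen here.
    Json(ChatResponse {
        answer: format!("You asked: {}", req.message),
    })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/api/chat", post(chat));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```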
- Passwords: User passwords are hashed using Argon2, the winner of the Password Hashing Competition. It is designed to be memory-hard, making it extremely resistant to GPU-based brute-force attacks.
- Sessions: We use JSON Web Tokens (JWT) for stateless authentication. When a user logs in, they receive a signed token, which must be sent in the HTTP headers of all subsequent requests; any request without a valid token is rejected. A sketch of both mechanisms follows.
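A minimal sketch, assuming the widely used argon2 and jsonwebtoken crates (the project's actual crate versions and claim fields may differ):

```rust
// Assumed dependencies: argon2 = "0.5", jsonwebtoken = "9",
// serde = { version = "1", features = ["derive"] }
use argon2::{
    password_hash::{rand_core::OsRng, PasswordHash, PasswordHasher, PasswordVerifier, SaltString},
    Argon2,
};
use jsonwebtoken::{decode, encode, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Claims {
    sub: String, // user id
    exp: usize,  // expiry as a Unix timestamp; checked by Validation::default()
}

fn main() {
    // 1. Registration: hash the password with a fresh random salt.
    //    Argon2's memory-hard parameters make GPU brute force expensive.
    let salt = SaltString::generate(&mut OsRng);
    let hash = Argon2::default()
        .hash_password(b"correct horse battery staple", &salt)
        .expect("hashing failed")
        .to_string(); // store this string in Postgres, never the raw password

    // 2. Login: verify the submitted password against the stored hash.
    let parsed = PasswordHash::new(&hash).expect("stored hash is malformed");
    assert!(Argon2::default()
        .verify_password(b"correct horse battery staple", &parsed)
        .is_ok());

    // 3. Issue a signed JWT; the server keeps no session state.
    let secret = b"replace-with-a-long-random-secret";
    let claims = Claims { sub: "user-42".into(), exp: 2_000_000_000 };
    let token = encode(&Header::default(), &claims, &EncodingKey::from_secret(secret))
        .expect("token signing failed");

    // 4. On every later request: check the token's signature and expiry.
    let data = decode::<Claims>(&token, &DecodingKey::from_secret(secret), &Validation::default())
        .expect("invalid or expired token");
    println!("authenticated user: {}", data.claims.sub);
}
```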