NovusAI is an advanced Agentic RAG (Retrieval-Augmented Generation) system for drug repurposing and biomedical intelligence.
It orchestrates multiple specialized agents to retrieve evidence from heterogeneous biomedical sources, synthesize insights using a large language model, and generate structured answers with visual analytics.
Unlike traditional RAG, NovusAI uses an agentic orchestration layer where each retriever is an autonomous agent operating on domain-specific sources. Evidence is merged and ranked before synthesis, enabling explainable, multi-hop biomedical reasoning.
The system supports private document ingestion, persistent sessions, and automated report generation.
- Frontend (Netlify): Live Link
- Backend (Render): https://novusai-backend.onrender.com
Traditional biomedical research is siloed—patent data, clinical trials, and academic literature rarely talk to each other. NovusAI solves this by acting as an autonomous research assistant that:
- Pathway Discovery: Cross-references existing FDA-approved drugs with new disease targets reported in recent PubMed literature.
- Risk Mitigation: Scans clinical-trial failures and patent legal status to assess the feasibility of repurposing.
- Rapid Synthesis: Condenses weeks of manual literature review into seconds of structured, cited intelligence.
- Multi-Agent Orchestration: 6+ specialized agents (Patent, Clinical, Literature, Web, Market, Internal) working in parallel.
- Biomedical Entity Intelligence: Integrated with EBI OLS4 API for automated drug/disease entity extraction and synonym expansion.
- Private Knowledge Vault: Secure RAG for internal proprietary documents (.pdf, .txt) using Supabase Vector Storage.
- Visual Analytics Engine: Converts complex evidence data into interactive charts (ReCharts) for market trends and patient outcomes.
- Explainable AI: Every insight includes inline citations and source-linking to the original research or patent filing.
- Contextual Memory: Persistent session states allowing for iterative, multi-turn biomedical discovery.
Screenshots:
- Main Intelligence Dashboard
- JWT-Based Login/Signup
- Employee Approval Page
- Main Research Agent
📂 View Additional Modules (Knowledge Vault & PDF Reports)
The secure admin interface for document ingestion and vectorization.

```mermaid
flowchart TB
    U[User Query]
    PS[Pre Synthesis]
    ORCH[Orchestration Layer]
    PAT[Patent Agent - EPO OPS]
    CLIN[Clinical Agent - ClinicalTrials]
    LIT[Literature Agent - PubMed]
    WEB[Web Intelligence - DuckDuckGo]
    MKT[Market Agent]
    INT[Internal Knowledge - Supabase]
    EVID[Evidence Builder]
    SYN[Synthesis Agent - Groq Llama 3.3]
    VIS[Visualization Agent]
    PDF[PDF Agent]
    DB[(Session Database)]
    U --> PS --> ORCH
    ORCH --> PAT
    ORCH --> CLIN
    ORCH --> LIT
    ORCH --> WEB
    ORCH --> MKT
    ORCH --> INT
    PAT --> EVID
    CLIN --> EVID
    LIT --> EVID
    WEB --> EVID
    MKT --> EVID
    INT --> EVID
    EVID --> SYN
    SYN --> VIS
    VIS --> PDF
    SYN --> DB
```
The entry point focuses on linguistic precision and query expansion.
- Entity Extraction: Automatically isolates Disease and Drug entities from natural language queries.
- Synonym Expansion: Connects to the EBI OLS4 API to build comprehensive synonym sets, ensuring the search covers all scientific and trade names.
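As an illustration, the synonym-expansion step against OLS4 might look like the sketch below. The endpoint path, query parameters, and response shape are assumptions based on the public OLS4 search API, not NovusAI's actual code; the example parses a stubbed response so no network call is made.

```python
import urllib.parse

# Public OLS4 search endpoint (assumed; verify against current EBI docs).
OLS4_SEARCH = "https://www.ebi.ac.uk/ols4/api/search"

def build_search_url(term: str, ontology: str = "efo") -> str:
    """Build an OLS4 search URL for a drug or disease term."""
    params = urllib.parse.urlencode(
        {"q": term, "ontology": ontology, "fieldList": "label,synonym"}
    )
    return f"{OLS4_SEARCH}?{params}"

def extract_synonyms(response: dict) -> set[str]:
    """Collect labels and synonyms from a Solr-style OLS4 search response body."""
    names: set[str] = set()
    for doc in response.get("response", {}).get("docs", []):
        names.add(doc.get("label", ""))
        names.update(doc.get("synonym", []))
    names.discard("")
    return names

# Stubbed response illustrating the assumed document shape:
stub = {"response": {"docs": [{"label": "imatinib", "synonym": ["Gleevec", "STI-571"]}]}}
synonyms = extract_synonyms(stub)
```

The expanded set (trade names plus chemical names) is then handed to every retrieval agent, which is what avoids the near-zero-result problem described later in the Challenges section.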
A parallelized agentic layer that queries diverse data silos simultaneously.
| Agent | Source | Data Domain |
|---|---|---|
| Patent Agent | EPO OPS | Intellectual property, chemical filings, and legal status. |
| Clinical Agent | ClinicalTrials.gov | Study phases, recruitment status, and primary endpoints. |
| Literature Agent | PubMed | Academic journals and peer-reviewed clinical research. |
| Web Intel Agent | DuckDuckGo | Real-time news, press releases, and market alerts. |
| Market Agent | Mock Data | Commercial trends, pricing, and competitive landscape. |
| Internal Agent | Supabase | Proprietary documents and historical knowledge. |
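The parallel fan-out across these agents can be sketched with `asyncio`; the agent bodies below are hypothetical stubs standing in for the real EPO OPS, ClinicalTrials.gov, and PubMed calls.

```python
import asyncio

# Hypothetical stand-ins for the specialized retrieval agents; each real agent
# would query its external API and return evidence records.
async def patent_agent(query: str) -> list[dict]:
    return [{"source": "EPO OPS", "text": f"patent hit for {query}"}]

async def clinical_agent(query: str) -> list[dict]:
    return [{"source": "ClinicalTrials.gov", "text": f"trial hit for {query}"}]

async def literature_agent(query: str) -> list[dict]:
    return [{"source": "PubMed", "text": f"paper hit for {query}"}]

async def orchestrate(query: str) -> list[dict]:
    """Fan the query out to all agents in parallel and flatten the evidence."""
    results = await asyncio.gather(
        patent_agent(query), clinical_agent(query), literature_agent(query),
        return_exceptions=True,  # one failing agent must not sink the whole run
    )
    evidence: list[dict] = []
    for r in results:
        if not isinstance(r, Exception):
            evidence.extend(r)
    return evidence

evidence = asyncio.run(orchestrate("imatinib"))
```

`return_exceptions=True` matters in practice: external biomedical APIs rate-limit and time out, and the orchestration layer should degrade to partial evidence rather than fail the whole query.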
Transforming raw, heterogeneous data into structured intelligence.
- Normalization: Standardizes units, dates, and nomenclature across all 6 agents.
- Merging: Deduplicates information and ranks evidence based on source credibility.
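A minimal sketch of the merge-and-rank step, assuming a simple per-source credibility weight. The weights, field names, and dedup key are illustrative, not the production values; sorting before deduplication ensures the highest-credibility copy of a duplicate survives.

```python
def merge_evidence(items: list[dict], credibility: dict[str, int]) -> list[dict]:
    """Rank by source credibility (high first), then drop duplicate texts,
    keeping the most credible copy of each."""
    ranked = sorted(items, key=lambda i: credibility.get(i["source"], 0), reverse=True)
    seen: set[str] = set()
    merged: list[dict] = []
    for item in ranked:
        key = item["text"].strip().lower()  # naive dedup key; real code would normalize harder
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged

# Illustrative weights echoing the credibility ordering described later.
weights = {"EPO OPS": 4, "ClinicalTrials.gov": 3, "PubMed": 2, "Web": 1}
items = [
    {"source": "Web", "text": "Phase II results for Drug X"},
    {"source": "EPO OPS", "text": "Patent EP123 covers Drug X"},
    {"source": "PubMed", "text": "Phase II results for Drug X"},  # duplicate text, kept once
]
merged = merge_evidence(items, weights)
```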
- Engine: Groq API
- Model: `llama-3.3-70b-versatile`
- Output: Generates high-fidelity summaries with inline citations.
The system prepares JSON-ready objects for front-end rendering:
- Market Trends: Historical and projected growth curves.
- Patient Outcomes: Comparative bar charts (Treated vs. Untreated).
- Clinical Roadmap: Pie charts or timelines showing Study Phases (I, II, III, IV).
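For example, a JSON-ready pie payload for the Clinical Roadmap could be assembled as below. The `name`/`value` field names follow ReCharts' usual pie-chart conventions; the exact schema NovusAI emits is an assumption.

```python
import json

def build_phase_chart(trials: list[dict]) -> list[dict]:
    """Aggregate trial records into a ReCharts-style pie payload by study phase."""
    counts: dict[str, int] = {}
    for t in trials:
        phase = t.get("phase", "Unknown")
        counts[phase] = counts.get(phase, 0) + 1
    # One {name, value} entry per phase, sorted for a stable render order.
    return [{"name": p, "value": n} for p, n in sorted(counts.items())]

trials = [{"phase": "II"}, {"phase": "III"}, {"phase": "II"}]
payload = build_phase_chart(trials)
print(json.dumps(payload))
```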
- Storage: Saves synthesized answers to a permanent database.
- Session State: Enables "rebuild" functionality where the model remembers previous context for iterative discovery.
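A minimal sketch of the session store, using an in-memory SQLite table as a stand-in for the real session database; the schema and function names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the persistent session database
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, query TEXT, answer TEXT)")

def save_session(query: str, answer: str) -> None:
    """Persist a synthesized answer so later turns can reuse it."""
    conn.execute("INSERT INTO sessions (query, answer) VALUES (?, ?)", (query, answer))
    conn.commit()

def rebuild_context(query: str) -> list[str]:
    """Reload previous answers for a query so a follow-up turn keeps its context."""
    rows = conn.execute("SELECT answer FROM sessions WHERE query = ?", (query,)).fetchall()
    return [r[0] for r in rows]

save_session("imatinib repurposing", "Synthesized summary with citations...")
context = rebuild_context("imatinib repurposing")
```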
- File types: .pdf, .txt
- Storage: Supabase (company_docs bucket)
- Used by Internal Knowledge Agent
- JWT-based Roles:
- admin → upload documents
- employee → query only
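To illustrate the role gate, here is a deliberately simplified HMAC-signed token check. Real JWTs carry a separate header segment and should use an established library; treat this as a sketch of the role-claim logic only, with all names hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # stand-in for JWT_SECRET_KEY

def sign(payload: dict) -> str:
    """Produce a minimal HMAC-SHA256 token: base64(payload).signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def require_role(token: str, role: str) -> bool:
    """Verify the signature, then check the 'role' claim (admin vs employee)."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or foreign token
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims.get("role") == role

admin_token = sign({"sub": "alice", "role": "admin"})
```

In the FastAPI backend the same check would live in a dependency, so upload routes can require `admin` while query routes accept `employee`.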
Building a multi-agent system for biomedical data presented several real-world hurdles. Below is how I engineered past them:
- Challenge: Initial testing with local models (Ollama/Phi) resulted in poor instruction following and hallucinated citations.
- Solution: Migrated to Groq (Llama-3.3-70B) and implemented rigorous Prompt Engineering with few-shot examples to ensure the model strictly adheres to biomedical formatting and citation rules.
- Challenge: Agents initially returned a high volume of low-value data, cluttering the synthesis layer.
- Solution: Implemented a Domain-Specific Ranking System.
- Patents: Ranked by expiry date and legal status.
- Literature: Ranked by citation count and impact factor.
- Result: Only the top 5 most authoritative pieces of evidence are passed to the LLM.
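A sketch of the literature-ranking step, assuming a simple citation-count sort with a top-5 cutoff; the field names are illustrative (the production ranker also weighs impact factor).

```python
def rank_literature(papers: list[dict], top_k: int = 5) -> list[dict]:
    """Rank literature hits by citation count, keeping only the top-k for the LLM."""
    return sorted(papers, key=lambda p: p.get("citations", 0), reverse=True)[:top_k]

# Six hits go in; only the five most-cited survive to the synthesis prompt.
papers = [{"pmid": i, "citations": c} for i, c in enumerate([3, 120, 45, 7, 88, 12])]
top = rank_literature(papers)
```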
- Challenge: Queries for a specific drug often failed because different sources used different names (trade names vs. chemical names), leading to near-zero results.
- Solution: Integrated Synonym Expansion via the EBI OLS4 API. The system now automatically expands a single user query into a comprehensive set of scientific and commercial synonyms before the agents begin retrieval.
- Challenge: Agents occasionally provided conflicting information (e.g., a web alert claiming a drug failure vs. a patent filing showing active development).
- Solution: Developed a Source-Credibility Scoring Matrix. I assigned weighted priority to data sources:
  - Patent Data (High) > Clinical Trials > Literature > Web Intel (Low)
- The LLM is prompted to prioritize "Source Truth" based on these weights when synthesizing the final report.
- FastAPI
- SQLAlchemy
- JWT
- Supabase Storage
- Groq API
- React + Vite
- Mantine UI
- Axios
- ReCharts
- Backend → Render
- Frontend → Netlify
Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```

Backend environment variables (`.env`):

```env
SUPABASE_URL=xxxxxxxxxx
SUPABASE_SERVICE_KEY=xxxxxxxxxx
SUPABASE_ANON_KEY=xxxxxxxxxx
JWT_SECRET_KEY=xxxxxxxxxx
JWT_ALGORITHM=xxxxxxxxxx
ENV=xxxxxxxxxx
PUBLIC_API_URL=xxxxxxxxxx
GROQ_API_KEY=xxxxxxxxxx
GROQ_BASE_URL=xxxxxxxxxx
MODEL_NAME=xxxxxxxxxx
CONSUMER_KEY=xxxxxxxxxx
CONSUMER_SECRET=xxxxxxxxxx
```

Frontend environment variable (`.env`):

```env
VITE_API_BASE_URL=https://<backend-url>
```

Project structure:

```
NovusAI/
├── backend/
└── frontend/
```

Devashish Mishra | B.Tech | AI/ML | Full-Stack | Cloud




