CyberSage is a Retrieval-Augmented Generation (RAG) pipeline with agentic workflows for cybersecurity applications. It builds on the CyberScienceLab RAG_LLM_CVE repository, adding intelligent agents, enriched threat analysis, and a modular architecture that supports multiple cybersecurity tasks such as CVE summarization, log analysis, and threat hunting.
- 🔍 Vector-based Semantic Retrieval over PDF threat reports
- 🤖 Agentic Architecture for modular, extensible workflows
- 📄 CVE Intelligence Pipeline with metadata validation & generation
- 🧩 Local JSON CVE validation using NVD-like structure
- 🗂️ Chunked Document Processing with Sentence Transformers
- 🧠 Meta LLaMA-3 8B Instruct Integration via HuggingFace Transformers
- 🧪 Streamlit UI & CLI Support (optional, toggleable)
- 📚 Designed for cybersecurity research, SOC augmentation, and analyst workflows
Cybersage-RAG-Agent/
├── agents/
│ ├── log_analysis_agent.py
│ └── cve_summarizer_agent.py
├── retriever/
│ └── local_semantic_search.py
├── cve_tools/
│ ├── cve_validator.py
│ └── cve_extractor.py
├── data/
│ ├── sample_pdf_reports/
│ └── local_cve_db.json
├── rag_App.py
├── theRag.py
├── requirements.txt
└── README.md
git clone https://github.com/yourusername/Cybersage-RAG-Agent.git
cd Cybersage-RAG-Agent
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
# Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
huggingface-cli login
The model loads automatically with:
- torch_dtype=torch.bfloat16
- device_map="auto"
- Memory optimization as needed
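For orientation, here is a minimal sketch of how the settings above might be passed to Hugging Face Transformers. The model ID `meta-llama/Meta-Llama-3-8B-Instruct` and the helper names `build_load_kwargs` and `load_model` are assumptions for illustration, not taken from this repository; check `theRag.py` for the actual loading code.

```python
# Assumed model ID; the repository may pin a different checkpoint.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def build_load_kwargs():
    """Return the keyword arguments listed above for from_pretrained()."""
    import torch  # imported lazily so this file can be imported without torch

    return {"torch_dtype": torch.bfloat16, "device_map": "auto"}


def load_model(model_id=MODEL_ID):
    """Download (or reuse the cached) tokenizer and model weights."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs())
    return tokenizer, model
```

Calling `load_model()` requires access to the gated model (hence the `huggingface-cli login` step), and `device_map="auto"` additionally relies on the `accelerate` package being installed.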
- Semantic Search: Query embedded via sentence transformer → top-k context chunks via cosine similarity
- Agent Pipeline: Calls log or CVE summarizer agent based on input type
- RAG + LLM: Sends retrieved context + prompt to Meta LLaMA-3 model
- CVE Validation: CVE IDs found in the input are checked against a local JSON CVE metadata set
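The steps above can be sketched in plain Python. This is an illustrative sketch only, not the code in `retriever/` or `cve_tools/`: the function names, the agent labels, and the assumption that `local_cve_db.json` is a JSON object keyed by CVE ID are all hypothetical. Embeddings are passed in as plain lists of floats; in the real pipeline they would come from a sentence transformer.

```python
import json
import math
import re

# Matches IDs such as CVE-2023-23397 (case-insensitive).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


def route_query(text):
    """Pick an agent label: CVE summarizer if a CVE ID appears, else log analysis."""
    return "cve_summarizer" if CVE_PATTERN.search(text) else "log_analysis"


def load_cve_db(path="data/local_cve_db.json"):
    """Load the local CVE metadata (assumed here to be a dict keyed by CVE ID)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def validate_cve_ids(text, cve_db):
    """Map each CVE ID found in text to whether it exists in the local database."""
    ids = {match.upper() for match in CVE_PATTERN.findall(text)}
    return {cve_id: cve_id in cve_db for cve_id in sorted(ids)}
```

With this layout, a query like "Explain CVE-2023-23397" would be routed to the CVE summarizer, its ID checked against the local database, and the top-k retrieved chunks passed on to the LLM as context.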
streamlit run rag_App.py
Try out:
- “Explain CVE-2023-23397”
- Upload PDF reports for semantic retrieval
- Log simulation via log agent
Context:
<retrieved documents>
Question:
What is the impact of CVE-2023-23397 on Outlook clients?
Answer (based on the context only):
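A prompt in the format above could be assembled with a small helper like the following. The name `build_prompt` and the choice to join chunks with blank lines are assumptions for illustration; the repository's own prompt construction may differ.

```python
def build_prompt(question, retrieved_chunks):
    """Fill the Context / Question / Answer template with retrieved text."""
    # Separate retrieved chunks with a blank line so they stay distinguishable.
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        f"Context:\n{context}\n\n"
        f"Question:\n{question.strip()}\n\n"
        "Answer (based on the context only):"
    )
```

Ending the prompt with the "Answer (based on the context only):" line nudges the model to ground its reply in the retrieved documents rather than in its general knowledge.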
- Threat Intelligence Summarization
- Local CVE Metadata Lookup
- SOC Log Analysis
- Generative Threat Reporting
This project inherits the license of the original CyberScienceLab/RAG_LLM_CVE repository. See the LICENSE file for details.
Developed & extended by Rudraksh Gupta
Cybersecurity MSc | AI x Threat Intelligence
GitHub: @mohakrudrakshh
- Multi-agent collaboration using LangGraph / CrewAI
- Dynamic PDF parsing and NVD integration
- Web-based dashboard with history logging
- Real-time threat feed summarization