A comprehensive multi-layered security framework for AI agents in enterprise workflows, powered by Cerebras and Llama Guard.
As AI agents become deeply integrated into enterprise workflows, ensuring their safety and security becomes paramount. Our solution addresses the critical need for protecting AI systems from:
- Jailbreaking attacks that manipulate AI behavior
- Backdoor attacks planted in open-source LLMs
- Destructive actions like unauthorized database deletions
- Suspicious prompts from malicious actors
Unlike existing solutions that protect only a single LLM, our framework is designed for the multi-agent future of enterprise AI adoption and covers both proprietary and open-source models.
- Regex-based code-aware protection
- Detects jailbreaks, prompt injection, secrets, unsafe APIs
- Code-aware extraction to reduce false positives
- Semantic content moderation via Llama Guard
- Intelligent analysis of natural language content
- 14 safety categories including malicious instructions and illegal activities
- Powered by Meta's Llama Guard model
- Advanced backdoor and malware detection
- AST SQL checks and subprocess heuristics
- Optional CodeBERT embeddings with FAISS similarity
- Runtime tracing capabilities
- Multi-agent validation and risk assessment
- Powered by Cerebras for intelligent summarization
- Risk labeling (Low/Medium/High) for enterprise decision-making
- Multi-agent code generation using LangGraph and Cerebras API
- Real-time safety monitoring with live telemetry streaming
- Semantic moderation via Llama Guard integration
- Enterprise-ready Flask UI with Server-Sent Events
- Comprehensive protection against various attack vectors
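The regex-based, code-aware layer listed above can be illustrated with a minimal sketch. It assumes that "code-aware extraction" means separating fenced code from surrounding prose and applying code-specific rules only to the code. The rule names, patterns, and function names are hypothetical and are not taken from the project's source.

```python
import re

# Hypothetical rule set; patterns and category names are illustrative only.
RULES = {
    "prompt_injection": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "secret": re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    "unsafe_api": re.compile(r"\b(eval|exec|os\.system)\s*\("),
}
# Rules that are only meaningful inside code, not in natural-language text.
CODE_ONLY = {"unsafe_api"}

# Matches a fenced block: three backticks, optional language tag, body, three backticks.
FENCE = re.compile(r"`{3}[\w+-]*\n(.*?)`{3}", re.S)


def split_code_and_prose(text):
    """Return (code, prose): fenced block bodies joined together, and the rest."""
    code = "\n".join(m.group(1) for m in FENCE.finditer(text))
    prose = FENCE.sub(" ", text)
    return code, prose


def scan(text):
    """Return a list of (rule_name, location) findings for the given text."""
    code, prose = split_code_and_prose(text)
    findings = []
    for name, pattern in RULES.items():
        targets = [("code", code)]
        if name not in CODE_ONLY:
            targets.append(("prose", prose))
        for where, chunk in targets:
            if pattern.search(chunk):
                findings.append((name, where))
    return findings
```

In this sketch a sentence that merely mentions `eval()` in prose is not flagged, while the same call inside a fenced code block is. That is one way extraction can reduce false positives.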
AI Safety Protocol
├── L1: Sentinel CodeGuard # Regex-first protection
├── Llama Guard Integration # Semantic moderation
├── L2: Sentinel Backdoor # Advanced threat detection
└── L3: Sentinel MultiAgent # Cerebras-powered validation
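As a rough illustration of the "AST SQL checks and subprocess heuristics" attributed to the Sentinel Backdoor layer, the sketch below walks a Python syntax tree with the standard `ast` module. The list of flagged calls, the SQL pattern, and the finding format are assumptions made for this example. The actual layer may check different things.

```python
import ast
import re

# Assumed heuristics: string literals that contain destructive SQL statements.
DESTRUCTIVE_SQL = re.compile(
    r"\b(DROP\s+(TABLE|DATABASE)|TRUNCATE\s+TABLE|DELETE\s+FROM)\b", re.I
)
# Assumed heuristics: calls that spawn external processes.
PROCESS_CALLS = {
    "os.system", "os.popen",
    "subprocess.run", "subprocess.call",
    "subprocess.Popen", "subprocess.check_output",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def inspect_source(source):
    """Return (kind, detail, line) findings for Python source, ordered by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in PROCESS_CALLS:
                findings.append(("process_call", name, node.lineno))
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if DESTRUCTIVE_SQL.search(node.value):
                findings.append(("destructive_sql", node.value, node.lineno))
    return sorted(findings, key=lambda f: f[2])
```

Working on the syntax tree rather than raw text means a comment that mentions `subprocess.run` is ignored, because comments never reach the AST. Note that `ast.parse` raises `SyntaxError` on invalid input, so a caller would need to handle that case.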
Below are screenshots of the AI Safety Protocol dashboard and real-time monitoring system.
The system provides a live monitoring view powered by Server-Sent Events, enabling real-time detection of safety violations and flagged responses.

- Python 3.10+
- Cerebras API key
- Optional: Groq API key for Llama Guard
git clone <repository-url>
cd AI-safety-protocol
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export CEREBRAS_API_KEY=your_cerebras_key
export GROQ_API_KEY=your_groq_key # Optional for Llama Guard
python app.py
Open http://localhost:5000 to access the web interface.
- Planner: Creates implementation strategies
- Coder: Generates code based on plans
- Reviewer: Evaluates code quality and safety
- Refiner: Produces production-ready output
- Cerebras: Powers the multi-agent validation layer
- Llama Guard: Provides semantic content moderation
- CodeBERT: Enables embedding-based threat detection
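The Planner, Coder, Reviewer, Refiner hand-off above can be pictured as a chain in which each agent reads earlier outputs and adds its own. The sketch below uses plain Python rather than LangGraph, and a stub in place of a Cerebras call, so it shows only the data flow. The state keys, prompt format, and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
LLM = Callable[[str], str]


def make_agent(role: str, reads: Tuple[str, ...], writes: str, llm: LLM):
    """Build an agent that prompts the model with selected state keys."""
    def agent(state: State) -> State:
        context = " | ".join(f"{key}={state[key]}" for key in reads)
        return {**state, writes: llm(f"[{role}] {context}")}
    return agent


def build_pipeline(llm: LLM) -> List[Callable[[State], State]]:
    """Order the four roles so each one sees the outputs it depends on."""
    return [
        make_agent("planner", ("task",), "plan", llm),
        make_agent("coder", ("plan",), "code", llm),
        make_agent("reviewer", ("code",), "review", llm),
        make_agent("refiner", ("code", "review"), "final", llm),
    ]


def run(task: str, llm: LLM) -> State:
    """Run the task through every agent and return the accumulated state."""
    state: State = {"task": task}
    for agent in build_pipeline(llm):
        state = agent(state)
    return state


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt."""
    return f"<response to: {prompt}>"
```

Calling `run("parse a CSV file", echo_llm)` returns a dictionary with the keys `task`, `plan`, `code`, `review`, and `final`. Swapping `echo_llm` for a real client function would be the only change needed to this sketch.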
- POST /generate: Generate code with safety checks
- GET /stream: Real-time safety telemetry
- POST /receive: Internal safety data collection
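Because the telemetry endpoint uses Server-Sent Events, a client has to split the response into events before it can use them. The following is a minimal sketch of that parsing step using only the standard library. The payload format of `/stream` is not documented here, so the code yields raw `data` strings without interpreting them. The `watch` helper and its default URL are assumptions based on the local address given in the setup steps.

```python
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional leading space is stripped
            buffer.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An unterminated event at end of stream is discarded.


def watch(url: str = "http://localhost:5000/stream") -> None:
    """Print telemetry payloads as they arrive (requires the app to be running)."""
    with urllib.request.urlopen(url) as response:
        for payload in iter_sse_data(line.decode("utf-8") for line in response):
            print(payload)
```

The parser is kept separate from the network call so it can be exercised against a list of strings without a running server.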
- Code-aware extraction reduces false positives
- Multi-layered validation ensures comprehensive protection
- Real-time monitoring provides immediate threat detection
- Enterprise-grade risk assessment and reporting
Customize safety rules in sentinel_codeguard/config.py:
- Adjust detection categories
- Configure action responses (block/warn/redact)
- Tune sensitivity levels
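To make the block/warn/redact options concrete, here is a hypothetical sketch of how rules with configurable actions might be represented and applied. It is not the contents of sentinel_codeguard/config.py, whose real structure may differ. The `Rule` class, the example patterns, and `apply_rules` are all invented for illustration, and sensitivity tuning is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    category: str         # detection category the rule belongs to
    pattern: str          # regular expression to search for
    action: str           # one of "block", "warn", "redact"
    enabled: bool = True  # set to False to switch a category off


# Hypothetical defaults; a deployment would adjust these to its own policy.
RULES = [
    Rule("secrets", r"AKIA[0-9A-Z]{16}", "redact"),
    Rule("unsafe_api", r"\bos\.system\s*\(", "warn"),
    Rule("jailbreak", r"(?i)ignore (all )?previous instructions", "block"),
]


def apply_rules(text, rules=RULES):
    """Return (text after redaction, warning categories, blocked flag)."""
    warnings, blocked = [], False
    for rule in rules:
        if not rule.enabled:
            continue
        regex = re.compile(rule.pattern)
        if not regex.search(text):
            continue
        if rule.action == "block":
            blocked = True
        elif rule.action == "redact":
            text = regex.sub("[REDACTED]", text)
        elif rule.action == "warn":
            warnings.append(rule.category)
        else:
            raise ValueError(f"unknown action: {rule.action}")
    return text, warnings, blocked
```

Keeping the action as data on each rule means a category can be moved from "warn" to "block" by editing one field, without touching the scanning logic.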
This framework provides robust safety guardrails but should be used as part of a comprehensive security strategy. Always review generated code and implement additional security measures appropriate for your environment.
See LICENSE file for details.
Built for the future of enterprise AI - where safety meets innovation.

