AI-powered legal document analysis with multilingual support and knowledge graph construction
Watch the demo video to see Avokat AI in action:
Demo Video Link
Avokat AI is an intelligent legal document analysis system that combines advanced PDF processing, knowledge graph construction, and multilingual AI-powered chat capabilities. Built for legal professionals, it provides grounded legal assistance by analyzing uploaded documents and creating session-isolated knowledge graphs.
- PDF Processing: High-quality text extraction using PyMuPDF
- Knowledge Graph: Neo4j-based entity and relationship extraction
- Multilingual Support: Arabic, English, and mixed-language processing
- AI Chat: Real-time streaming responses with Gemini 2.5 Flash
- Session Isolation: Complete data separation between chat sessions
- Legal Compliance: Built-in disclaimers and professional legal assistance
- Python 3.8+
- Neo4j Aura Cloud account
- Google Gemini API key
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/mohamed-rabee3/avokat-ai.git
   cd avokat-ai
   ```

2. Create a virtual environment

   ```bash
   # Windows (PowerShell)
   py -m venv venv
   venv\Scripts\Activate.ps1

   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure the environment

   ```bash
   # Copy the example environment file
   cp .env.example .env
   # Edit .env with your credentials
   nano .env
   ```

5. Set up Neo4j Aura
   - Create a Neo4j Aura Cloud instance
   - Get your connection URI, username, and password
   - Update the `.env` file with your Neo4j credentials

6. Start the backend

   ```bash
   python -m uvicorn backend.app.main:app --reload --host 127.0.0.1 --port 8000
   ```

7. Access the API
   - API Documentation: http://127.0.0.1:8000/docs
   - Health Check: http://127.0.0.1:8000/health
```mermaid
graph LR
    A[PDF Upload] --> B[Text Extraction]
    B --> C[Language Detection]
    C --> D[Knowledge Graph Creation]
    D --> E[Entity Extraction]
    E --> F[Neo4j Storage]
    F --> G[Chat Ready]
```
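The `/ingest` response in the API reference reports a `chunks` count and `DocumentChunk` nodes, so extracted text is split into chunks somewhere between text extraction and graph creation. A minimal sketch of fixed-size chunking with overlap, assuming character-based windows; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values, not the project's actual settings:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows (hypothetical parameters)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # each window starts this many characters after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text to avoid tiny trailing fragments
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps sentences that straddle a boundary visible in both neighbouring chunks. Pure character windows can cut through words in either language; a boundary-aware splitter (for example LangChain's `RecursiveCharacterTextSplitter`) is a common alternative.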
| Language | Features |
|---|---|
| Arabic | Enhanced prompts, cultural context, legal terminology |
| English | Standard processing, comprehensive legal assistance |
| Mixed | Dual-language preservation, cross-language relationships |
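One simple way to decide between these three modes, shown here only as an illustration and not necessarily how Avokat AI's language detection works, is to count letters from the Arabic Unicode block against ASCII Latin letters. The function name and the `mixed_threshold` value are hypothetical:

```python
def detect_language(text: str, mixed_threshold: float = 0.2) -> str:
    """Classify text as 'arabic', 'english', 'mixed', or 'unknown' by letter script counts."""
    arabic = 0
    latin = 0
    for ch in text:
        code = ord(ch)
        # Arabic block is U+0600 to U+06FF; isalpha() skips diacritics, digits, and punctuation
        if 0x0600 <= code <= 0x06FF and ch.isalpha():
            arabic += 1
        elif ch.isascii() and ch.isalpha():
            latin += 1
    total = arabic + latin
    if total == 0:
        return "unknown"  # no letters at all, e.g. only numbers or symbols
    arabic_share = arabic / total
    if arabic_share >= 1 - mixed_threshold:
        return "arabic"
    if arabic_share <= mixed_threshold:
        return "english"
    return "mixed"
```

For example, `detect_language("The contract العقد")` has 11 Latin and 5 Arabic letters, an Arabic share of about 0.31, so it falls in the mixed band.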
- Entity Extraction: Legal entities, relationships, and concepts
- Semantic Search: Context-aware document retrieval
- Streaming Responses: Real-time chat with Server-Sent Events
- Citation Support: Source tracking and reference management
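Semantic search over embedded chunks usually ranks each chunk by cosine similarity to the query embedding. The sketch below shows only that ranking step on plain Python lists; in the real system the vectors would come from a Sentence Transformers model, and `cosine_similarity` and `top_k` are hypothetical helper names:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], chunks: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The returned chunk ids are also what a citation feature would need in order to point an answer back at its sources.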
```mermaid
graph TB
    subgraph "Frontend"
        UI[React Application]
    end
    subgraph "Backend Services"
        API[FastAPI Server]
        PDF[PDF Processor]
        KG[Knowledge Graph Builder]
        LLM[LLM Service]
        RET[Retrieval Service]
    end
    subgraph "Data Layer"
        SQLITE[(SQLite)]
        NEO4J[(Neo4j Aura)]
    end
    subgraph "External Services"
        GEMINI[Gemini 2.5 Flash]
    end
    UI --> API
    API --> PDF
    API --> KG
    API --> LLM
    API --> RET
    PDF --> NEO4J
    KG --> NEO4J
    LLM --> GEMINI
    API --> SQLITE
```
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI | REST API and service orchestration |
| PDF Processing | PyMuPDF | High-quality text extraction |
| Knowledge Graph | Neo4j + LangChain | Entity and relationship storage |
| AI Engine | Gemini 2.5 Flash | Multilingual response generation |
| Embeddings | Sentence Transformers | Semantic search capabilities |
| Database | SQLite | Session and message storage |
```
POST   /sessions      # Create new session
GET    /sessions      # List all sessions
GET    /sessions/{id} # Get session details
PUT    /sessions/{id} # Update session
DELETE /sessions/{id} # Delete session and data
```

```
POST /ingest # Upload and process PDF
```

Request:
```bash
curl -X POST "http://localhost:8000/ingest" \
  -F "session_id=1" \
  -F "file=@document.pdf"
```

Response:
```json
{
  "status": "success",
  "session_id": 1,
  "file_name": "document.pdf",
  "chunks": 15,
  "nodes_created": 45,
  "relationships_created": 32,
  "session_stats": {
    "Entity": 25,
    "DocumentChunk": 15,
    "Fact": 20
  }
}
```

```
POST /chat                      # Streaming chat
POST /chat/non-streaming        # Non-streaming chat
GET  /chat/history/{session_id} # Get chat history
```

Request:
```bash
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": 1,
    "message": "What are the key terms in this contract?"
  }'
```

Response (Streaming):
```text
data: {"chunk": "Based on the uploaded contract, I can identify several key terms..."}
data: {"chunk": "The main parties involved are..."}
data: {"done": true, "sources": [...]}
```
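Each streamed event is a `data:` line carrying one JSON object: `chunk` fields hold answer text, and a final object carries `done` and `sources`. A minimal client-side sketch that reassembles the answer, assuming exactly that event shape; `collect_stream` is a hypothetical helper:

```python
import json
from typing import Any, Iterable, List, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, List[Any]]:
    """Join 'chunk' payloads from SSE 'data:' lines and return (answer, sources)."""
    parts: List[str] = []
    sources: List[Any] = []
    for raw in lines:
        line = raw.strip()
        # Blank separator lines and other SSE fields (event:, id:) are ignored in this sketch
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if "chunk" in payload:
            parts.append(payload["chunk"])
        if payload.get("done"):
            sources = payload.get("sources", [])
            break
    # Chunks are concatenated as-is; any spacing must come from the server
    return "".join(parts), sources
```

Against a live server, the lines could come from a streaming HTTP client, for example `requests.post(url, json=body, stream=True).iter_lines(decode_unicode=True)`.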
Create a `.env` file in the project root:

```bash
# Database Configuration
DATABASE_URL=sqlite+aiosqlite:///./avokat.db
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j

# AI Services
GEMINI_API_KEY=your-gemini-api-key

# API Configuration
API_TITLE=Avokat AI API
API_VERSION=1.0.0
DEBUG=false
```
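Before starting the server, it can help to check that `.env` defines every required key. A minimal stdlib sketch, assuming plain `KEY=value` lines with `#` comments (no quoting or multi-line values); `REQUIRED_KEYS` is an assumption drawn from the example above, and the application itself may load settings differently:

```python
from typing import Dict, Iterable, List

# Assumed minimum set of keys, taken from the example .env above
REQUIRED_KEYS = ("DATABASE_URL", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GEMINI_API_KEY")


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values may contain '='
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]
```

Usage: `with open(".env", encoding="utf-8") as f: print(missing_keys(parse_env(f)))`. Placeholder values such as `your-password` still count as set, so this catches missing keys, not wrong credentials.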
1. Create a Neo4j Aura instance
   - Visit Neo4j Aura
   - Create a new database instance
   - Choose the free tier for development

2. Get the connection details
   - Copy the connection URI
   - Note your username and password
   - Update your `.env` file

3. Verify the connection

   ```bash
   curl http://localhost:8000/health
   ```
- Backend & AI System Documentation - Complete technical reference
- Neo4j Aura Setup Guide - Database configuration
- Retrieval Improvements - Performance optimizations
- Legal Chatbot MVP Plan - Project specifications
- Interactive Docs: http://localhost:8000/docs (Swagger UI)
- ReDoc: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json
```bash
# Check API health
curl http://localhost:8000/health

# Test session creation
curl -X POST "http://localhost:8000/sessions" \
  -H "Content-Type: application/json" \
  -d '{"name": "Test Session"}'
```
1. Create a session

   ```bash
   curl -X POST "http://localhost:8000/sessions" \
     -H "Content-Type: application/json" \
     -d '{"name": "Legal Document Analysis"}'
   ```

2. Upload a document

   ```bash
   curl -X POST "http://localhost:8000/ingest" \
     -F "session_id=1" \
     -F "file=@sample-contract.pdf"
   ```

3. Start chatting

   ```bash
   curl -X POST "http://localhost:8000/chat" \
     -H "Content-Type: application/json" \
     -d '{"session_id": 1, "message": "Summarize the key points of this contract"}'
   ```
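The JSON calls in this workflow can also be made from Python. A stdlib-only sketch, assuming the request bodies shown above and a server at `http://localhost:8000`; `build_json_request` and `post_json` are hypothetical helpers, and the multipart upload to `/ingest` is left to curl or a library such as `requests`:

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # assumed local development server


def build_json_request(path: str, body: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request with a JSON body, mirroring the curl examples."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, body: Dict[str, Any], timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON reply (for non-streaming endpoints only)."""
    with urllib.request.urlopen(build_json_request(path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_json("/sessions", {"name": "Legal Document Analysis"})` creates a session. Sending the chat body to `/chat/non-streaming` assumes that endpoint accepts the same fields as `/chat`.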
```bash
# Start with auto-reload
python -m uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Install production dependencies
pip install gunicorn

# Start with Gunicorn
gunicorn backend.app.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "backend.app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch

   ```bash
   git checkout -b feature/amazing-feature
   ```

3. Make your changes
4. Add tests (if applicable)
5. Commit your changes

   ```bash
   git commit -m "Add amazing feature"
   ```

6. Push to the branch

   ```bash
   git push origin feature/amazing-feature
   ```

7. Open a Pull Request
- Follow PEP 8 style guidelines
- Add docstrings to new functions
- Update documentation for new features
- Test with both Arabic and English documents
| Operation | Average Time | Notes |
|---|---|---|
| PDF Processing | 2-5 seconds | Depends on document size |
| Knowledge Graph Creation | 30-60 seconds | Rate limited for API compliance |
| Chat Response | 2-8 seconds | Streaming response |
| Document Retrieval | <1 second | Optimized with indexes |
- Session Isolation: Efficient data separation
- Indexing: Optimized Neo4j queries
- Caching: Embedding and response caching
- Rate Limiting: API compliance and stability
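The performance table notes that knowledge graph creation is rate limited for API compliance. A minimal sketch of one way to throttle calls, a sliding-window limiter allowing at most `max_calls` per `period` seconds; the class name and limits are hypothetical, and the project may throttle differently:

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls: int, period: float, clock: Callable[[], float] = time.monotonic) -> None:
        if max_calls <= 0 or period <= 0:
            raise ValueError("max_calls and period must be positive")
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested with a fake clock
        self.calls: Deque[float] = deque()

    def _prune(self, now: float) -> None:
        # Drop timestamps that have left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if one is allowed now, else return False."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def acquire(self) -> None:
        """Block (sleep) until a call is allowed, then record it."""
        while True:
            now = self.clock()
            self._prune(now)
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # The oldest call leaves the window after this many seconds
            time.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.acquire()` before each LLM request during graph extraction keeps bursts under the provider's quota, which trades ingestion speed for stability.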
Important: This system provides informational assistance only and is not a substitute for professional legal advice. All responses include appropriate legal disclaimers.
- Session Isolation: Complete data separation
- No Cross-Session Leakage: Verified isolation testing
- Secure Storage: Encrypted connections to Neo4j Aura
- API Security: CORS protection and input validation
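Session isolation in a shared graph generally means every query is filtered by a session identifier. A sketch of building such a query with bound parameters, assuming nodes carry a `session_id` property and the `Entity` label seen in the ingest statistics; the property name, helper name, and query are assumptions, not the project's schema:

```python
from typing import Any, Dict, Tuple


def entities_for_session(session_id: int, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Return a Cypher query and parameters restricted to one session's entities."""
    query = (
        "MATCH (e:Entity {session_id: $session_id}) "
        "RETURN e "
        "LIMIT $limit"
    )
    # Values travel as parameters, never string-interpolated, so input cannot change the query text
    return query, {"session_id": session_id, "limit": limit}
```

With the official Neo4j Python driver, the pair can be passed as `session.run(query, params)`. Parameterizing also lets Neo4j reuse the cached query plan across sessions.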
Neo4j Connection Failed

```bash
# Check your Neo4j credentials in .env, then confirm the connection
curl -X GET "http://localhost:8000/health"
```

PDF Processing Error

```bash
# Ensure PyMuPDF is installed
pip install PyMuPDF
```

Gemini API Issues

```bash
# Verify the API key is set in the .env file
grep GEMINI_API_KEY .env
```

```bash
# Enable debug logging
export DEBUG=true
python -m uvicorn backend.app.main:app --reload
```

- Multi-user Authentication
- Advanced Entity Recognition
- Document Comparison
- Export Capabilities
- Mobile App
- Additional Language Support
- v1.0.0 - Initial release with core functionality
- v1.1.0 - Enhanced multilingual support
- v1.2.0 - Performance optimizations
- v2.0.0 - Multi-user support (planned)
This project is licensed under the MIT License - see the LICENSE file for details.
- Neo4j for graph database technology
- Google for Gemini AI capabilities
- LangChain for knowledge graph construction
- FastAPI for the excellent web framework
- PyMuPDF for reliable PDF processing
- Documentation: Full Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with care for the legal community