Enterprise-grade document processing with AI-powered insights and modern cloud-native architecture
A production-ready, scalable document intelligence system that combines the power of Azure AI services with advanced language models to extract, analyze, and structure information from various document types. Built with modern Python frameworks and designed for cloud deployment.
- AI-Powered Processing: Azure Document Intelligence + GPT-4 for comprehensive document analysis
- Modern Architecture: FastAPI, async/await, proper dependency injection (see the sketch after this list)
- Production Ready: Docker containerization, CI/CD pipelines, monitoring
- Cloud Native: Optimized for Railway, Render, Vercel, and other cloud platforms
- Security First: JWT authentication, input validation, rate limiting
- Scalable: Redis caching, connection pooling, async processing
- Developer Friendly: Comprehensive testing, type hints, documentation
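
As a rough illustration of the FastAPI + async + dependency-injection style listed above, here is a minimal sketch. The endpoint path and the `DocumentService` class are hypothetical names for illustration, not the repository's actual code:

```python
# Hedged sketch of the async + dependency-injection style described above.
# The endpoint path and DocumentService class are hypothetical, not the
# repository's actual names.
from fastapi import Depends, FastAPI

app = FastAPI()


class DocumentService:
    async def summarize(self, document_id: str) -> dict:
        # A real implementation would call Azure Document Intelligence + GPT-4.
        return {"document_id": document_id, "summary": "..."}


def get_document_service() -> DocumentService:
    # FastAPI resolves this dependency per request; easy to override in tests.
    return DocumentService()


@app.get("/documents/{document_id}/summary")
async def document_summary(
    document_id: str,
    service: DocumentService = Depends(get_document_service),
) -> dict:
    return await service.summarize(document_id)
```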
```mermaid
graph TB
    A[Client Applications] --> B[Load Balancer]
    B --> C[FastAPI Application]
    C --> D[Document Intelligence Service]
    C --> E[Authentication Service]
    C --> F[WebSocket Manager]
    D --> G[Azure Document Intelligence]
    D --> H[Language Model Service]
    D --> I[Database]
    C --> J[Redis Cache]
    I --> K[(PostgreSQL/SQLite)]
    J --> L[(Redis)]

    style C fill:#e1f5fe
    style D fill:#f3e5f5
    style I fill:#e8f5e8
```
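
The core flow shown above, extracting text with Azure Document Intelligence and then enriching it with the language model, can be sketched roughly as follows. This is a minimal illustration, not the project's actual service code; it assumes the `azure-ai-formrecognizer` and `openai` SDKs and a prebuilt document model, which may differ from what `document_intelligence_service.py` actually uses:

```python
# Minimal sketch of the extract-then-analyze pipeline (illustrative only).
# Assumes the azure-ai-formrecognizer and openai packages; the actual service
# code in app/services/ may use different SDKs, models, or prompts.
import os

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI


def extract_text(file_bytes: bytes) -> str:
    """Run Azure Document Intelligence analysis and return the raw text."""
    client = DocumentAnalysisClient(
        endpoint=os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_DOCUMENT_INTELLIGENCE_KEY"]),
    )
    poller = client.begin_analyze_document("prebuilt-document", document=file_bytes)
    return poller.result().content


def summarize(text: str) -> str:
    """Ask the Azure OpenAI deployment for a summary of the extracted text."""
    llm = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-02-01",
    )
    response = llm.chat.completions.create(
        model="gpt-4",  # assumed deployment name
        messages=[{"role": "user", "content": f"Summarize this document:\n{text}"}],
    )
    return response.choices[0].message.content
```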
- Python 3.11+
- Docker and Docker Compose
- Azure Account with AI Services
- Git
- Clone the repository

  ```bash
  git clone https://github.com/aaron-seq/Roneira-AI-LLM-powered-document-intelligence-system.git
  cd Roneira-AI-LLM-powered-document-intelligence-system
  ```

- Set up Python environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Configure environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Start services with Docker

  ```bash
  docker-compose up -d redis postgres
  ```

- Run the application

  ```bash
  python -m uvicorn app.main:app --reload
  ```

- Access the API
  - API: http://localhost:8000
  - Documentation: http://localhost:8000/api/docs
  - Health Check: http://localhost:8000/health
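
Once the server is running, a quick programmatic check against the health endpoint confirms the setup. This is an illustrative snippet only; it assumes the `requests` package is installed and that `/health` returns JSON:

```python
# Quick sanity check against the running API (illustrative; assumes `requests`
# is installed and the app is serving on localhost:8000).
import requests

response = requests.get("http://localhost:8000/health", timeout=5)
response.raise_for_status()
print(response.json())  # expected to report overall status and dependency health
```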
- Connect your GitHub repository to Railway
- Set environment variables in Railway dashboard
- Deploy automatically with git push
- Fork this repository
- Connect to Render
- Configure using `deployment/render.yaml`
```bash
npm i -g vercel
vercel --prod
```

```bash
# Get access token
curl -X POST "http://localhost:8000/api/auth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=demo&password=demo"

# Upload document
curl -X POST "http://localhost:8000/api/documents/upload" \
  -H "Authorization: Bearer <token>" \
  -F "[email protected]"

# Get document status
curl -X GET "http://localhost:8000/api/documents/{document_id}/status" \
  -H "Authorization: Bearer <token>"
```

```javascript
// WebSocket connection for real-time updates
const ws = new WebSocket('ws://localhost:8000/ws/{document_id}');

ws.onmessage = function(event) {
  const update = JSON.parse(event.data);
  console.log('Processing update:', update);
};
```
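
The same flow can be scripted from Python. A rough sketch mirroring the curl calls above (assumes the `requests` package; the demo credentials come from the examples above, while the `access_token` and `document_id` response fields are assumptions about the API's response shape):

```python
# Python equivalent of the curl examples above (illustrative; assumes `requests`).
import requests

BASE_URL = "http://localhost:8000"

# Get access token
token_response = requests.post(
    f"{BASE_URL}/api/auth/token",
    data={"username": "demo", "password": "demo"},  # form-encoded body
)
token = token_response.json()["access_token"]  # assumed response field name
headers = {"Authorization": f"Bearer {token}"}

# Upload document
with open("document.pdf", "rb") as f:
    upload = requests.post(
        f"{BASE_URL}/api/documents/upload",
        headers=headers,
        files={"file": f},
    )
document_id = upload.json()["document_id"]  # assumed response field name

# Check document status
status = requests.get(f"{BASE_URL}/api/documents/{document_id}/status", headers=headers)
print(status.json())
```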
| Variable | Description | Default | Required |
|---|---|---|---|
| `ENVIRONMENT` | Runtime environment | `development` | No |
| `SECRET_KEY` | JWT secret key | - | Yes |
| `DATABASE_URL` | Database connection URL | `sqlite:///./documents.db` | No |
| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` | No |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | - | Yes |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | - | Yes |
| `AZURE_DOCUMENT_INTELLIGENCE_KEY` | Azure Document Intelligence key | - | Yes |
| `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT` | Azure Document Intelligence endpoint | - | Yes |
| `MAX_FILE_SIZE_MB` | Maximum file size in MB | `10` | No |
| `RATE_LIMIT_REQUESTS_PER_MINUTE` | API rate limit | `60` | No |
```bash
# .env file
ENVIRONMENT=production
SECRET_KEY=your-super-secure-secret-key-here
DATABASE_URL=postgresql://user:password@host:port/database
REDIS_URL=redis://host:port/0

# Azure AI Services
AZURE_OPENAI_API_KEY=your-azure-openai-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-document-intelligence-key
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/

# Application Settings
MAX_FILE_SIZE_MB=50
RATE_LIMIT_REQUESTS_PER_MINUTE=100
```

```
├── app/
│   ├── core/                 # Core functionality
│   │   ├── authentication.py
│   │   ├── database_manager.py
│   │   └── exceptions.py
│   ├── services/             # Business logic
│   │   ├── document_intelligence_service.py
│   │   ├── cache_service.py
│   │   └── language_model_service.py
│   └── main.py               # FastAPI application
├── tests/                    # Test suite
├── deployment/               # Deployment configurations
├── .github/workflows/        # CI/CD pipelines
└── config.py                 # Application configuration
```
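
The settings in `config.py` would typically load these environment variables once and expose them as typed attributes. A minimal sketch, assuming `pydantic-settings` is used (the actual library and field names in the repository may differ):

```python
# Hedged sketch of centralized settings loading (assumes pydantic-settings;
# the real config.py may use a different library or field names).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # fall back to env vars

    environment: str = "development"
    secret_key: str
    database_url: str = "sqlite:///./documents.db"
    redis_url: str = "redis://localhost:6379/0"
    azure_openai_api_key: str
    azure_openai_endpoint: str
    azure_document_intelligence_key: str
    azure_document_intelligence_endpoint: str
    max_file_size_mb: int = 10
    rate_limit_requests_per_minute: int = 60


settings = Settings()  # imported by the rest of the application
```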
This project follows modern Python development practices:
```bash
# Format code
black app/ tests/
isort app/ tests/

# Lint code
flake8 app/ tests/
mypy app/

# Security check
bandit -r app/
safety check

# Run tests
pytest tests/ -v --cov=app
```
- Create feature branch

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Implement feature with tests
  - Add business logic in `app/services/`
  - Add tests in `tests/`
  - Update documentation

- Ensure code quality

  ```bash
  make lint test
  ```

- Create pull request
  - CI/CD pipeline will run automatically
  - Code review required for main branch
| Metric | Value |
|---|---|
| Document Processing | ~5-10 seconds avg |
| Concurrent Users | 100+ supported |
| API Response Time | <200ms (health check) |
| Memory Usage | ~200MB base |
| CPU Usage | ~10% idle, ~80% processing |
- Use Redis for caching frequently accessed data (see the sketch after this list)
- Implement proper connection pooling
- Monitor with application performance monitoring (APM)
- Scale horizontally with load balancers
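
As a rough illustration of the Redis caching approach mentioned above (assumes the `redis` package's asyncio client; the key scheme and TTL are hypothetical, not taken from `cache_service.py`):

```python
# Hedged sketch of Redis-backed caching for processed results (assumes the
# redis package's asyncio client; key scheme and TTL are illustrative only).
import json

import redis.asyncio as redis

cache = redis.from_url("redis://localhost:6379/0", decode_responses=True)


async def get_cached_result(document_id: str) -> dict | None:
    """Return a previously cached analysis result, if any."""
    raw = await cache.get(f"document:{document_id}:result")
    return json.loads(raw) if raw else None


async def cache_result(document_id: str, result: dict, ttl_seconds: int = 3600) -> None:
    """Cache an analysis result so repeat requests skip reprocessing."""
    await cache.set(f"document:{document_id}:result", json.dumps(result), ex=ttl_seconds)
```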
- Authentication: JWT tokens with expiration (see the sketch after this list)
- Authorization: Role-based access control
- Input Validation: Comprehensive request validation
- Rate Limiting: Prevent API abuse
- File Upload Security: Type and size validation
- Secrets Management: Environment variable configuration
- Security Headers: CORS, CSP, and other security headers
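
Token issuance with an expiry might look roughly like this. A hedged sketch assuming the `PyJWT` package; the project's `authentication.py` may use a different JWT library or claims:

```python
# Hedged sketch of JWT issuance/verification with expiration (assumes PyJWT;
# the project's authentication.py may use a different library or claims).
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET_KEY = "your-super-secure-secret-key-here"  # from the SECRET_KEY env var
ALGORITHM = "HS256"


def create_access_token(username: str, expires_minutes: int = 30) -> str:
    payload = {
        "sub": username,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)


def verify_access_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    return payload["sub"]
```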
- Keep dependencies updated
- Use strong secret keys (32+ characters)
- Enable HTTPS in production
- Regular security audits with `bandit` and `safety`
- Monitor logs for suspicious activity
- Application Health: `/health` endpoint
- Service Dependencies: Database, Redis, Azure services
- Performance Metrics: Response times, error rates
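
A health endpoint of this kind typically reports the status of each dependency. A rough sketch only; the dependency checks and response shape of the real `/health` endpoint may differ:

```python
# Hedged sketch of a /health endpoint reporting dependency status
# (illustrative; the repository's actual checks and response shape may differ).
from fastapi import FastAPI

app = FastAPI()


async def check_database() -> bool:
    # A real implementation would run a lightweight query, e.g. SELECT 1.
    return True


async def check_redis() -> bool:
    # A real implementation would issue a PING against the Redis connection.
    return True


@app.get("/health")
async def health() -> dict:
    checks = {
        "database": await check_database(),
        "redis": await check_redis(),
    }
    return {
        "status": "healthy" if all(checks.values()) else "degraded",
        "dependencies": checks,
    }
```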
```python
# Structured logging example
logger.info(
    "Document processed",
    extra={
        "document_id": document_id,
        "user_id": user_id,
        "processing_time": processing_time,
        "status": "completed",
    },
)
```

We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
```bash
# Install development dependencies
pip install -r requirements.txt

# Install pre-commit hooks
pre-commit install

# Run tests before committing
pytest tests/ -v
```

- Documentation: API Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
This project is licensed under the MIT License - see the LICENSE file for details.
- 🎉 Initial: Web interface and API endpoints
Made with ❤️ by Aaron Sequeira
⭐ Star this repository if you find it useful!