Enterprise-grade document processing with AI-powered insights and modern cloud-native architecture
A production-ready, scalable document intelligence system that combines the power of Azure AI services with advanced language models to extract, analyze, and structure information from various document types. Built with modern Python frameworks and designed for cloud deployment.
- AI-Powered Processing: Azure Document Intelligence + GPT-4 for comprehensive document analysis
- Modern Architecture: FastAPI, async/await, proper dependency injection (see the sketch after this list)
- Production Ready: Docker containerization, CI/CD pipelines, monitoring
- Cloud Native: Optimized for Railway, Render, Vercel, and other cloud platforms
- Security First: JWT authentication, input validation, rate limiting
- Scalable: Redis caching, connection pooling, async processing
- Developer Friendly: Comprehensive testing, type hints, documentation
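
As a rough illustration of the FastAPI + async + dependency-injection style listed above, here is a minimal sketch. The endpoint path and the `DocumentService` class are hypothetical names for illustration, not the repository's actual code:

```python
# Hedged sketch of the async + dependency-injection style described above.
# The endpoint path and DocumentService class are hypothetical, not the
# repository's actual names.
from fastapi import Depends, FastAPI

app = FastAPI()


class DocumentService:
    async def summarize(self, document_id: str) -> dict:
        # A real implementation would call Azure Document Intelligence + GPT-4.
        return {"document_id": document_id, "summary": "..."}


def get_document_service() -> DocumentService:
    # FastAPI resolves this dependency per request; easy to override in tests.
    return DocumentService()


@app.get("/documents/{document_id}/summary")
async def document_summary(
    document_id: str,
    service: DocumentService = Depends(get_document_service),
) -> dict:
    return await service.summarize(document_id)
```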
```mermaid
graph TB
    A[Client Applications] --> B[Load Balancer]
    B --> C[FastAPI Application]
    C --> D[Document Intelligence Service]
    C --> E[Authentication Service]
    C --> F[WebSocket Manager]
    D --> G[Azure Document Intelligence]
    D --> H[Language Model Service]
    D --> I[Database]
    C --> J[Redis Cache]
    I --> K[(PostgreSQL/SQLite)]
    J --> L[(Redis)]

    style C fill:#e1f5fe
    style D fill:#f3e5f5
    style I fill:#e8f5e8
```
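
The core flow shown above, extracting text with Azure Document Intelligence and then enriching it with the language model, can be sketched roughly as follows. This is a minimal illustration, not the project's actual service code; it assumes the `azure-ai-formrecognizer` and `openai` SDKs and a prebuilt document model, which may differ from what `document_intelligence_service.py` actually uses:

```python
# Minimal sketch of the extract-then-analyze pipeline (illustrative only).
# Assumes the azure-ai-formrecognizer and openai packages; the actual service
# code in app/services/ may use different SDKs, models, or prompts.
import os

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI


def extract_text(file_bytes: bytes) -> str:
    """Run Azure Document Intelligence analysis and return the raw text."""
    client = DocumentAnalysisClient(
        endpoint=os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_DOCUMENT_INTELLIGENCE_KEY"]),
    )
    poller = client.begin_analyze_document("prebuilt-document", document=file_bytes)
    return poller.result().content


def summarize(text: str) -> str:
    """Ask the Azure OpenAI deployment for a summary of the extracted text."""
    llm = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-02-01",
    )
    response = llm.chat.completions.create(
        model="gpt-4",  # assumed deployment name
        messages=[{"role": "user", "content": f"Summarize this document:\n{text}"}],
    )
    return response.choices[0].message.content
```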
- Python 3.11+
- Docker and Docker Compose
- Azure Account with AI Services
- Git
- Clone the repository

  ```bash
  git clone https://github.com/aaron-seq/Roneira-AI-LLM-powered-document-intelligence-system.git
  cd Roneira-AI-LLM-powered-document-intelligence-system
  ```

- Set up Python environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Configure environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Start services with Docker

  ```bash
  docker-compose up -d redis postgres
  ```

- Run the application

  ```bash
  python -m uvicorn app.main:app --reload
  ```

- Access the API
  - API: http://localhost:8000
  - Documentation: http://localhost:8000/api/docs
  - Health Check: http://localhost:8000/health
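
Once the server is running, a quick programmatic check against the health endpoint confirms the setup. This is an illustrative snippet only; it assumes the `requests` package is installed and that `/health` returns JSON:

```python
# Quick sanity check against the running API (illustrative; assumes `requests`
# is installed and the app is serving on localhost:8000).
import requests

response = requests.get("http://localhost:8000/health", timeout=5)
response.raise_for_status()
print(response.json())  # expected to report overall status and dependency health
```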
- Connect your GitHub repository to Railway
- Set environment variables in Railway dashboard
- Deploy automatically with git push
- Fork this repository
- Connect to Render
- Configure using `deployment/render.yaml`
```bash
npm i -g vercel
vercel --prod
```

```bash
# Get access token
curl -X POST "http://localhost:8000/api/auth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=demo&password=demo"

# Upload document
curl -X POST "http://localhost:8000/api/documents/upload" \
  -H "Authorization: Bearer <token>" \
  -F "[email protected]"

# Get document status
curl -X GET "http://localhost:8000/api/documents/{document_id}/status" \
  -H "Authorization: Bearer <token>"
```

```javascript
// WebSocket connection for real-time updates
const ws = new WebSocket('ws://localhost:8000/ws/{document_id}');

ws.onmessage = function(event) {
  const update = JSON.parse(event.data);
  console.log('Processing update:', update);
};
```
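
The same flow can be scripted from Python. A rough sketch mirroring the curl calls above (assumes the `requests` package; the demo credentials come from the examples above, while the `access_token` and `document_id` response fields are assumptions about the API's response shape):

```python
# Python equivalent of the curl examples above (illustrative; assumes `requests`).
import requests

BASE_URL = "http://localhost:8000"

# Get access token
token_response = requests.post(
    f"{BASE_URL}/api/auth/token",
    data={"username": "demo", "password": "demo"},  # form-encoded body
)
token = token_response.json()["access_token"]  # assumed response field name
headers = {"Authorization": f"Bearer {token}"}

# Upload document
with open("document.pdf", "rb") as f:
    upload = requests.post(
        f"{BASE_URL}/api/documents/upload",
        headers=headers,
        files={"file": f},
    )
document_id = upload.json()["document_id"]  # assumed response field name

# Check document status
status = requests.get(f"{BASE_URL}/api/documents/{document_id}/status", headers=headers)
print(status.json())
```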
| Variable | Description | Default | Required |
|---|---|---|---|
| `ENVIRONMENT` | Runtime environment | `development` | No |
| `SECRET_KEY` | JWT secret key | - | Yes |
| `DATABASE_URL` | Database connection URL | `sqlite:///./documents.db` | No |
| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` | No |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | - | Yes |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | - | Yes |
| `AZURE_DOCUMENT_INTELLIGENCE_KEY` | Azure Document Intelligence key | - | Yes |
| `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT` | Azure Document Intelligence endpoint | - | Yes |
| `MAX_FILE_SIZE_MB` | Maximum file size in MB | `10` | No |
| `RATE_LIMIT_REQUESTS_PER_MINUTE` | API rate limit | `60` | No |
```bash
# .env file
ENVIRONMENT=production
SECRET_KEY=your-super-secure-secret-key-here
DATABASE_URL=postgresql://user:password@host:port/database
REDIS_URL=redis://host:port/0

# Azure AI Services
AZURE_OPENAI_API_KEY=your-azure-openai-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-document-intelligence-key
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/

# Application Settings
MAX_FILE_SIZE_MB=50
RATE_LIMIT_REQUESTS_PER_MINUTE=100
```

```
├── app/
│   ├── core/                 # Core functionality
│   │   ├── authentication.py
│   │   ├── database_manager.py
│   │   └── exceptions.py
│   ├── services/             # Business logic
│   │   ├── document_intelligence_service.py
│   │   ├── cache_service.py
│   │   └── language_model_service.py
│   └── main.py               # FastAPI application
├── tests/                    # Test suite
├── deployment/               # Deployment configurations
├── .github/workflows/        # CI/CD pipelines
└── config.py                 # Application configuration
```
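
The settings in `config.py` would typically load these environment variables once and expose them as typed attributes. A minimal sketch, assuming `pydantic-settings` is used (the actual library and field names in the repository may differ):

```python
# Hedged sketch of centralized settings loading (assumes pydantic-settings;
# the real config.py may use a different library or field names).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # fall back to env vars

    environment: str = "development"
    secret_key: str
    database_url: str = "sqlite:///./documents.db"
    redis_url: str = "redis://localhost:6379/0"
    azure_openai_api_key: str
    azure_openai_endpoint: str
    azure_document_intelligence_key: str
    azure_document_intelligence_endpoint: str
    max_file_size_mb: int = 10
    rate_limit_requests_per_minute: int = 60


settings = Settings()  # imported by the rest of the application
```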
This project follows modern Python development practices:
```bash
# Format code
black app/ tests/
isort app/ tests/

# Lint code
flake8 app/ tests/
mypy app/

# Security check
bandit -r app/
safety check

# Run tests
pytest tests/ -v --cov=app
```
- Create feature branch

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Implement feature with tests
  - Add business logic in `app/services/`
  - Add tests in `tests/`
  - Update documentation

- Ensure code quality

  ```bash
  make lint test
  ```

- Create pull request
  - CI/CD pipeline will run automatically
  - Code review required for main branch
| Metric | Value |
|---|---|
| Document Processing | ~5-10 seconds avg |
| Concurrent Users | 100+ supported |
| API Response Time | <200ms (health check) |
| Memory Usage | ~200MB base |
| CPU Usage | ~10% idle, ~80% processing |
- Use Redis for caching frequently accessed data (see the sketch after this list)
- Implement proper connection pooling
- Monitor with application performance monitoring (APM)
- Scale horizontally with load balancers
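
As a rough illustration of the Redis caching approach mentioned above (assumes the `redis` package's asyncio client; the key scheme and TTL are hypothetical, not taken from `cache_service.py`):

```python
# Hedged sketch of Redis-backed caching for processed results (assumes the
# redis package's asyncio client; key scheme and TTL are illustrative only).
import json

import redis.asyncio as redis

cache = redis.from_url("redis://localhost:6379/0", decode_responses=True)


async def get_cached_result(document_id: str) -> dict | None:
    """Return a previously cached analysis result, if any."""
    raw = await cache.get(f"document:{document_id}:result")
    return json.loads(raw) if raw else None


async def cache_result(document_id: str, result: dict, ttl_seconds: int = 3600) -> None:
    """Cache an analysis result so repeat requests skip reprocessing."""
    await cache.set(f"document:{document_id}:result", json.dumps(result), ex=ttl_seconds)
```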
- Authentication: JWT tokens with expiration (see the sketch after this list)
- Authorization: Role-based access control
- Input Validation: Comprehensive request validation
- Rate Limiting: Prevent API abuse
- File Upload Security: Type and size validation
- Secrets Management: Environment variable configuration
- Security Headers: CORS, CSP, and other security headers
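
Token issuance with an expiry might look roughly like this. A hedged sketch assuming the `PyJWT` package; the project's `authentication.py` may use a different JWT library or claims:

```python
# Hedged sketch of JWT issuance/verification with expiration (assumes PyJWT;
# the project's authentication.py may use a different library or claims).
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET_KEY = "your-super-secure-secret-key-here"  # from the SECRET_KEY env var
ALGORITHM = "HS256"


def create_access_token(username: str, expires_minutes: int = 30) -> str:
    payload = {
        "sub": username,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)


def verify_access_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    return payload["sub"]
```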
- Keep dependencies updated
- Use strong secret keys (32+ characters)
- Enable HTTPS in production
- Regular security audits with `bandit` and `safety`
- Monitor logs for suspicious activity
- Application Health: `/health` endpoint
- Service Dependencies: Database, Redis, Azure services
- Performance Metrics: Response times, error rates
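
A health endpoint of this kind typically reports the status of each dependency. A rough sketch only; the dependency checks and response shape of the real `/health` endpoint may differ:

```python
# Hedged sketch of a /health endpoint reporting dependency status
# (illustrative; the repository's actual checks and response shape may differ).
from fastapi import FastAPI

app = FastAPI()


async def check_database() -> bool:
    # A real implementation would run a lightweight query, e.g. SELECT 1.
    return True


async def check_redis() -> bool:
    # A real implementation would issue a PING against the Redis connection.
    return True


@app.get("/health")
async def health() -> dict:
    checks = {
        "database": await check_database(),
        "redis": await check_redis(),
    }
    return {
        "status": "healthy" if all(checks.values()) else "degraded",
        "dependencies": checks,
    }
```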
```python
# Structured logging example
logger.info(
    "Document processed",
    extra={
        "document_id": document_id,
        "user_id": user_id,
        "processing_time": processing_time,
        "status": "completed",
    },
)
```

We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
```bash
# Install development dependencies
pip install -r requirements.txt

# Install pre-commit hooks
pre-commit install

# Run tests before committing
pytest tests/ -v
```

- Documentation: API Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
This project is licensed under the MIT License - see the LICENSE file for details.
- 🎉 Initial: Web interface and API endpoints
Made with ❤️ by Aaron Sequeira
⭐ Star this repository if you find it useful!