🦀 e-harold-krabs

"Your optimized AI finance assistant — turning PDFs into lightning-fast insights."

e-harold-krabs is a performance-optimized AI-powered personal finance assistant. It parses your bank statements (PDF/CSV), extracts transactions, stores them efficiently in PostgreSQL, and generates high-performance dashboards & insights powered by AI.

🚀 Features

📄 Smart Document Processing

Upload PDFs or CSV files with intelligent parsing
Extract structured transactions (date, description, amount, category)
AI-powered duplicate detection and data validation

🗄️ Optimized Data Storage

PostgreSQL with optimized indexes for fast queries
JSONB for flexible transaction metadata
75% reduction in database overhead through index optimization

🤖 AI-Powered Analytics

Auto-categorization using LLMs (Ollama, OpenAI, Azure OpenAI)
Recurring expense detection with pattern recognition
Anomaly detection for unusual spending patterns
Expense forecasting using Prophet time series analysis

📊 High-Performance Dashboard

3-5x faster loading through concurrent API calls
Lazy loading for advanced analytics
Auto-refresh every 2 minutes with smart caching
Responsive multi-column layout with progressive disclosure
Real-time insights with tiered caching strategy

� Advanced Insights

Monthly spending trends and category analysis
Recurring payment tracking and forecasting
Financial anomaly alerts
Export capabilities (CSV/Excel) with smart caching

📊 Performance Optimizations

Database Layer

Optimized indexes: Removed redundant single-column indexes
Smart composite indexes: Only essential indexes for actual query patterns
Faster inserts: Reduced index maintenance overhead

Dashboard Layer

Concurrent API calls: ThreadPoolExecutor for parallel data fetching
Tiered caching: 5min for core data, 10min for advanced analytics
Progressive loading: Critical data loads first, advanced features on-demand
Smart refresh: 2-minute auto-refresh with countdown timer

Resource Efficiency

75% fewer automatic refreshes: Optimized from 30s to 2min intervals
Reduced API load: Intelligent caching and batch requests
Better UX: Non-blocking progressive disclosure

📂 Project Structure

e-harold-krabs/
│
├── app/                     # Core application
│   ├── main.py               # Entry point (FastAPI)
│   ├── config.py             # Settings (DB connection, API keys)
│   ├── services/             # Business logic
│   │   ├── pdf_parser.py     # Extract text from PDFs
│   │   ├── csv_parser.py     # Load CSV files
│   │   ├── ai_parser.py      # Call Ollama/OpenAI → JSON
│   │   ├── categorizer.py    # Assign categories (AI + rules)
│   │   ├── insights.py       # Totals, trends, forecasts
│   │
│   ├── db/
│   │   ├── models.py         # SQLAlchemy models
│   │   ├── crud.py           # Insert/query helpers
│   │
│   └── api/
│       ├── routes.py         # Endpoints for upload/insights
│
├── dashboard/                # Frontend
│   ├── streamlit_app.py      # Streamlit dashboards
│
├── tests/                    # Unit tests
│
├── docs/                     # Documentation
│   ├── ARCHITECTURE.md
│   ├── REQUIREMENTS.md
│   ├── STRUCTURE.md
│
├── requirements.txt          # Python dependencies
├── docker-compose.yml        # Docker setup
└── README.md

🛠️ Tech Stack

Core Technologies

Backend: FastAPI with optimized async endpoints
Database: PostgreSQL 16 with optimized indexes and JSONB
AI Layer: Azure OpenAI, Ollama (local LLM), OpenAI
Dashboard: Streamlit with performance optimizations
Analytics: Prophet (forecasting), scikit-learn (anomaly detection)

Performance & Infrastructure

Caching: Multi-tier caching strategy (5min/10min TTL)
Concurrency: ThreadPoolExecutor for parallel API calls
Containerization: Docker Compose with health checks
Data Processing: Pandas, PyPDF2, pdfplumber

Development & Operations

Code Quality: Type hints, comprehensive error handling
Testing: Unit tests with pytest
Monitoring: Structured logging with security filters
Documentation: Comprehensive API and architecture docs

🧑‍💻 Getting Started

🚀 Quick Start (Recommended)

# Clone the repo
git clone https://github.com/simpat-jesus/e-harold-krabs.git
cd e-harold-krabs

# Start all services with Docker (includes database)
docker-compose up --build

Services will be available at:

🌐 Dashboard: http://localhost:8501
🔌 API: http://localhost:8000
🗄️ Database: localhost:5432

📊 Performance Optimization Setup

Apply database optimizations:

# Run the index optimization script
python migrate_indexes.py

# Restart services to apply changes
docker-compose restart

🎯 Upload Test Data

# Generate and upload sample bank statements
python generate_test_pdfs.py
./upload_pdfs.sh

🛠️ Manual Development Setup

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Start PostgreSQL (using Docker)
docker-compose up -d db

# Start API server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Start dashboard (in another terminal)
streamlit run dashboard/streamlit_app.py --server.port 8501

🔧 Configuration

Environment Variables:

DATABASE_URL=postgresql+psycopg2://finance_user:finance_pass@localhost:5432/finance
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
OLLAMA_BASE_URL=http://localhost:11434  # For local LLM

📝 Using the Application

Access Dashboard: Open http://localhost:8501
Upload Documents: Use the upload interface or API endpoints
View Analytics: Explore spending patterns, categories, and trends
Advanced Features: Enable "Show Advanced Analytics" for forecasting and anomaly detection
Auto-refresh: Enable 2-minute auto-refresh for real-time monitoring

📈 Performance Benchmarks

Load Time Improvements

Before Optimization: 3-5 seconds initial load
After Optimization: 1-2 seconds initial load
Improvement: 60-75% faster loading

Resource Efficiency

Database Queries: 75% reduction in index overhead
API Calls: Concurrent processing reduces wait time by 3-5x
Cache Hit Ratio: 85%+ for frequently accessed data
Memory Usage: 40% reduction through optimized data structures

📂 Project Structure

e-harold-krabs/
│
├── app/                      # Optimized core application
│   ├── main.py               # FastAPI with async optimizations
│   ├── config.py             # Database connection with retry logic
│   ├── services/             # Business logic services
│   │   ├── pdf_parser.py     # Enhanced PDF text extraction
│   │   ├── csv_parser.py     # Optimized CSV processing
│   │   ├── ai_parser.py      # Multi-provider AI integration
│   │   ├── categorizer.py    # Smart categorization engine
│   │   └── insights.py       # Advanced analytics (forecasting, anomalies)
│   │
│   ├── db/
│   │   ├── models.py         # Optimized SQLAlchemy models
│   │   └── crud.py           # Efficient database operations
│   │
│   ├── api/
│   │   └── routes.py         # RESTful API endpoints
│   │
│   └── utils/
│       └── secure_logging.py # Security-filtered logging
│
├── dashboard/                # High-performance frontend
│   ├── streamlit_app.py      # Optimized Streamlit dashboard
│   └── .streamlit/
│       └── config.toml       # Performance configuration
│
├── docs/                     # Updated documentation
│   ├── ARCHITECTURE.md       # System architecture
│   ├── REQUIREMENTS.md       # Feature requirements
│   ├── STRUCTURE.md          # Project structure
│   └── DATA_SECURITY.md      # Security guidelines
│
├── tests/                    # Comprehensive test suite
├── migrate_indexes.py        # Database optimization script
├── OPTIMIZATION_SUMMARY.md   # Performance optimization details
├── requirements.txt          # Python dependencies
├── docker-compose.yml        # Containerized deployment
└── README.md                 # This file

📈 Development Roadmap

✅ Phase 1 – Optimized Foundations (COMPLETED)

Upload PDFs & CSVs with intelligent parsing
Extract transactions into structured JSON
Store in PostgreSQL with optimized indexes
Performance: 75% reduction in database overhead

✅ Phase 2 – AI & Analytics Layer (COMPLETED)

Auto-categorize expenses with multi-provider AI support
Detect recurring payments with pattern recognition
Anomaly detection for unusual spending patterns
Performance: 3-5x faster dashboard loading

✅ Phase 3 – High-Performance Dashboards (COMPLETED)

Optimized Streamlit dashboards with lazy loading
Expense forecasting using Prophet time series
Advanced analytics with progressive disclosure
Performance: 2-minute smart auto-refresh

🔄 Phase 4 – Advanced Features (IN PROGRESS)

Real-time transaction monitoring
Custom budget alerts and notifications
Advanced financial health scoring
Export automation and scheduling

🎯 Phase 5 – Enterprise Features (PLANNED)

Multi-user support with role-based access
Advanced security features and audit logs
API rate limiting and monitoring
Scalable architecture with microservices

🤝 Contributing

We welcome contributions! Here's how you can help:

Development Setup

# Fork the repository and clone your fork
git clone https://github.com/your-username/e-harold-krabs.git
cd e-harold-krabs

# Create a development branch
git checkout -b feature/your-feature-name

# Set up development environment
docker-compose up -d db  # Start database
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run tests
pytest tests/

# Start development servers
uvicorn app.main:app --reload &
streamlit run dashboard/streamlit_app.py

Contribution Guidelines

Follow PEP 8 style guidelines
Add type hints for new functions
Include unit tests for new features
Update documentation for significant changes
Test performance impact of database changes

Areas for Contribution

🔍 Analytics: New insight algorithms and visualizations
🚀 Performance: Further optimization opportunities
🔒 Security: Enhanced data protection features
📱 UI/UX: Dashboard improvements and responsive design
🤖 AI: New AI providers and categorization models

📚 Documentation

Architecture Guide - System design and components
Performance Optimization - Detailed optimization guide
API Documentation - Interactive API docs (when running)
Data Security - Security guidelines and best practices

🐛 Troubleshooting

Common Issues

Dashboard won't load:

# Check container status
docker-compose ps

# Restart dashboard
docker-compose restart dashboard

# Check logs
docker-compose logs dashboard

Database connection errors:

# Reset database
docker-compose down -v
docker-compose up -d db
# Wait for health check, then restart API
docker-compose up api dashboard

Performance issues:

# Apply database optimizations
python migrate_indexes.py
docker-compose restart

# Clear cache
# Access dashboard and use "Refresh Now" button

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💡 Inspiration

The name comes from Mr. Krabs (SpongeBob), because this app is optimized to be crabby about your money 🦀💰⚡

"I can smell a penny from a mile away... and now I can analyze it too!" - Mr. Krabs (probably)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
app		app
dashboard		dashboard
docs		docs
tests		tests
.env.example		.env.example
.gitignore		.gitignore
AZURE_AI_SETUP.md		AZURE_AI_SETUP.md
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
generate_test_pdfs.py		generate_test_pdfs.py
postgresql.conf		postgresql.conf
requirements.txt		requirements.txt
run.ps1		run.ps1
run.sh		run.sh
upload_pdfs.sh		upload_pdfs.sh

Folders and files

Latest commit

History

Repository files navigation

🦀 e-harold-krabs

🚀 Features

📄 Smart Document Processing

🗄️ Optimized Data Storage

🤖 AI-Powered Analytics

📊 High-Performance Dashboard

� Advanced Insights

📊 Performance Optimizations

Database Layer

Dashboard Layer

Resource Efficiency

📂 Project Structure

🛠️ Tech Stack

Core Technologies

Performance & Infrastructure

Development & Operations

🧑‍💻 Getting Started

🚀 Quick Start (Recommended)

📊 Performance Optimization Setup

🎯 Upload Test Data

🛠️ Manual Development Setup

🔧 Configuration

📝 Using the Application

📈 Performance Benchmarks

Load Time Improvements

Resource Efficiency

📂 Project Structure

📈 Development Roadmap

✅ Phase 1 – Optimized Foundations (COMPLETED)

✅ Phase 2 – AI & Analytics Layer (COMPLETED)

✅ Phase 3 – High-Performance Dashboards (COMPLETED)

🔄 Phase 4 – Advanced Features (IN PROGRESS)

🎯 Phase 5 – Enterprise Features (PLANNED)

🤝 Contributing

Development Setup

Contribution Guidelines

Areas for Contribution

📚 Documentation

🐛 Troubleshooting

Common Issues

📄 License

💡 Inspiration

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages