Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.1.0] - 2024-11-28

Added

Core Features

  • Mathematical OCR Engine: Complete implementation of OCR for mathematical equations and expressions
  • Vector-Based Caching: Intelligent caching using ruvector-core for image embeddings and similarity search
  • Multi-Format Output: Support for LaTeX, MathML, AsciiMath, SMILES, HTML, DOCX, JSON, and MMD formats
  • Image Preprocessing Pipeline: Advanced image enhancement, deskewing, rotation correction, and segmentation
  • Configuration Management: Flexible TOML-based configuration with presets (default, high-accuracy, high-speed)
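
The vector-based caching entry above can be pictured as a lookup by embedding similarity. The following is a minimal sketch, assuming cosine similarity over `f32` embeddings and a plain linear scan. `SimilarityCache`, `cosine_similarity`, and the `threshold` field are hypothetical names chosen for illustration; they are not the ruvector-core API, which this changelog does not show.

```rust
/// Hypothetical in-memory cache keyed by image embeddings.
/// Illustration only: this is not the ruvector-core API.
pub struct SimilarityCache {
    entries: Vec<(Vec<f32>, String)>, // (embedding, cached OCR result)
    threshold: f32,                   // minimum cosine similarity for a hit
}

/// Cosine similarity of two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl SimilarityCache {
    pub fn new(threshold: f32) -> Self {
        Self { entries: Vec::new(), threshold }
    }

    pub fn insert(&mut self, embedding: Vec<f32>, result: String) {
        self.entries.push((embedding, result));
    }

    /// Return the cached result whose embedding is most similar to `query`,
    /// provided that similarity reaches the threshold.
    pub fn lookup(&self, query: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(emb, res)| (cosine_similarity(emb, query), res))
            .filter(|(sim, _)| *sim >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, res)| res.as_str())
    }
}
```

A linear scan is only reasonable for small caches; a real vector index would replace the scan while keeping the same "nearest entry above a threshold" contract.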

API Server

  • REST API Implementation: Endpoints compatible with the Scipix v3 API
    • /v3/text - Image OCR processing (multipart/base64/URL)
    • /v3/strokes - Digital ink recognition
    • /v3/pdf - Async PDF processing with job queue
    • /v3/latex - Legacy equation recognition
    • /v3/converter - Document format conversion
    • /health - Health check endpoint
  • Production-Ready Middleware:
    • Authentication (app_id/app_key validation)
    • Token bucket rate limiting (100 req/min default)
    • Request tracing and structured logging
    • CORS support with configurable origins
    • Gzip compression for responses
  • Async Job Queue: Background processing for PDF jobs with status tracking and webhook callbacks
  • Result Caching: Moka-based async caching with TTL
  • Graceful Shutdown: Proper resource cleanup on termination
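
The token bucket rate limiting listed under the middleware can be sketched as follows. This is an illustrative sketch, not the project's middleware: `TokenBucket`, `per_minute`, and `try_acquire` are hypothetical names, and timestamps are passed in explicitly (seconds since the bucket was created) so the refill arithmetic is easy to follow and test. A server would more likely derive the timestamp from `std::time::Instant` and keep one bucket per client.

```rust
/// Hypothetical token bucket with explicit timestamps.
/// The bucket holds at most `capacity` tokens and refills continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Allow `per_minute` requests per minute; the bucket starts full.
    pub fn per_minute(per_minute: u32) -> Self {
        let capacity = f64::from(per_minute);
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / 60.0,
            last_refill_secs: 0.0,
        }
    }

    /// Try to consume one token at time `now_secs`.
    /// Returns `true` if the request is allowed, `false` if it should be rejected.
    pub fn try_acquire(&mut self, now_secs: f64) -> bool {
        // Add the tokens earned since the last call, capped at capacity.
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = self.last_refill_secs.max(now_secs);

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With the default of 100 requests per minute, a client can burst up to 100 requests at once and then earns back roughly 1.67 tokens per second.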

WebAssembly Support

  • Browser-Based OCR: Process images directly in the browser
  • Web Worker Support: Off-main-thread processing with progress reporting
  • Multiple Input Formats: File, Canvas, Base64, URL support
  • Optimized Bundle: <2MB compressed size with efficient memory management
  • TypeScript Definitions: Full type safety for JavaScript/TypeScript projects

CLI Tool

  • Interactive Commands:
    • ocr - Process single or batch images
    • serve - Start API server
    • batch - Process multiple images in parallel
    • config - Manage configuration files
  • Rich Terminal UI: Progress bars, colored output, and interactive tables
  • Shell Completions: Support for bash, zsh, fish, and PowerShell

Performance Optimizations

  • SIMD Acceleration: Vectorized operations for image processing
  • Parallel Processing: Multi-threaded batch processing with rayon
  • Memory Optimization: Efficient memory pooling and buffer reuse
  • Quantization Support: Model quantization for reduced memory footprint
  • Batch Inference: Optimized batch processing for throughput

Math Processing

  • LaTeX Parser: Complete LaTeX to AST parsing with error recovery
  • MathML Generation: AST to MathML conversion with proper semantics
  • AsciiMath Support: AsciiMath parsing and conversion
  • Symbol Library: Comprehensive mathematical symbol database
  • Format Conversion: Convert between LaTeX, MathML, and AsciiMath
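
The AST-based conversion described above can be illustrated with a deliberately tiny expression tree. This is a sketch under stated assumptions: the `Expr` enum and its `to_latex` and `to_mathml` methods are hypothetical, cover only numbers, identifiers, fractions, and superscripts, and do not reflect the project's actual AST or parser.

```rust
/// Hypothetical, minimal math AST used only for illustration.
pub enum Expr {
    Num(String),
    Ident(String),
    Frac(Box<Expr>, Box<Expr>),
    Sup(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Render the tree as LaTeX source.
    pub fn to_latex(&self) -> String {
        match self {
            Expr::Num(n) => n.clone(),
            Expr::Ident(name) => name.clone(),
            Expr::Frac(num, den) => {
                format!("\\frac{{{}}}{{{}}}", num.to_latex(), den.to_latex())
            }
            Expr::Sup(base, exp) => {
                format!("{{{}}}^{{{}}}", base.to_latex(), exp.to_latex())
            }
        }
    }

    /// Render the tree as presentation MathML.
    /// Note: identifiers are not XML-escaped in this sketch.
    pub fn to_mathml(&self) -> String {
        match self {
            Expr::Num(n) => format!("<mn>{n}</mn>"),
            Expr::Ident(name) => format!("<mi>{name}</mi>"),
            Expr::Frac(num, den) => {
                format!("<mfrac>{}{}</mfrac>", num.to_mathml(), den.to_mathml())
            }
            Expr::Sup(base, exp) => {
                format!("<msup>{}{}</msup>", base.to_mathml(), exp.to_mathml())
            }
        }
    }
}
```

Because every output format is rendered from the same tree, adding a format means adding one method rather than one converter per pair of formats.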

Developer Experience

  • Comprehensive Documentation: 15+ detailed documentation files covering:
    • Architecture and design decisions
    • OCR research and algorithms
    • Rust ecosystem integration
    • Testing strategies
    • Security best practices
    • Optimization techniques
    • WASM implementation guide
    • Lean/Agentic integration roadmap
  • Example Programs: 7 example applications demonstrating different use cases
  • Integration Tests: Comprehensive test suite with >90% coverage target
  • Benchmarks: Performance benchmarks using Criterion
  • Type Safety: Strong typing throughout with comprehensive error handling

Technical Details

Architecture

  • Modular Design: Clean separation of concerns with feature flags
  • Feature Flags:
    • default - Core functionality with preprocessing, caching, and optimization
    • preprocess - Image preprocessing pipeline
    • cache - Vector-based caching
    • ocr - OCR engine (requires ONNX models)
    • math - Mathematical parsing and conversion
    • optimize - Performance optimizations
    • wasm - WebAssembly bindings
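
Cargo feature flags such as these are typically checked at compile time with `cfg`. The helper below is a hypothetical sketch (the function `enabled_features` is not part of the project) that reports which of the listed optional features were compiled into the current build; the feature names are taken from the list above.

```rust
/// Hypothetical helper: names of the optional Cargo features enabled in this build.
/// `cfg!` is evaluated at compile time, so disabled branches cost nothing at runtime.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "preprocess") {
        features.push("preprocess");
    }
    if cfg!(feature = "cache") {
        features.push("cache");
    }
    if cfg!(feature = "ocr") {
        features.push("ocr");
    }
    if cfg!(feature = "math") {
        features.push("math");
    }
    if cfg!(feature = "optimize") {
        features.push("optimize");
    }
    if cfg!(feature = "wasm") {
        features.push("wasm");
    }
    features
}
```

Whole modules can be gated the same way with the `#[cfg(feature = "...")]` attribute, which removes the code from builds that do not enable the feature.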

Dependencies

  • Core: ruvector-core, image, imageproc, serde, tokio
  • ML: ort (ONNX Runtime) for model inference
  • Web: axum, tower, tower-http for REST API
  • CLI: clap, indicatif, console for command-line interface
  • Math: nom for parsing, nalgebra for linear algebra
  • Performance: rayon, memmap2, SIMD intrinsics
  • Testing: criterion, proptest, mockall

Performance Benchmarks

  • OCR Throughput: Target >100 images/second (batch mode)
  • API Latency: <100ms for typical equations (cached)
  • Memory Usage: <500MB baseline, <2GB peak
  • Cache Hit Rate: >80% for similar equations
  • WASM Bundle: <2MB compressed, <5MB uncompressed

Known Limitations

  • ONNX Models: Models not included in repository (must be downloaded separately)
  • GPU Support: ONNX Runtime inference is CPU-only (GPU acceleration planned for 0.3.0)
  • Language Support: English and mathematical notation only
  • Handwriting: Limited handwriting recognition (digital ink only)
  • Complex Layouts: Advanced layout analysis is not yet available (planned for 0.3.0)
  • Database: No persistent storage yet (planned for 0.2.0)

Security

  • Input Validation: Comprehensive validation using validator crate
  • Rate Limiting: Default 100 req/min per client
  • Authentication: Required for all API endpoints (except health)
  • No Hardcoded Secrets: All credentials are supplied through environment variables
  • CORS: Configurable allowed origins
  • Size Limits: Configurable max request/file sizes

Breaking Changes

None (initial release)

Migration Guide

This is the initial release. No migration required.

Future Roadmap

Version 0.2.0 (Q1 2025)

  • Database persistence (PostgreSQL/SQLite)
  • Horizontal scaling with Redis
  • Prometheus metrics
  • OpenAPI/Swagger documentation
  • Multi-tenancy support

Version 0.3.0 (Q2 2025)

  • GPU acceleration via ONNX Runtime
  • Advanced layout analysis
  • Multi-language support
  • Enhanced handwriting recognition
  • Real-time collaborative editing

Version 1.0.0 (Q3 2025)

  • Production-grade stability
  • Enterprise features
  • Cloud-native deployment
  • Kubernetes operators
  • Comprehensive monitoring

Contributors

  • Ruvector Team - Initial implementation and architecture
  • Community - Testing and feedback

License

MIT License - See LICENSE file for details


Unreleased

Added

  • Nothing yet

Changed

  • Nothing yet

Fixed

  • Nothing yet

Deprecated

  • Nothing yet

Removed

  • Nothing yet

Security

  • Nothing yet