All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Mathematical OCR Engine: Complete implementation of OCR for mathematical equations and expressions
- Vector-Based Caching: Intelligent caching using ruvector-core for image embeddings and similarity search
- Multi-Format Output: Support for LaTeX, MathML, AsciiMath, SMILES, HTML, DOCX, JSON, and MMD formats
- Image Preprocessing Pipeline: Advanced image enhancement, deskewing, rotation correction, and segmentation
- Configuration Management: Flexible TOML-based configuration with presets (default, high-accuracy, high-speed)
- REST API Implementation: Scipix v3 API compatible endpoints
- /v3/text - Image OCR processing (multipart/base64/URL)
- /v3/strokes - Digital ink recognition
- /v3/pdf - Async PDF processing with job queue
- /v3/latex - Legacy equation recognition
- /v3/converter - Document format conversion
- /health - Health check endpoint
- Production-Ready Middleware:
- Authentication (app_id/app_key validation)
- Token bucket rate limiting (100 req/min default)
- Request tracing and structured logging
- CORS support with configurable origins
- Gzip compression for responses
- Async Job Queue: Background processing for PDF jobs with status tracking and webhook callbacks
- Result Caching: Moka-based async caching with TTL
- Graceful Shutdown: Proper resource cleanup on termination
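
The token bucket rate limiting listed above (100 req/min default) can be illustrated with a minimal, standard-library-only sketch. The `TokenBucket` type and its methods are hypothetical names chosen for illustration and are not taken from the project's middleware. The sketch assumes a single bucket, where a real server would likely keep one bucket per client.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket (illustrative only, not the project's API).
/// The bucket holds up to `capacity` tokens and refills continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Creates a full bucket that allows `capacity` requests per `per` window.
    fn new(capacity: u32, per: Duration, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / per.as_secs_f64(),
            last_refill: now,
        }
    }

    /// Refills according to elapsed time, then tries to spend one token.
    /// Returns true if the request is allowed.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now
            .saturating_duration_since(self.last_refill)
            .as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let start = Instant::now();
    // Assumed default from the changelog entry: 100 requests per minute.
    let mut bucket = TokenBucket::new(100, Duration::from_secs(60), start);

    // Simulate a burst of 150 requests arriving at the same instant.
    let allowed = (0..150).filter(|_| bucket.try_acquire(start)).count();
    println!("burst: {allowed} of 150 requests allowed");

    // A few seconds later, part of the budget has been refilled.
    let later = start + Duration::from_millis(6300);
    let refilled = (0..50).filter(|_| bucket.try_acquire(later)).count();
    println!("after 6.3s: {refilled} more requests allowed");
}
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test. A per-client limiter could, for example, store one such bucket per `app_id` in a map, though that design is an assumption rather than something this changelog states.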
- Browser-Based OCR: Process images directly in the browser
- Web Worker Support: Off-main-thread processing with progress reporting
- Multiple Input Formats: File, Canvas, Base64, URL support
- Optimized Bundle: <2MB compressed size with efficient memory management
- TypeScript Definitions: Full type safety for JavaScript/TypeScript projects
- Interactive Commands:
- ocr - Process single or batch images
- serve - Start API server
- batch - Process multiple images in parallel
- config - Manage configuration files
- Rich Terminal UI: Progress bars, colored output, and interactive tables
- Shell Completions: Support for bash, zsh, fish, and PowerShell
- SIMD Acceleration: Vectorized operations for image processing
- Parallel Processing: Multi-threaded batch processing with rayon
- Memory Optimization: Efficient memory pooling and buffer reuse
- Quantization Support: Model quantization for reduced memory footprint
- Batch Inference: Optimized batch processing for throughput
- LaTeX Parser: Complete LaTeX to AST parsing with error recovery
- MathML Generation: AST to MathML conversion with proper semantics
- AsciiMath Support: AsciiMath parsing and conversion
- Symbol Library: Comprehensive mathematical symbol database
- Format Conversion: Convert between LaTeX, MathML, and AsciiMath
- Comprehensive Documentation: 15+ detailed documentation files covering:
- Architecture and design decisions
- OCR research and algorithms
- Rust ecosystem integration
- Testing strategies
- Security best practices
- Optimization techniques
- WASM implementation guide
- Lean/Agentic integration roadmap
- Example Programs: 7 example applications demonstrating different use cases
- Integration Tests: Comprehensive test suite with >90% coverage target
- Benchmarks: Performance benchmarks using Criterion
- Type Safety: Strong typing throughout with comprehensive error handling
- Modular Design: Clean separation of concerns with feature flags
- Feature Flags:
- default - Core functionality with preprocessing, caching, and optimization
- preprocess - Image preprocessing pipeline
- cache - Vector-based caching
- ocr - OCR engine (requires ONNX models)
- math - Mathematical parsing and conversion
- optimize - Performance optimizations
- wasm - WebAssembly bindings
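
As a rough illustration of how Cargo feature flags like the ones listed above are typically consumed in Rust code, the sketch below uses the standard `cfg!` macro and `#[cfg(...)]` attribute. The feature names come from the list above. The functions `enabled_features` and `describe_wasm_support` are hypothetical helpers for illustration, not part of the crate.

```rust
/// Hypothetical helper: lists which optional features were compiled in.
/// `cfg!(feature = "...")` evaluates to a compile-time boolean.
fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "preprocess") {
        enabled.push("preprocess");
    }
    if cfg!(feature = "cache") {
        enabled.push("cache");
    }
    if cfg!(feature = "ocr") {
        enabled.push("ocr");
    }
    if cfg!(feature = "math") {
        enabled.push("math");
    }
    if cfg!(feature = "optimize") {
        enabled.push("optimize");
    }
    if cfg!(feature = "wasm") {
        enabled.push("wasm");
    }
    enabled
}

/// Only one of the two definitions below is compiled, depending on
/// whether the `wasm` feature is enabled for this build.
#[cfg(feature = "wasm")]
fn describe_wasm_support() -> &'static str {
    "WebAssembly bindings compiled in"
}

#[cfg(not(feature = "wasm"))]
fn describe_wasm_support() -> &'static str {
    "WebAssembly bindings not compiled in"
}

fn main() {
    println!("enabled features: {:?}", enabled_features());
    println!("{}", describe_wasm_support());
}
```

Gating code with `#[cfg(feature = "...")]` removes it from the build entirely when the feature is off, which is one common way to keep optional dependencies (such as an ONNX runtime) out of minimal builds.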
- Core: ruvector-core, image, imageproc, serde, tokio
- ML: ort (ONNX Runtime) for model inference
- Web: axum, tower, tower-http for REST API
- CLI: clap, indicatif, console for command-line interface
- Math: nom for parsing, nalgebra for linear algebra
- Performance: rayon, memmap2, SIMD intrinsics
- Testing: criterion, proptest, mockall
- OCR Throughput: Target >100 images/second (batch mode)
- API Latency: <100ms for typical equations (cached)
- Memory Usage: <500MB baseline, <2GB peak
- Cache Hit Rate: >80% for similar equations
- WASM Bundle: <2MB compressed, <5MB uncompressed
- ONNX Models: Models not included in repository (must be downloaded separately)
- GPU Support: ONNX Runtime CPU-only (GPU support planned)
- Language Support: English and mathematical notation only
- Handwriting: Limited handwriting recognition (digital ink only)
- Complex Layouts: Advanced layout analysis planned for future releases
- Database: No persistent storage yet (planned for 0.2.0)
- Input Validation: Comprehensive validation using validator crate
- Rate Limiting: Default 100 req/min per client
- Authentication: Required for all API endpoints (except health)
- No Secrets: Environment variables for all credentials
- CORS: Configurable allowed origins
- Size Limits: Configurable max request/file sizes
None (initial release)
This is the initial release. No migration required.
- Database persistence (PostgreSQL/SQLite)
- Horizontal scaling with Redis
- Prometheus metrics
- OpenAPI/Swagger documentation
- Multi-tenancy support
- GPU acceleration via ONNX Runtime
- Advanced layout analysis
- Multi-language support
- Enhanced handwriting recognition
- Real-time collaborative editing
- Production-grade stability
- Enterprise features
- Cloud-native deployment
- Kubernetes operators
- Comprehensive monitoring
- Ruvector Team - Initial implementation and architecture
- Community - Testing and feedback
MIT License - See LICENSE file for details
- Nothing yet