Enhanced Ingest System
CWA's Enhanced Ingest System processes books reliably and automatically, with timeout protection, a retry queue for files that arrive while the processor is busy, and detailed status tracking.
- Quick Start
- How It Works
- Key Features
- Configuration
- Status Monitoring
- Troubleshooting
- Advanced Configuration
- File Flow Diagram
- Add Books: Simply drop your ebook files into the /cwa-book-ingest folder (or whatever you've mapped it to in your docker-compose)
- Wait for Processing: CWA will automatically detect, process, and add the books to your library
- Check Status: Monitor progress through the CWA web interface or check the logs
CWA's web interface upload system seamlessly integrates with the enhanced ingest system:
- Navigate to any page in the CWA web interface
- Use the "Upload" button in the top navigation
- Select your ebook files
- Click "Upload"
- Files are processed through the same enhanced ingest pipeline
- Go to the Edit Book page for any book
- Use the "Upload Format" button
- Select the new format file
- The system automatically adds it as an additional format to the existing book
🌐 Web Upload → 📁 Atomic Save → 🔄 Ingest Pipeline → 📚 Library
↓
📋 Manifest (for format uploads)
Behind the Scenes:
- Files are saved with unique timestamped names to prevent conflicts
- Atomic operations ensure no partial uploads are processed
- Manifest system handles special actions (like adding formats to existing books)
- Same processing pipeline as folder-based ingest for consistency
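The save-then-rename pattern described above can be sketched in Python as follows. This is a minimal illustration, not CWA's actual code: it assumes the `.uploading` temporary suffix and the `YYYYMMDD_HHMMSS_microseconds_filename` naming scheme described on this page, and the function names `unique_name` and `atomic_save` are hypothetical.

```python
import os
from datetime import datetime
from pathlib import Path


def unique_name(filename, now=None):
    """Prefix a filename with YYYYMMDD_HHMMSS_microseconds (hypothetical helper)."""
    now = now or datetime.now()
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{now.microsecond:06d}_{filename}"


def atomic_save(data, ingest_dir, filename, now=None):
    """Write data under a temporary .uploading name, then rename it into place."""
    final_path = Path(ingest_dir) / unique_name(filename, now)
    temp_path = final_path.with_name(final_path.name + ".uploading")
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
    # os.replace is atomic when source and target are on the same filesystem
    os.replace(temp_path, final_path)
    return final_path
```

Because the final name only appears once the write has finished, a watcher that skips `.uploading` files never sees a partial upload.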
The ingest system supports 27+ ebook formats:
- Common: EPUB, MOBI, AZW3, PDF, TXT
- Comics: CBZ, CBR, CB7, CBC
- Documents: DOCX, ODT, HTML, HTMLZ
- Specialized: KEPUB, FB2, LIT, LRF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXTZ
- Audio: M4B, M4A, MP4 (audiobooks)
- Metadata: CWA.JSON (custom metadata files)
📁 Ingest Folder → 🔍 Detection → ⏱️ Stability Check → 🔒 Lock Check → 📚 Processing → ✅ Library
🌐 Web Upload ↗ ↓
📋 Queue (if busy)
↓
🔄 Retry Later
The enhanced ingest system supports multiple input methods:
- Folder-Based Ingest (Traditional)
  - Drop files into the ingest folder
  - Real-time detection with inotifywait
  - Bulk processing support
- Web Interface Upload (Modern)
  - Upload through CWA web interface
  - Atomic file operations
  - Manifest-driven special actions
  - Progress tracking in Tasks view
- File Detection: Files are detected either by folder monitoring or web upload completion
- Stability Check: Ensures files are completely uploaded before processing
- Format Validation: Checks if the file format is supported and not in the ignore list
- Lock Management: Uses a lock file to prevent concurrent processing
- Processing: Converts, fixes, and adds the book to your Calibre library
- Cleanup: Removes processed files and updates status
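The Stability Check step can be illustrated with a short Python sketch. It assumes the check works by polling the file size, which is how the `CWA_INGEST_STABLE_CHECKS`, `CWA_INGEST_STABLE_CONSEC_MATCH`, and `CWA_INGEST_STABLE_INTERVAL` settings are described on this page. The function name `wait_until_stable` and the exact counting rule (a "match" is a reading equal to the previous one) are assumptions, not CWA's implementation.

```python
import os
import time


def wait_until_stable(path, checks=6, consec_match=2, interval=0.5,
                      get_size=os.path.getsize, sleep=time.sleep):
    """Return True once the file size stops changing, False if it never settles.

    Assumption: a "match" means the current size equals the previous reading,
    and `consec_match` such matches in a row are required.
    """
    last_size = None
    matches = 0
    for _ in range(checks):
        try:
            size = get_size(path)
        except OSError:
            size = None  # file missing or unreadable: treat as not stable
        if size is not None and size == last_size:
            matches += 1
            if matches >= consec_match:
                return True
        else:
            matches = 0
        last_size = size
        sleep(interval)
    return False
```

The `get_size` and `sleep` parameters exist only so the function can be exercised without touching the filesystem or waiting in real time.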
- Atomic Operations: Files are saved with a .uploading extension, then atomically renamed
- Unique Naming: Timestamped filenames prevent conflicts (YYYYMMDD_HHMMSS_microseconds_filename)
- Manifest System: Special .cwa.json files instruct the processor for specific actions
- User Context: Upload tracking tied to specific users for better task management
- Coordinated Timeouts: Processor handles internal timeout logic while service provides safety timeout (3x configured)
- Automatic Timeout: Prevents hanging processes with configurable timeout (default: 15 minutes)
- Failed File Backup: Problematic files are automatically backed up with timestamps
- Service Continuity: Ensures the ingest service never gets stuck
- Robust Locking: Advanced ProcessLock class with PID tracking and stale lock detection
- Automatic Recovery: System automatically cleans up orphaned processes and stale locks
- Race Condition Prevention: Proper file locking prevents multiple processors from running simultaneously
- Container Restart Safety: Process recovery service ensures clean state after container restarts
- Busy Handling: Files that can't be processed immediately are queued for retry
- Persistent Queue: Queue survives container restarts and updates
- Size Management: Configurable maximum queue size with FIFO overflow handling
- Automatic Retry: Queued files are automatically retried after successful processing
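A file-backed FIFO queue with a size cap, as described in the list above, might look like the following Python sketch. It assumes the queue file holds one path per line and that overflow drops the oldest entries first. The class name `RetryQueue` and its methods are hypothetical and do not mirror CWA's code.

```python
from pathlib import Path


class RetryQueue:
    """Persistent FIFO queue stored as one path per line (illustrative only)."""

    def __init__(self, queue_file, max_size=50):
        self.queue_file = Path(queue_file)
        self.max_size = max_size  # assumed to be at least 1

    def _load(self):
        if not self.queue_file.exists():
            return []
        return [line for line in self.queue_file.read_text().splitlines() if line]

    def _save(self, items):
        # Write to a temporary file first so a crash never leaves a half-written queue
        tmp = self.queue_file.with_name(self.queue_file.name + ".tmp")
        tmp.write_text("".join(item + "\n" for item in items))
        tmp.replace(self.queue_file)

    def push(self, path):
        """Append a path; return any entries dropped because the queue was full."""
        items = self._load()
        if path in items:
            return []
        items.append(path)
        dropped = items[:-self.max_size] if len(items) > self.max_size else []
        self._save(items[-self.max_size:])
        return dropped

    def pop(self):
        """Remove and return the oldest entry, or None if the queue is empty."""
        items = self._load()
        if not items:
            return None
        self._save(items[1:])
        return items[0]
```

Keeping the queue in a plain text file is what lets it survive container restarts, and it also means tools like `wc -l` can report its length.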
- Real-time Status: Current processing status available at /config/cwa_ingest_status
- Detailed Information: Shows current file, timestamp, and processing state
- Queue Monitoring: Track how many files are waiting in the retry queue
- Timestamped Backups: Failed files are saved with timestamps for investigation
- Detailed Logging: Comprehensive logging for troubleshooting
- Graceful Degradation: System continues operating even when individual files fail
- Database Connection Safety: All database operations use proper context managers to prevent leaks
- Permission Error Recovery: Graceful handling of permission errors in network share environments
- Smart Actions: Special .cwa.json files enable advanced processing instructions
- Format Addition: Add new formats to existing books without creating duplicates
- User Attribution: Web uploads are tracked and attributed to specific users
- Atomic Operations: Manifest and file operations are coordinated to prevent inconsistencies
{
  "action": "add_format",
  "book_id": 123,
  "original_filename": "my-book.epub"
}

Supported Actions:
- add_format - Add a new format to an existing book by ID
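For illustration, a manifest like the one above could be validated before acting on it with a few lines of Python. The field names come from the example manifest. The function name `load_manifest` and the specific validation rules are assumptions made for this sketch.

```python
import json

# Only the action documented on this page is accepted
SUPPORTED_ACTIONS = {"add_format"}


def load_manifest(text):
    """Parse manifest JSON and reject unknown actions or invalid book IDs."""
    data = json.loads(text)
    action = data.get("action")
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported manifest action: {action!r}")
    book_id = data.get("book_id")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(book_id, int) or isinstance(book_id, bool) or book_id < 1:
        raise ValueError(f"invalid book_id: {book_id!r}")
    return {
        "action": action,
        "book_id": book_id,
        "original_filename": data.get("original_filename", ""),
    }
```

Rejecting a bad manifest early keeps a malformed upload from being added to the wrong book.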
- Navigate to Admin Panel → CWA Settings
- Find the "Ingest Processing Timeout" section
- Set your desired timeout in minutes (5-120 range)
- Click Save Settings
# In your docker-compose.yml or environment
CWA_INGEST_MAX_QUEUE_SIZE=50 # Maximum files in retry queue (default: 50)

The timeout setting is stored in your CWA database:

-- View current timeout setting
SELECT ingest_timeout_minutes FROM cwa_settings;
-- Manually update timeout (not recommended, use web interface)
UPDATE cwa_settings SET ingest_timeout_minutes = 20;

Configure which formats to ignore through the CWA Settings page:
- Auto-Convert Ignored Formats: Formats to skip during conversion
- Auto-Ingest Ignored Formats: Formats to completely ignore during ingest
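As a rough sketch of how a supported-format check and an ignore list can combine, the Python below decides whether a file should be ingested. The extension set is copied from the supported-formats list on this page (the CWA.JSON metadata type is left out). The function name `should_ingest` and the way the ignore list is passed in are assumptions, since this page does not describe how CWA stores those settings.

```python
from pathlib import Path

# Extensions taken from the supported-formats list on this page
SUPPORTED_FORMATS = {
    "epub", "mobi", "azw3", "pdf", "txt",
    "cbz", "cbr", "cb7", "cbc",
    "docx", "odt", "html", "htmlz",
    "kepub", "fb2", "lit", "lrf", "prc", "pdb", "pml",
    "rb", "rtf", "snb", "tcr", "txtz",
    "m4b", "m4a", "mp4",
}


def should_ingest(filename, ignored_formats=()):
    """Return True if the extension is supported and not on the ignore list."""
    ext = Path(filename).suffix.lower().lstrip(".")
    ignored = {fmt.lower().lstrip(".") for fmt in ignored_formats}
    return ext in SUPPORTED_FORMATS and ext not in ignored
```

For example, `should_ingest("novel.pdf", ignored_formats=["pdf"])` is false, while the same call with an empty ignore list is true.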
The status file (/config/cwa_ingest_status) contains:
state:filename:timestamp:detail
Possible States:
- idle - Service is waiting for new files
- processing:filename:timestamp - Currently processing a file
- queued:filename:timestamp - File added to retry queue
- completed:filename:timestamp - File successfully processed
- timeout:filename:timestamp - File processing timed out (internal processor timeout)
- safety_timeout:filename:timestamp - File hit safety timeout (indicates serious issue)
- error:filename:code:timestamp - Processing error occurred
# Example: Read ingest status in Python
def get_ingest_status():
    """Parse the status file (state:filename:timestamp:detail) into a dict."""
    try:
        with open('/config/cwa_ingest_status', 'r') as f:
            status_line = f.read().strip()
    except FileNotFoundError:
        # The status file has not been written yet
        return {'state': 'unknown'}
    # Naive split: a filename containing ':' shifts the later fields, and the
    # error state uses a different field order (error:filename:code:timestamp)
    parts = status_line.split(':')
    return {
        'state': parts[0],
        'filename': parts[1] if len(parts) > 1 else '',
        'timestamp': parts[2] if len(parts) > 2 else '',
        'detail': parts[3] if len(parts) > 3 else ''
    }

# Check queue size
wc -l /config/cwa_ingest_retry_queue
# View queued files
cat /config/cwa_ingest_retry_queue

- Check File Format: Ensure the file format is supported
- Check Ignore Lists: Verify the format isn't in the ignore list
- Check Permissions: Ensure files are owned by the container user, not root
- Check Status: Look at /config/cwa_ingest_status for current state
- Web Upload Issues: Check the Tasks page for upload status and errors
- Upload Fails to Start: Check browser console for JavaScript errors
- Upload Hangs: Verify file size isn't exceeding server limits
- Format Not Added: For existing books, ensure the book ID is valid
- Manifest Errors: Check logs for .cwa.json processing errors
- Check File Size: Very large files may need longer timeout
- Increase Timeout: Adjust timeout in CWA Settings
- Check Failed Backups: Look in /config/processed_books/failed/ for problematic files
- Check Logs: Review container logs for detailed error information
- Web Upload Timeouts: Large uploads may need increased web server timeout settings
- Check Lock Status: Verify no stale lock files exist in /tmp/
- Container Restart: The process recovery service automatically cleans up on restart
- Manual Cleanup: If needed, remove lock files: rm -f /tmp/ingest_processor.lock
- Check Orphaned Processes: Look for hung Python processes: ps aux | grep ingest_processor
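Before removing a lock file by hand, it can help to confirm that it really is stale. The Python sketch below assumes the lock file contains the owning process ID as plain text, which this page implies ("PID tracking") but does not spell out. The function names are hypothetical and the check is POSIX-only.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_path="/tmp/ingest_processor.lock"):
    """Return True if the lock exists but its owner is gone or unreadable."""
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no lock at all, so nothing to clean up
    except ValueError:
        return True  # the file does not contain a PID (assumed format)
    if pid <= 0:
        return True  # never pass 0 or negative values to os.kill
    return not pid_is_running(pid)
```

If `lock_is_stale()` returns true, removing the lock file as described above should be safe. If it returns false while a lock exists, a processor is probably still running.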
- Network Shares: Set NETWORK_SHARE_MODE=true for NFS/SMB environments
- File Ownership: Ensure files are owned by container user (abc:abc)
- Directory Permissions: Verify ingest directory is writable
- Failed Backup Location: Check /config/processed_books/failed/ for permission error backups
Failed files are saved with descriptive timestamps:
/config/processed_books/failed/
├── 20250902_143052_timeout_large-book.epub
├── 20250902_143127_retry_timeout_corrupted-file.pdf
├── 20250902_143200_safety_timeout_problematic-book.mobi
└── 20250902_143301_permission_error_network-file.epub
Filename Format:
- YYYYMMDD_HHMMSS_timeout_originalname.ext - Files that timed out on first attempt (processor timeout)
- YYYYMMDD_HHMMSS_retry_timeout_originalname.ext - Files that timed out during retry
- YYYYMMDD_HHMMSS_safety_timeout_originalname.ext - Files that hit the safety timeout (serious issue)
- YYYYMMDD_HHMMSS_permission_error_originalname.ext - Files with permission/access issues
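When reviewing many failed backups, the naming scheme above can be parsed programmatically. This is a small Python sketch based only on the four filename patterns listed here. The function name `parse_failed_backup` is hypothetical.

```python
import re
from datetime import datetime

# Longer reason tags come first so "retry_timeout" is not read as "timeout"
_FAILED_NAME = re.compile(
    r"^(?P<stamp>\d{8}_\d{6})_"
    r"(?P<reason>retry_timeout|safety_timeout|permission_error|timeout)_"
    r"(?P<name>.+)$"
)


def parse_failed_backup(filename):
    """Split a failed-backup filename into timestamp, reason, and original name."""
    match = _FAILED_NAME.match(filename)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S"),
        "reason": match["reason"],
        "original_name": match["name"],
    }
```

Grouping the results by `reason` gives a quick view of whether failures are mostly timeouts or permission problems.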
# Monitor ingest service in real-time
docker logs -f calibre-web-automated | grep "cwa-ingest-service"
# Check for specific patterns
docker logs calibre-web-automated | grep -E "(TIMEOUT|ERROR|queue)"

The Enhanced Ingest System includes automatic recovery mechanisms:
- Process Recovery Service: Automatically runs on container startup to:
  - Clean up stale temporary files older than 1 hour
  - Reset stuck processing status
  - Identify orphaned CWA processes
- Lock File Management: Robust locking system that:
  - Tracks process IDs to detect stale locks
  - Automatically cleans up locks from dead processes
  - Prevents race conditions in concurrent access
- Database Connection Safety: All database operations use context managers to prevent connection leaks
- Coordinated Timeout System:
  - Processor handles internal timeout logic (configurable timeout)
  - Service provides safety timeout (3x configured timeout) as last resort
  - Prevents conflicts between different timeout mechanisms
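The service-side half of the coordinated timeout can be sketched in Python as follows. It assumes the service launches the processor as a child process and enforces a limit of three times the configured timeout, as described above. The function names and returned state strings are illustrative and do not come from CWA's code.

```python
import subprocess


def safety_timeout_seconds(configured_minutes, factor=3):
    """Safety limit in seconds: the configured timeout multiplied by `factor`."""
    return configured_minutes * 60 * factor


def run_processor(cmd, configured_minutes=15, timeout_seconds=None):
    """Run the processor command and map the outcome to a status string.

    `timeout_seconds` overrides the computed safety limit (useful for testing).
    """
    limit = timeout_seconds
    if limit is None:
        limit = safety_timeout_seconds(configured_minutes)
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit
        result = subprocess.run(cmd, timeout=limit)
    except subprocess.TimeoutExpired:
        return "safety_timeout"
    return "completed" if result.returncode == 0 else "error"
```

With the default of 15 minutes, the safety limit works out to 45 minutes, so it only fires if the processor's own timeout handling has failed.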
For network shares (NFS/SMB), the system automatically falls back to polling mode:
# Force network share mode
NETWORK_SHARE_MODE=true
# Force polling mode
CWA_WATCH_MODE=poll

Fine-tune file stability detection:
# Number of size checks to perform
CWA_INGEST_STABLE_CHECKS=6
# Number of consecutive matching sizes required
CWA_INGEST_STABLE_CONSEC_MATCH=2
# Interval between checks (seconds)
CWA_INGEST_STABLE_INTERVAL=0.5

# Increase queue size for high-volume ingestion
CWA_INGEST_MAX_QUEUE_SIZE=100
# Reduce timeout for faster throughput (if files are small)
# Set via web interface: 5-10 minutes

# Increase timeout for large files
# Set via web interface: 30-60 minutes
# Keep default queue size
CWA_INGEST_MAX_QUEUE_SIZE=50

graph TD
A1[📁 File Added to Ingest Folder] --> B[🔍 File Detected by inotifywait]
A2[🌐 Web Interface Upload] --> B2[💾 Atomic Save with .uploading]
B2 --> B3[🔄 Atomic Rename to Final Name]
B3 --> B
B --> C[⏱️ Wait for File Stability]
C --> D{📝 Valid Format?}
D -->|No| E[🗑️ Ignore File]
D -->|Yes| F{📄 Has Manifest?}
F -->|Yes| F1[📋 Process Manifest Action]
F1 --> F2{🎯 Add Format to Existing Book?}
F2 -->|Yes| F3[➕ Add Format to Book ID]
F2 -->|No| G
F3 --> J
F -->|No| G{🔒 Processor Available?}
G -->|Yes| H[⚡ Start Processing with Timeout]
G -->|No| I[📋 Add to Retry Queue]
H --> K{⏰ Processing Complete?}
K -->|Success| J[✅ File Processed Successfully]
K -->|Timeout| L[⚠️ Move to Failed Backup]
K -->|Error| M[❌ Log Error]
J --> N[🔄 Process Retry Queue]
N --> O{📋 Queue Empty?}
O -->|No| P[🔄 Retry Next File]
O -->|Yes| Q[😴 Return to Idle]
I --> R[⏳ Wait for Processor Available]
R --> P
P --> K
L --> Q
M --> Q
style A1 fill:#e1f5fe
style A2 fill:#f3e5f5
style J fill:#e8f5e8
style L fill:#fff3e0
style M fill:#ffebee
style F1 fill:#f3e5f5
- Don't download directly to ingest folder - Complete downloads elsewhere first, then move
- Use proper file permissions - Ensure files are owned by your user, not root
- Monitor disk space - Failed backups and queue files require storage
- Web uploads are atomic - No need to worry about partial uploads being processed
- Unique filenames - Web uploads automatically get unique names to prevent conflicts
- Batch processing - Add multiple files at once rather than one-by-one
- Monitor queue size - Adjust CWA_INGEST_MAX_QUEUE_SIZE based on your usage
- Tune timeout - Set appropriate timeout based on your typical file sizes
- Web upload efficiency - Use web interface for single files, folder ingest for bulk operations
- Format management - Use "Add Format" feature instead of re-uploading entire books
- Check status regularly - Use the status file to monitor processing
- Review failed files - Investigate files in the failed backup folder
- Monitor logs - Watch container logs for processing issues
The Enhanced Ingest System works seamlessly with:
- Auto-Convert: Automatic format conversion during processing
- EPUB Fixer: Automatic EPUB repair and optimization
- Metadata Enforcement: Automatic metadata and cover enforcement
- Backup System: Automatic backup of processed files
- Stats Tracking: Processing statistics in CWA Stats page
This enhanced system provides enterprise-grade reliability while maintaining the simplicity and ease-of-use that makes CWA great for home users.