Open Notebook is the central hub for your research materials and supports a wide range of content formats. This guide covers adding, managing, and organizing sources in your notebooks.
Open Notebook uses the content-core library to process each content type, automatically selecting an extraction engine for it.
- PDF - Research papers, reports, books
- EPUB - E-books and digital publications
- Microsoft Office:
  - Word documents (.docx, .doc)
  - PowerPoint presentations (.pptx, .ppt)
  - Excel spreadsheets (.xlsx, .xls)
- Text files - Plain text (.txt), Markdown (.md)
- HTML - Web pages and HTML files
- Video formats:
  - MP4, AVI, MOV, WMV
  - Automatic transcription to text
- Audio formats:
  - MP3, WAV, M4A, AAC
  - Speech-to-text conversion
- URLs - Any web page, blog post, or article
- YouTube videos - Automatic transcript extraction
- News articles - Automatic content extraction
- JPG, PNG, TIFF - With OCR text recognition
- Screenshots - Perfect for capturing visual information
- ZIP, TAR, GZ - Compressed file support
- Navigate to your notebook
- Click "Add Source"
- Select "Link" option
- Enter the URL in the text field
- Configure options (see Configuration Options below)
- Click "Process"
Examples:
- Research articles: https://arxiv.org/abs/2301.00001
- YouTube videos: https://www.youtube.com/watch?v=dQw4w9WgXcQ
- News articles: https://example.com/article
- Blog posts: https://blog.example.com/post
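YouTube links are handled differently from ordinary web pages because the transcript is extracted instead of the page text. The sketch below shows one way such a link could be recognized. The function name `youtube_video_id` and the list of accepted hosts are illustrative assumptions, not Open Notebook's actual detection logic.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: hosts treated as YouTube for this illustration.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL looks like a YouTube video, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None
```

Any URL for which this returns `None` would be treated as a regular web page in this sketch.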
- Navigate to your notebook
- Click "Add Source"
- Select "Upload" option
- Click "Choose File" and select your document
- Configure options (see Configuration Options below)
- Click "Process"
Supported formats:
- Documents: PDF, DOCX, PPTX, XLSX, EPUB, TXT, MD
- Media: MP4, MP3, WAV, M4A (requires speech-to-text model)
- Images: JPG, PNG, TIFF (with OCR)
- Archives: ZIP, TAR, GZ
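If you prepare uploads with a script, it can help to check file extensions against the list above before submitting anything. The following is a minimal sketch that does only that. The `classify_upload` helper is hypothetical, and the extension groups are copied from the list above rather than from Open Notebook's code.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the "Supported formats" list above.
SUPPORTED = {
    "document": {"pdf", "docx", "pptx", "xlsx", "epub", "txt", "md"},
    "media": {"mp4", "mp3", "wav", "m4a"},
    "image": {"jpg", "png", "tiff"},
    "archive": {"zip", "tar", "gz"},
}


def classify_upload(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if its extension is not listed."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None
```

Only the final extension is inspected, so `data.tar.gz` is classified by its `.gz` suffix.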
- Navigate to your notebook
- Click "Add Source"
- Select "Text" option
- Paste or type your content in the text area
- Configure options (see Configuration Options below)
- Click "Process"
Use cases:
- Meeting notes or transcripts
- Research findings
- Interview transcripts
- Code snippets or documentation
Apply AI-powered transformations to extract insights from your sources:
- Summary - Generate concise summaries
- Key Points - Extract main ideas and takeaways
- Questions - Generate questions for further research
- Analysis - Provide detailed analysis of content
- Custom transformations - Create your own prompts
Choose how content should be embedded for vector search:
- Ask every time - Prompt for each source
- Always embed - Automatically embed all sources
- Never embed - Skip embedding (can be done later)
Note: Embedding enables AI-powered search and context retrieval but uses tokens from your AI provider.
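The three embedding options amount to a small decision rule. Here is a sketch of that rule, assuming a hypothetical `EmbedPolicy` enum and an `ask_user` callback that stands in for the prompt shown in the interface. Neither name comes from Open Notebook itself.

```python
from enum import Enum
from typing import Callable


class EmbedPolicy(Enum):
    ASK = "ask"        # "Ask every time"
    ALWAYS = "always"  # "Always embed"
    NEVER = "never"    # "Never embed"


def should_embed(policy: EmbedPolicy, ask_user: Callable[[], bool]) -> bool:
    """Decide whether a new source gets embedded under the given policy."""
    if policy is EmbedPolicy.ALWAYS:
        return True
    if policy is EmbedPolicy.NEVER:
        return False
    # ASK: defer to the user for this particular source.
    return ask_user()
```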
- Delete after processing - Remove uploaded files from server after processing
- Keep files - Retain files on server (useful for archival)
Click the "Expand" button on any source to view:
- Full extracted content
- Generated insights (transformations)
- Processing metadata
- Embedded chunk information
Control how sources are included in AI conversations:
- 🚫 Not in Context - Exclude from AI context
- 📄 Summary - Include summary only (recommended)
- 📋 Full Content - Include complete content (uses more tokens)
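To see why the context level affects token usage, consider how the text sent to the model might be assembled. The sketch below is an illustration only: the `Source` and `ContextLevel` types and the `build_context` function are hypothetical and do not reflect Open Notebook's internal data model.

```python
from dataclasses import dataclass
from enum import Enum


class ContextLevel(Enum):
    NOT_IN_CONTEXT = "not_in_context"
    SUMMARY = "summary"
    FULL_CONTENT = "full_content"


@dataclass
class Source:
    title: str
    summary: str
    full_text: str
    level: ContextLevel = ContextLevel.SUMMARY  # the recommended setting


def build_context(sources: list[Source]) -> str:
    """Concatenate what each source contributes at its chosen context level."""
    parts = []
    for src in sources:
        if src.level is ContextLevel.NOT_IN_CONTEXT:
            continue  # excluded sources contribute nothing
        body = src.summary if src.level is ContextLevel.SUMMARY else src.full_text
        parts.append(f"## {src.title}\n{body}")
    return "\n\n".join(parts)
```

A source set to Full Content contributes its entire text, which is why that level uses more tokens than Summary.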
Each source includes:
- Title - Extracted or custom title
- Topics - Automatically detected or manually added tags
- Created/Updated - Timestamps for tracking
- Embedded chunks - Number of vector embeddings
- Insights count - Number of generated insights
Use the search functionality to find specific sources:
- Text search - Search titles and content
- Vector search - Semantic similarity search
- Filter by notebook - View sources from specific notebooks
- Filter by type - URLs, uploads, or text content
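The difference between text search and vector search can be shown with a toy example. In the sketch below, text search is a case-insensitive substring match and vector search ranks sources by cosine similarity. The two-dimensional embeddings are hand-written for illustration. In practice they would come from your configured embedding model, and the dictionary fields used here are assumptions.

```python
import math


def text_search(query: str, sources: list[dict]) -> list[str]:
    """Return titles of sources whose title or content contains the query."""
    q = query.lower()
    return [
        s["title"]
        for s in sources
        if q in s["title"].lower() or q in s["content"].lower()
    ]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: list[float], sources: list[dict], top_k: int = 2) -> list[str]:
    """Return the titles of the top_k sources most similar to the query vector."""
    ranked = sorted(sources, key=lambda s: cosine(query_vec, s["embedding"]), reverse=True)
    return [s["title"] for s in ranked[:top_k]]
```

Text search only finds sources that contain the literal query string, while vector search can rank a source highly even when it shares no exact words with the query.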
Open Notebook automatically selects one of the following extraction engines:
- Docling - PDF and Office documents (default)
- PyMuPDF - Lightweight PDF processing
- Firecrawl - Enhanced web scraping
- Jina - Advanced content extraction
- BeautifulSoup - Standard web scraping
- Upload/URL submission - Source is received
- Engine selection - Best extraction method chosen
- Content extraction - Text and metadata extracted
- Transformation application - AI insights generated
- Embedding creation - Vector embeddings for search
- Storage - Content saved to database
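The steps above run in a fixed order, with each stage building on the output of the previous one. The sketch below models that flow with placeholder stages. Every function body is a stand-in (no engine, AI provider, or database is called), and all names are hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def select_engine(src: dict) -> dict:
    # Placeholder: a real implementation would pick an engine by content type.
    return {**src, "engine": "placeholder-engine"}


def extract_content(src: dict) -> dict:
    # Placeholder: treat the raw input as already being text.
    return {**src, "text": src["raw"].strip()}


def apply_transformations(src: dict) -> dict:
    # Placeholder for an AI call: use the first sentence as a "summary".
    first_sentence = src["text"].split(".")[0].strip() + "."
    return {**src, "insights": {"summary": first_sentence}}


def create_embeddings(src: dict) -> dict:
    # Placeholder: split into fixed-size chunks instead of calling an embedding model.
    text, size = src["text"], 40
    return {**src, "chunks": [text[i:i + size] for i in range(0, len(text), size)]}


def store(src: dict) -> dict:
    # Placeholder: mark as stored instead of writing to a database.
    return {**src, "stored": True}


PIPELINE: list[tuple[str, Stage]] = [
    ("engine selection", select_engine),
    ("content extraction", extract_content),
    ("transformation application", apply_transformations),
    ("embedding creation", create_embeddings),
    ("storage", store),
]


def process(raw: str) -> dict:
    """Run a submitted source through every stage in order."""
    src = {"raw": raw, "completed": []}
    for name, stage in PIPELINE:
        src = stage(src)
        src["completed"].append(name)
    return src
```

Because later stages read what earlier ones produced, a failure in extraction means no insights or embeddings are generated for that source.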
For audio and video files:
- Audio extraction - Video converted to audio
- Transcription - Speech converted to text
- Content processing - Standard text processing applied
Requirements:
- Speech-to-text model configured (OpenAI Whisper, etc.)
- Compatible audio/video format
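The audio extraction step can be pictured as a single conversion command. The sketch below builds (but does not run) an ffmpeg command that drops the video stream and writes mono 16 kHz audio. Using ffmpeg, the output settings, and the helper name `audio_extraction_command` are all assumptions for illustration. This guide does not state which tool Open Notebook uses internally.

```python
from pathlib import Path

# Video extensions taken from the supported formats listed in this guide.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".wmv"}


def audio_extraction_command(video_path: str) -> list[str]:
    """Build an ffmpeg argument list that extracts audio from a video file."""
    src = Path(video_path)
    if src.suffix.lower() not in VIDEO_EXTENSIONS:
        raise ValueError(f"Not a supported video format: {src.suffix or '(none)'}")
    dst = src.with_suffix(".wav")
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz.
    return ["ffmpeg", "-i", str(src), "-vn", "-ac", "1", "-ar", "16000", str(dst)]
```

The resulting audio file would then be passed to the configured speech-to-text model, and the transcript processed like any other text.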
- Use descriptive titles - Edit auto-generated titles for clarity
- Add relevant topics - Tag sources for better categorization
- Group related sources - Keep related materials in same notebook
- Regular cleanup - Remove outdated or irrelevant sources
- Selective embedding - Only embed sources you'll search
- Context management - Use summary context when possible
- Batch processing - Add multiple sources at once
- File cleanup - Enable automatic file deletion
- Monitor token usage - Track embedding and transformation costs
- Use summary context - Reduce token consumption in conversations
- Selective transformations - Only apply needed transformations
- Provider selection - Choose cost-effective AI providers
- Maximum upload size - Depends on server configuration
- Processing time - Large files take longer to process
- Memory usage - Very large files may cause processing issues
- Scanned PDFs - May require OCR processing
- Password-protected files - Cannot be processed
- Corrupted files - Will fail processing gracefully
- Proprietary formats - Some formats may not be supported
- YouTube transcripts - Configurable preferred languages
- Multi-language content - Supported by AI models
- OCR accuracy - Varies by image quality and language
- File storage - Temporary files deleted after processing
- Content persistence - Extracted text stored in database
- AI processing - Content sent to configured AI providers
- Access control - Password protection available
Solution:
- Check the supported formats list above
- Ensure file is not corrupted
- Try converting to a supported format
Solution:
- Verify video has captions/subtitles
- Check YouTube transcript language preferences
- Try manually uploading audio if available
Solution:
- Ensure file is not password-protected
- Check file size (try smaller files)
- Verify file is not corrupted
- Try different processing engine in settings
Solution:
- Configure speech-to-text model in Models
- Ensure provider API keys are set
- Check model availability
Solution:
- Check embedding model configuration
- Verify API key and quota limits
- Try processing without embedding first
- Check content length (very long content may fail)
- Check server logs - Enable debug logging for detailed error info
- GitHub Issues - Report bugs or request features
- Discord Community - Get help from other users
- Documentation - Review setup and configuration guides
Create your own AI-powered transformations:
- Navigate to Settings → Transformations
- Click "Create New"
- Define your prompt template
- Set default application preferences
- Test with sample content
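Before saving a custom transformation, you can try the prompt wording offline against sample content. The sketch below uses Python's `string.Template` for that purpose. The `$content` placeholder, the example prompt, and the `render_prompt` helper are assumptions for illustration. Open Notebook's own template syntax may differ, so check the Transformations settings for the placeholders it supports.

```python
from string import Template

# Hypothetical custom transformation: extract key technical terms.
KEY_TERMS_PROMPT = Template(
    "List the five most important technical terms in the text below, "
    "with a one-sentence definition for each.\n\n"
    "TEXT:\n$content"
)


def render_prompt(template: Template, content: str, max_chars: int = 4000) -> str:
    """Fill the template with source content, truncated to max_chars characters."""
    return template.substitute(content=content[:max_chars])
```

Truncating the content keeps test prompts short, which helps limit token usage while you iterate on the wording.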
- Multiple file upload - Select multiple files at once
- Batch transformations - Apply to multiple sources
- Bulk embedding - Process multiple sources for search
Use the REST API for programmatic source management:
- Create sources - POST /api/sources
- List sources - GET /api/sources
- Get source details - GET /api/sources/{id}
- Update source - PUT /api/sources/{id}
- Delete source - DELETE /api/sources/{id}
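The endpoints above can be called from any HTTP client. The sketch below builds requests with Python's standard library. The base URL and the JSON payload fields are assumptions, so substitute your server's address and check the API documentation for the actual request schema. The `source_request` helper is hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:5055"  # assumption: replace with your server's address


def source_request(
    method: str,
    source_id: Optional[str] = None,
    payload: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a request for /api/sources or /api/sources/{id}."""
    path = "/api/sources" if source_id is None else f"/api/sources/{source_id}"
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Example requests (the payload shape is illustrative only).
list_req = source_request("GET")
create_req = source_request("POST", payload={"url": "https://example.com/article"})
delete_req = source_request("DELETE", source_id="abc123")
```

To send a request, pass it to `urllib.request.urlopen` and read the response body. If your instance is password-protected, you will also need to supply credentials as described in your setup guide.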
- Auto-embedding - Configure default embedding behavior
- Default transformations - Apply specific transformations to all sources
- File cleanup - Automatic deletion of temporary files
- Regular processing - Schedule source updates
- Add research papers (PDF uploads)
- Include relevant articles (URL links)
- Add meeting notes (text content)
- Apply analysis transformation to extract insights
- Enable embedding for cross-source search
- Use summary context for efficient AI conversations
- Gather reference materials (mixed formats)
- Apply summary transformations for quick overviews
- Extract key points for outline creation
- Use full content context for detailed writing
- Search across sources for specific information
- Upload course materials (PDFs, videos)
- Add supplementary articles (web links)
- Create study notes (text content)
- Apply question generation for self-testing
- Use vector search for concept lookup
- Generate summaries for review
Use this guide as a reference for Open Notebook's content processing features, and experiment with different configurations to find the workflow that best fits your use case.