
@AnishSarkar22
Contributor

@AnishSarkar22 AnishSarkar22 commented Nov 23, 2025

Adds BlockNote editor integration with a comprehensive document reindexing system. This PR enables rich text editing of documents with automatic background reindexing to keep search results synchronized, plus migration support to convert existing documents to an editable format.

Description

This PR adds a complete BlockNote editor integration with intelligent reindexing to SurfSense:

Frontend Changes:

  • Created BlockNoteEditor component using @blocknote/react and @blocknote/mantine with theme support
  • Implemented DynamicBlockNoteEditor wrapper for proper SSR handling
  • Updated editor page (/dashboard/[search_space_id]/editor/[documentId]) with:
    • BlockNote editor interface
    • Single "Save & Exit" button (no auto-save)
    • Unsaved changes tracking with navigation warnings
    • Loading states and error handling
    • Toast notifications for save status and reindexing progress

Backend Changes:

  • New API Endpoints:

    • GET /api/v1/documents/{document_id}/editor-content - Fetches document for editing with lazy migration support
    • POST /api/v1/documents/{document_id}/save - Saves document and triggers background reindexing
  • Database Changes:

    • Added migration 38_add_blocknote_fields_to_documents.py with three new columns:
      • blocknote_document (JSONB) - Stores BlockNote JSON structure
      • content_needs_reindexing (Boolean) - Flags documents needing reindex
      • last_edited_at (Timestamp) - Tracks last edit time
    • Migration includes idempotent column checks for safe re-running
  • Celery Background Tasks:

    • reindex_document_task - Handles post-edit reindexing:
      • Converts BlockNote JSON → Markdown
      • Recreates document chunks (for semantic search)
      • Regenerates document summary with LLM
      • Updates embeddings for accurate search results
    • populate_blocknote_for_documents_task - Migration task:
      • Automatically populates BlockNote content for existing documents
      • Reconstructs full markdown from chunks table
      • Converts to BlockNote JSON via Next.js API
      • Processes documents in batches (50 at a time)
      • Triggered automatically during migration
  • Lazy Migration Support:

    • Documents without blocknote_document are auto-converted on first access
    • Reconstructs content from chunks table
    • Converts to BlockNote format on-the-fly
    • Graceful error handling for documents without chunks
  • BlockNote Converter Utilities:

    • convert_markdown_to_blocknote() - Markdown → BlockNote JSON (via Next.js API)
    • convert_blocknote_to_markdown() - BlockNote JSON → Markdown (via Next.js API)
    • Next.js API routes handle conversion using BlockNote's parser
  • Updated Document Processors:

    • Modified file processors to explicitly set new fields to null for pre-migration simulation
    • All processors configured for future BlockNote content generation
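The idempotent column checks mentioned under Database Changes can be sketched as follows. This is a minimal illustration and not the migration shipped in this PR: it uses the standard-library `sqlite3` module as a stand-in for Alembic on PostgreSQL, so the column types (`TEXT`, `INTEGER`) and the helper names `existing_columns` and `upgrade` are assumptions made for the example.

```python
import sqlite3

# Stand-in column definitions (assumption); the PR uses JSONB, Boolean and
# Timestamp columns on PostgreSQL.
NEW_COLUMNS = {
    "blocknote_document": "TEXT",
    "content_needs_reindexing": "INTEGER NOT NULL DEFAULT 0",
    "last_edited_at": "TEXT",
}


def existing_columns(conn: sqlite3.Connection, table: str) -> set:
    """Return the column names currently present on `table`."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def upgrade(conn: sqlite3.Connection, table: str = "documents") -> list:
    """Add each missing column and return the names that were actually added."""
    present = existing_columns(conn, table)
    added = []
    for name, ddl_type in NEW_COLUMNS.items():
        if name in present:
            continue  # already migrated, so re-running is a no-op
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl_type}")
        added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
first_run = upgrade(conn)   # adds the three columns
second_run = upgrade(conn)  # finds them present and adds nothing
```

Checking for each column before adding it is what makes a second run safe: the function only reports columns it created on that run.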

Reindexing Architecture:

  1. User edits document in BlockNote editor
  2. Clicks "Save & Exit" → Saves BlockNote JSON immediately
  3. Background Celery task triggered automatically:
    • Converts edited BlockNote → Markdown
    • Deletes old chunks and creates new ones from edited content
    • Regenerates summary using LLM (Gemini/GPT)
    • Updates embeddings for document and chunks
  4. Search results reflect edited content without user waiting
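The flow above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code in `document_reindex_tasks.py`: the `Document` dataclass, `save_document`, `reindex_document`, the flat `{"text": ...}` block shape, and the toy converter, chunker, summarizer and embedder are all hypothetical stand-ins, and a plain list plays the role of the Celery queue. Embedding the summary as the document-level vector is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class Document:
    id: int
    blocknote_document: Optional[list] = None
    content_needs_reindexing: bool = False
    last_edited_at: Optional[datetime] = None
    chunks: List[str] = field(default_factory=list)
    chunk_embeddings: List[List[float]] = field(default_factory=list)
    summary: str = ""
    embedding: List[float] = field(default_factory=list)


def save_document(doc: Document, blocks: list, enqueue: Callable[[int], None]) -> None:
    """Persist the edited blocks right away, then hand reindexing to the queue."""
    doc.blocknote_document = blocks
    doc.content_needs_reindexing = True
    doc.last_edited_at = datetime.now(timezone.utc)
    enqueue(doc.id)  # returns immediately, so the user is not kept waiting


def reindex_document(doc, to_markdown, chunk, summarize, embed) -> None:
    """Background step: rebuild chunks, summary and embeddings from the edit."""
    markdown = to_markdown(doc.blocknote_document)
    doc.chunks = list(chunk(markdown))  # old chunks are replaced, not merged
    doc.chunk_embeddings = [embed(c) for c in doc.chunks]
    doc.summary = summarize(markdown)
    doc.embedding = embed(doc.summary)
    doc.content_needs_reindexing = False


# Toy run with stand-in helpers (all hypothetical).
doc = Document(id=1, chunks=["stale chunk"])
queue: List[int] = []
save_document(doc, [{"text": "Hello"}, {"text": "World"}], queue.append)
flag_after_save = doc.content_needs_reindexing

for _doc_id in queue:  # what a worker would do later
    reindex_document(
        doc,
        to_markdown=lambda blocks: "\n\n".join(b["text"] for b in blocks),
        chunk=lambda md: md.split("\n\n"),
        summarize=lambda md: md[:20],
        embed=lambda text: [float(len(text))],
    )
```

Passing the converter, chunker, summarizer and embedder in as callables keeps the sketch independent of any particular LLM or embedding provider.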

Migration Strategy:

  • Existing documents: Automatically populated with BlockNote content during migration
  • New documents: Will support BlockNote generation in future updates
  • Documents without chunks: Gracefully handled with appropriate error messages
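A rough sketch of the lazy, on-first-access conversion described in this PR follows. It is illustrative only: `get_editor_content`, `NoChunksError`, the dict-based document, the paragraph-joining rule, and the fake converter are assumptions for the example, while the real endpoint reads chunks from the database and calls the Next.js conversion route.

```python
from typing import Callable, List


class NoChunksError(Exception):
    """Raised when a legacy document has nothing to rebuild content from."""


def get_editor_content(
    doc: dict,
    chunks: List[str],
    to_blocknote: Callable[[str], list],
) -> list:
    """Return BlockNote blocks, converting a legacy document on first access."""
    if doc.get("blocknote_document") is not None:
        return doc["blocknote_document"]  # already migrated
    if not chunks:
        raise NoChunksError(f"Document {doc['id']} has no chunks to rebuild from")
    markdown = "\n\n".join(chunks)  # joining rule is an assumption
    doc["blocknote_document"] = to_blocknote(markdown)  # store for next time
    return doc["blocknote_document"]


# Stand-in converter that records how often it is called (hypothetical).
calls: List[str] = []


def fake_to_blocknote(markdown: str) -> list:
    calls.append(markdown)
    return [{"type": "paragraph", "content": p} for p in markdown.split("\n\n")]


legacy = {"id": 7, "blocknote_document": None}
blocks = get_editor_content(legacy, ["# Title", "Body text"], fake_to_blocknote)
again = get_editor_content(legacy, ["# Title", "Body text"], fake_to_blocknote)

try:
    get_editor_content({"id": 8, "blocknote_document": None}, [], fake_to_blocknote)
    empty_error = None
except NoChunksError as exc:
    empty_error = str(exc)
```

Because the converted blocks are stored on the document, the converter runs once per legacy document; later reads return the stored value.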

Motivation and Context

This change enables users to edit documents directly within SurfSense while maintaining search accuracy. Key improvements:

  1. Rich Text Editing: Modern block-based interface for document editing
  2. Search Synchronization: Automatic background reindexing ensures search results always reflect current content
  3. Performance: Non-blocking saves with background processing
  4. Migration Support: Existing documents automatically converted to editable format
  5. Data Integrity: Explicit chunk deletion prevents lazy-loading issues in async contexts

Technical Decisions:

  • Background reindexing: Prevents UI blocking during expensive operations (LLM calls, embeddings)
  • Celery for async tasks: Robust job queue with retry logic and monitoring
  • Lazy migration: Documents converted on-demand to avoid blocking deployments
  • Batch processing: Migration handles large document sets efficiently
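Batch processing of the kind described for the migration task can be sketched as below. The batch size of 50 comes from the PR description; the helper names `batches` and `populate_all`, and the choice to count failures and continue rather than abort, are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
BATCH_SIZE = 50  # matches the "50 at a time" figure in the PR description


def batches(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def populate_all(
    doc_ids: Iterable[int],
    populate_one: Callable[[int], None],
) -> Tuple[int, int]:
    """Process every document batch by batch; return (succeeded, failed)."""
    succeeded = failed = 0
    for batch in batches(doc_ids):
        for doc_id in batch:
            try:
                populate_one(doc_id)
                succeeded += 1
            except Exception:
                failed += 1  # one bad document does not stop the run
    return succeeded, failed


def flaky_populate(doc_id: int) -> None:
    """Hypothetical worker that fails for every 40th document."""
    if doc_id % 40 == 0:
        raise ValueError("no chunks")


batch_sizes = [len(b) for b in batches(range(1, 121))]
result = populate_all(range(1, 121), flaky_populate)
```

Working in fixed-size batches bounds how many documents are held in memory at once, whatever the size of the backlog.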

Screenshots

SCR-20251123-omah

SCR-20251123-onwb

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR introduces a rich text editing capability to SurfSense by integrating the BlockNote editor. Users can now edit documents directly in the browser with a modern block-based interface. The implementation includes database schema changes to store BlockNote JSON documents, new API endpoints for fetching and saving editor content, and automatic conversion of markdown to BlockNote format during document processing. The frontend features a complete editor page with auto-save functionality every 30 seconds, theme support (dark/light), and unsaved changes tracking. All document processors have been updated to generate BlockNote-compatible content on upload, though legacy documents will require re-uploading to enable editing.

⏱️ Estimated Review Time: 3+ hours

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/alembic/versions/38_add_blocknote_fields_to_documents.py
2 surfsense_backend/app/db.py
3 surfsense_backend/app/utils/blocknote_converter.py
4 surfsense_backend/app/routes/editor_routes.py
5 surfsense_backend/app/routes/__init__.py
6 surfsense_backend/app/tasks/document_processors/extension_processor.py
7 surfsense_backend/app/tasks/document_processors/file_processors.py
8 surfsense_backend/app/tasks/document_processors/markdown_processor.py
9 surfsense_backend/app/tasks/document_processors/url_crawler.py
10 surfsense_backend/app/tasks/document_processors/youtube_processor.py
11 surfsense_web/app/api/convert-to-blocknote/route.ts
12 surfsense_web/app/api/convert-to-markdown/route.ts
13 surfsense_web/components/BlockNoteEditor.tsx
14 surfsense_web/components/DynamicBlockNoteEditor.tsx
15 surfsense_web/app/dashboard/[search_space_id]/editor/[documentId]/page.tsx
16 surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/components/RowActions.tsx
17 surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/components/DocumentsTableShell.tsx
18 surfsense_web/components/dashboard-breadcrumb.tsx
19 surfsense_web/messages/en.json
20 surfsense_web/messages/zh.json
21 surfsense_web/next.config.ts
22 surfsense_web/package.json
23 surfsense_web/pnpm-lock.yaml
⚠️ Inconsistent Changes Detected
File Path: surfsense_web/next.config.ts
Warning: Disabling React StrictMode (reactStrictMode: false) is a significant configuration change that affects the entire Next.js application beyond just the BlockNote editor feature. This could hide potential issues in other parts of the codebase and should be carefully considered.


@vercel

vercel bot commented Nov 23, 2025

@AnishSarkar22 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

@AnishSarkar22 AnishSarkar22 marked this pull request as ready for review November 23, 2025 11:03

@recurseml recurseml bot left a comment


Review by RecurseML

🔍 Review performed on 70f3381..abbaa84

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (23)

surfsense_backend/alembic/versions/38_add_blocknote_fields_to_documents.py
surfsense_backend/app/db.py
surfsense_backend/app/routes/__init__.py
surfsense_backend/app/routes/editor_routes.py
surfsense_backend/app/tasks/document_processors/extension_processor.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/app/tasks/document_processors/markdown_processor.py
surfsense_backend/app/tasks/document_processors/url_crawler.py
surfsense_backend/app/tasks/document_processors/youtube_processor.py
surfsense_backend/app/utils/blocknote_converter.py
surfsense_web/app/api/convert-to-blocknote/route.ts
surfsense_web/app/api/convert-to-markdown/route.ts
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/components/DocumentsTableShell.tsx
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/components/RowActions.tsx
surfsense_web/app/dashboard/[search_space_id]/editor/[documentId]/page.tsx
surfsense_web/components/BlockNoteEditor.tsx
surfsense_web/components/DynamicBlockNoteEditor.tsx
surfsense_web/components/dashboard-breadcrumb.tsx
surfsense_web/messages/en.json
surfsense_web/messages/zh.json
surfsense_web/next.config.ts
surfsense_web/package.json
surfsense_web/pnpm-lock.yaml

@AnishSarkar22
Contributor Author

@MODSetter I am unsure whether we should update documents that have an empty blocknote_content, i.e., documents that were uploaded before the BlockNote editor was merged into the system. How should we handle them?

@AnishSarkar22 AnishSarkar22 changed the title Feature: BlockNote editor [Feature] Add BlockNote editor Nov 23, 2025

@recurseml recurseml bot left a comment


Review by RecurseML

🔍 Review performed on abbaa84..289b4de

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (17)

surfsense_backend/alembic/versions/38_add_blocknote_fields_to_documents.py
surfsense_backend/app/db.py
surfsense_backend/app/routes/editor_routes.py
surfsense_backend/app/tasks/document_processors/extension_processor.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/app/tasks/document_processors/markdown_processor.py
surfsense_backend/app/tasks/document_processors/url_crawler.py
surfsense_backend/app/tasks/document_processors/youtube_processor.py
surfsense_backend/app/utils/blocknote_converter.py
surfsense_web/app/api/convert-to-blocknote/route.ts
surfsense_web/app/api/convert-to-markdown/route.ts
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/components/RowActions.tsx
surfsense_web/app/dashboard/[search_space_id]/editor/[documentId]/page.tsx
surfsense_web/components/BlockNoteEditor.tsx
surfsense_web/components/DynamicBlockNoteEditor.tsx
surfsense_web/components/dashboard-breadcrumb.tsx
surfsense_web/next.config.ts


@recurseml recurseml bot left a comment


Review by RecurseML

🔍 Review performed on 289b4de..289b4de

✨ No files to analyze


@recurseml recurseml bot left a comment


Review by RecurseML

🔍 Review performed on 289b4de..e419702

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (50)

README.md
README.zh-CN.md
surfsense_backend/alembic/versions/36_remove_fk_constraints_for_global_llm_configs.py
surfsense_backend/alembic/versions/37_add_system_prompts_to_searchspaces.py
surfsense_backend/alembic/versions/38_add_blocknote_fields_to_documents.py
surfsense_backend/alembic/versions/38_add_webcrawler_connector_enum.py
surfsense_backend/alembic/versions/39_add_rbac_tables.py
surfsense_backend/alembic/versions/40_move_llm_preferences_to_searchspace.py
surfsense_backend/alembic/versions/41_backfill_rbac_for_existing_searchspaces.py
surfsense_backend/alembic/versions/42_drop_user_search_space_preferences.py
surfsense_backend/app/agents/researcher/nodes.py
surfsense_backend/app/agents/researcher/qna_agent/configuration.py
surfsense_backend/app/agents/researcher/qna_agent/default_prompts.py
surfsense_backend/app/agents/researcher/qna_agent/nodes.py
surfsense_backend/app/agents/researcher/utils.py
surfsense_backend/app/celery_app.py
surfsense_backend/app/config/__init__.py
surfsense_backend/app/connectors/webcrawler_connector.py
surfsense_backend/app/db.py
surfsense_backend/app/retriver/chunks_hybrid_search.py
surfsense_backend/app/retriver/documents_hybrid_search.py
surfsense_backend/app/routes/__init__.py
surfsense_backend/app/routes/chats_routes.py
surfsense_backend/app/routes/documents_routes.py
surfsense_backend/app/routes/editor_routes.py
surfsense_backend/app/routes/llm_config_routes.py
surfsense_backend/app/routes/logs_routes.py
surfsense_backend/app/routes/podcasts_routes.py
surfsense_backend/app/routes/rbac_routes.py
surfsense_backend/app/routes/search_source_connectors_routes.py
surfsense_backend/app/routes/search_spaces_routes.py
surfsense_backend/app/schemas/__init__.py
surfsense_backend/app/schemas/rbac_schemas.py
surfsense_backend/app/schemas/search_space.py
surfsense_backend/app/services/connector_service.py
surfsense_backend/app/services/llm_service.py
surfsense_backend/app/services/query_service.py
surfsense_backend/app/tasks/celery_tasks/blocknote_migration_tasks.py
surfsense_backend/app/tasks/celery_tasks/connector_tasks.py
surfsense_backend/app/tasks/celery_tasks/document_reindex_tasks.py
surfsense_backend/app/tasks/celery_tasks/document_tasks.py
surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py
surfsense_backend/app/tasks/connector_indexers/__init__.py
surfsense_backend/app/tasks/connector_indexers/webcrawler_indexer.py
surfsense_backend/app/tasks/document_processors/__init__.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/app/tasks/document_processors/url_crawler.py
surfsense_backend/app/utils/check_ownership.py
surfsense_backend/app/utils/periodic_scheduler.py
surfsense_backend/app/utils/rbac.py

⏭️ Files skipped (38)
surfsense_backend/app/utils/validators.py
surfsense_backend/pyproject.toml
surfsense_backend/uv.lock
surfsense_browser_extension/package.json
surfsense_browser_extension/pnpm-lock.yaml
surfsense_web/app/dashboard/[search_space_id]/client-layout.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx
surfsense_web/app/dashboard/[search_space_id]/connectors/add/webcrawler-connector/page.tsx
surfsense_web/app/dashboard/[search_space_id]/documents/webpage/page.tsx
surfsense_web/app/dashboard/[search_space_id]/editor/[documentId]/page.tsx
surfsense_web/app/dashboard/[search_space_id]/layout.tsx
surfsense_web/app/dashboard/[search_space_id]/logs/(manage)/page.tsx
surfsense_web/app/dashboard/[search_space_id]/sources/add/page.tsx
surfsense_web/app/dashboard/[search_space_id]/team/page.tsx
surfsense_web/app/dashboard/page.tsx
surfsense_web/app/invite/[invite_code]/page.tsx
surfsense_web/components/chat/ChatInputGroup.tsx
surfsense_web/components/dashboard-breadcrumb.tsx
surfsense_web/components/editConnector/types.ts
surfsense_web/components/homepage/integrations.tsx
surfsense_web/components/onboard/completion-step.tsx
surfsense_web/components/pricing/pricing-section.tsx
surfsense_web/components/settings/llm-role-manager.tsx
surfsense_web/components/sidebar/app-sidebar.tsx
surfsense_web/components/sidebar/nav-main.tsx
surfsense_web/components/sources/ConnectorsTab.tsx
surfsense_web/components/sources/connector-data.tsx
surfsense_web/content/docs/docker-installation.mdx
surfsense_web/contracts/enums/connector.ts
surfsense_web/contracts/enums/connectorIcons.tsx
surfsense_web/hooks/index.ts
surfsense_web/hooks/use-connector-edit-page.ts
surfsense_web/hooks/use-rbac.ts
surfsense_web/hooks/use-search-spaces.ts
surfsense_web/lib/connectors/utils.ts
surfsense_web/messages/en.json
surfsense_web/messages/zh.json

…s for blocknote in `file_processors.py` file
@AnishSarkar22
Contributor Author

@MODSetter I think the BlockNote editor is complete. Please merge it if there are no issues.


@recurseml recurseml bot left a comment


Review by RecurseML

🔍 Review performed on e419702..f92112a

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (6)

surfsense_backend/alembic/versions/43_add_blocknote_fields_to_documents.py
surfsense_backend/app/routes/__init__.py
surfsense_backend/app/routes/editor_routes.py
surfsense_backend/app/tasks/celery_tasks/blocknote_migration_tasks.py
surfsense_backend/app/tasks/celery_tasks/document_reindex_tasks.py
surfsense_backend/app/tasks/document_processors/file_processors.py

@MODSetter
Owner

@AnishSarkar22 Awesome work man. I love it. Few issues but I can handle them from here.

@MODSetter MODSetter merged commit eefecfa into MODSetter:dev Nov 30, 2025
4 of 7 checks passed
@AnishSarkar22
Contributor Author

AnishSarkar22 commented Dec 1, 2025

@AnishSarkar22 Awesome work man. I love it. Few issues but I can handle them from here.

Thank you. Let me know if you run into anything, I am always happy to help.
