Skip to content

Latest commit

 

History

History
156 lines (115 loc) · 4.99 KB

File metadata and controls

156 lines (115 loc) · 4.99 KB

Obsidian Inbox Processor - Refactoring Plan

Overview

This document tracks the codebase improvements identified during the code review. The goal is to improve maintainability while keeping the app working exactly as it does now.


Action Items

Phase 1: Immediate (Low Risk)

  • 1. Extract duplicate insert_at_cursor() to core/shared.py

    • Found in: actions.py:92-99 and formatting.py:47-54
  • 2. Extract duplicate _ensure_trailing_newline() to core/shared.py

    • Found in: actions.py:86-89 and formatting.py:57-60
  • 3. Unify Protocol definitions in core/protocols.py

    • EmbeddingProvider duplicated in indexer.py:13-16 and retrieval.py:8-11
    • VectorStore has different definitions in indexer.py:19-34 and retrieval.py:14-19
  • 4. Fix circular import in processor.py

    • Import at bottom of file (processor.py:180) is a workaround
  • 5. Add type stubs for Streamlit (ui/types.py)

    • Currently using st: object which loses IDE support

Phase 2: Short-Term (Medium Risk)

  • 6. Extract constants to core/constants.py

    • DEFAULT_CURSOR_MARKER = "<% tp.file.cursor() %>"
    • DEFAULT_FOOTER_HEADING = "**References:**"
    • DEFAULT_SLIP_BOX_FOLDER = "05 – Slip-Box"
  • 7. Create custom error classes in core/errors.py

    • InboxProcessorError (base)
    • ConfigurationError
    • AnalysisError
    • FileOperationError
    • APIError
  • 8. Add vault tag caching in selection.py

    • Currently re-reads all vault files on each analysis
  • 9. Fix folder matching bug in selection.py:55

    • any(part in exclude for part in parts) does partial string matching
    • Should use exact matching
  • 10. Add missing tests for progress.py and embeddings.py

    • progress.py has only 1 test, no edge cases
    • embeddings.py has only 1 test
  • 11. Add edge case tests for processor.py

    • Empty content
    • Very long content
    • Malformed LLM responses

Phase 3: Medium-Term (Higher Effort)

  • 12. Split analysis_panel.py into analysis + delete modules

    • analysis_panel.py - analysis logic only
    • delete_panel.py - deletion confirmation logic
  • 13. Split selection.py into inbox + vault modules

    • inbox_selection.py - inbox item loading
    • vault_query.py - vault paths, folders, tags
  • 14. Add retry logic to llm_clients.py

    • Handle transient API failures
    • Exponential backoff
  • 15. Add rate limiting for API calls

    • Respect OpenRouter rate limits
    • Queue requests if needed
  • 16. Add integration tests

    • End-to-end workflow tests
    • API failure recovery tests
  • 17. Clean up session state management in UI

    • Document all session state keys
    • Add cleanup on navigation

Code Duplication Details

insert_at_cursor() - appears in two places

actions.py:92-99:

def insert_at_cursor(
    template_text: str, content: str, cursor_marker: str = "<% tp.file.cursor() %>"
) -> str:
    """Insert content at a cursor marker in template text."""
    content_block = _ensure_trailing_newline(content)
    if cursor_marker in template_text:
        return template_text.replace(cursor_marker, content_block)
    return template_text + content_block

formatting.py:47-54:

def insert_at_cursor(
    template_text: str, content: str, cursor_marker: str = "<% tp.file.cursor() %>"
) -> str:
    """Insert content at the cursor marker in a template or append if missing."""
    content_block = _ensure_trailing_newline(content)
    if cursor_marker in template_text:
        return template_text.replace(cursor_marker, content_block)
    return template_text + content_block

Protocol Duplication

EmbeddingProvider - identical in both files:

  • indexer.py:13-16
  • retrieval.py:8-11

VectorStore - different methods:

  • indexer.py:19-34 - has upsert(), query(), persist()
  • retrieval.py:14-19 - has only query()

Bugs Identified

File:Line Issue Severity
processor.py:35-42 Dual placeholder support ({content} and {{CONTENT}}) is confusing Low
retrieval.py:76 score = 1.0 - distance may produce negative scores Medium
selection.py:55 Partial string matching for folder exclusion Medium
formatting.py:127-130 Wiki-link regex doesn't handle aliases Low
indexer.py:147 State saved inside loop (inefficient) Low

Missing Tests

Module Current Tests Missing Scenarios
progress.py 1 Edge cases, empty state, timezone handling
embeddings.py 1 Batch boundaries, empty input, API errors
processor.py 14 Empty content, malformed responses
actions.py 10 Permission errors, concurrent writes
retrieval.py 2 Empty query, negative scores

Notes

  • All 83 existing tests pass
  • Changes should not break existing functionality
  • Each item gets its own commit for easy review/revert