domingogallardo/docflow

docflow (Local Intranet)

docflow automates your personal document pipeline (Instapaper posts, podcasts, Markdown notes, PDFs, images, and tweets) and serves everything locally from BASE_DIR.

Podcast snippets are typically captured in Snipd and then exported into this pipeline.

Architecture overview

(Architecture diagram: docflow architecture)

What this repo does now

  • Single local source of truth: BASE_DIR (resolved from DOCFLOW_BASE_DIR, typically in ~/.docflow_env).
  • Single local server: python utils/docflow_server.py.
  • Static site output under BASE_DIR/_site.
  • Local state under BASE_DIR/state.
  • No remote deploy flow in this repository.

BASE_DIR Location (Important)

  • BASE_DIR is no longer hardcoded in config.py.
  • BASE_DIR now comes from environment variable DOCFLOW_BASE_DIR.
  • Canonical place to set it: ~/.docflow_env.
  • If DOCFLOW_BASE_DIR is missing, importing config.py fails with a clear error.
  • For direct commands from this repo, load your environment first:
source ~/.docflow_env
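
The environment-based resolution can be pictured with a minimal sketch (the function name resolve_base_dir and the exact error message are illustrative, not the repo's actual config.py code):

```python
import os
from pathlib import Path

def resolve_base_dir() -> Path:
    # Illustrative sketch of how config.py resolves BASE_DIR from the
    # DOCFLOW_BASE_DIR environment variable and fails fast when it is unset.
    value = os.environ.get("DOCFLOW_BASE_DIR")
    if not value:
        raise RuntimeError(
            "DOCFLOW_BASE_DIR is not set; add it to ~/.docflow_env and "
            "run `source ~/.docflow_env` before docflow commands."
        )
    return Path(value).expanduser()
```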

Recommended ~/.docflow_env snippet:

export DOCFLOW_BASE_DIR="/path/to/BASE_DIR"
export INTRANET_BASE_DIR="$DOCFLOW_BASE_DIR"
export HIGHLIGHTS_DAILY_DIR="/path/to/Obsidian/Subrayados"
export DONE_LINKS_FILE="/path/to/Obsidian/Leidos.md"

Main folders

BASE_DIR is expected to contain:

  • Incoming/
  • Posts/Posts <YEAR>/
  • Tweets/Tweets <YEAR>/
  • Podcasts/Podcasts <YEAR>/
  • Pdfs/Pdfs <YEAR>/
  • Images/Images <YEAR>/
  • _site/ (generated)
  • state/ (generated)

Requirements

  • Python 3.10+
  • Core dependencies:
pip install requests beautifulsoup4 markdownify openai pillow pytest markdown

Optional for X likes queue:

pip install "playwright>=1.55"
playwright install chromium

Quick start

  1. Configure environment variables (as needed):
export OPENAI_API_KEY=...
export INSTAPAPER_USERNAME=...
export INSTAPAPER_PASSWORD=...
export DOCFLOW_BASE_DIR="/path/to/BASE_DIR"
export TWEET_LIKES_STATE="$HOME/.secrets/docflow/x_state.json"
export TWEET_LIKES_URL="https://x.com/<user>/likes"
export TWEET_LIKES_MAX=50
export HIGHLIGHTS_DAILY_DIR="/path/to/Obsidian/Subrayados"
export DONE_LINKS_FILE="/path/to/Obsidian/Leidos.md"

Keep TWEET_LIKES_STATE outside the repo so cleanup operations do not delete it.

  2. Run the processing pipeline:
python process_documents.py all --year 2026
  3. Build local intranet pages:
python utils/build_browse_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_reading_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_working_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_done_index.py --base-dir "$DOCFLOW_BASE_DIR"
  4. Run local server:
source ~/.docflow_env
python utils/docflow_server.py --base-dir "$DOCFLOW_BASE_DIR" --host localhost --port 8080

Optional full rebuild at startup:

source ~/.docflow_env
python utils/docflow_server.py --base-dir "$DOCFLOW_BASE_DIR" --rebuild-on-start

Preferred day-to-day usage is the LaunchAgent-managed intranet service:

launchctl kickstart -k "gui/$(id -u)/com.domingo.docflow.intranet"

Useful service commands:

launchctl print "gui/$(id -u)/com.domingo.docflow.intranet" | rg 'state =|pid =|last exit code ='
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/

If the agent is not loaded yet in a new environment:

launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/com.domingo.docflow.intranet.plist

Full document ingestion runner (bin/docflow.sh)

Use this command to download/process all document types:

bash bin/docflow.sh all

Behavior:

  • Loads ~/.docflow_env if present.
  • Runs process_documents.py with your arguments (all for full ingestion).
  • Rebuilds intranet browse/reading/working/done pages (utils/build_browse_index.py, utils/build_reading_index.py, utils/build_working_index.py, and utils/build_done_index.py) when processing succeeds.

Optional override:

INTRANET_BASE_DIR="/path/to/base" bash bin/docflow.sh all

Daily tweet consolidation runner (bin/docflow_tweet_daily.sh)

Use this dedicated process for daily tweet consolidation:

bash bin/docflow_tweet_daily.sh

Behavior:

  • Loads ~/.docflow_env if present.
  • Runs bin/build_tweet_consolidated.sh --yesterday.
  • Rebuilds intranet browse/reading/working/done pages when consolidation succeeds.

Intranet server API

utils/docflow_server.py serves:

  • Static files from BASE_DIR/_site
  • Raw files from BASE_DIR routes (/posts/raw/..., /tweets/raw/..., etc.)
  • browse list default ordering: by file recency (items in Reading/Working/Done are hidden from browse)
  • browse pages include a top Highlights first toggle to prioritize highlighted items
  • reading list ordering: by reading_at (oldest first)
  • working list ordering: by working_at (newest first)
  • done list ordering: by done_at (newest first)
  • to-done can be triggered from Browse, Reading, or Working, and preserves stage start metadata in state/done.json when available (reading_started_at, working_started_at)
  • JSON API actions:
    • POST /api/to-reading
    • POST /api/to-working
    • POST /api/to-done
    • POST /api/to-browse
    • POST /api/reopen
    • POST /api/delete
    • POST /api/rebuild
    • POST /api/rebuild-file
    • GET /api/export-pdf?path=<rel_path>
    • GET /api/highlights?path=<rel_path>
    • PUT /api/highlights?path=<rel_path>

If DONE_LINKS_FILE is set, each POST /api/to-done transition appends a Markdown link entry to that file.
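
A minimal client for the POST actions could be sketched as follows (the `{"path": ...}` payload field and the JSON content type are assumptions; check utils/docflow_server.py for the exact request schema):

```python
import json
from urllib import request

API = "http://localhost:8080"

def api_request(action: str, rel_path: str) -> request.Request:
    # Build a POST for one of the JSON API actions (to-reading, to-working,
    # to-done, ...). The payload field name "path" is an assumption about
    # the server's schema, not confirmed from the source.
    body = json.dumps({"path": rel_path}).encode("utf-8")
    return request.Request(
        f"{API}/api/{action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with: urllib.request.urlopen(api_request("to-done", "Posts/Posts 2026/x.html"))
```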

Local state files

All state is stored under BASE_DIR/state/:

  • reading.json: per-path reading_at timestamp.
  • working.json: per-path working_at timestamp.
  • done.json: per-path done_at timestamp and optional transition metadata copied on to-done:
    • reading_started_at (from reading_at when moving from Reading to Done)
    • working_started_at (from working_at when moving from Working to Done)

These fields allow post-hoc lead-time calculations for completed items (for example done_at - working_started_at).
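
Assuming done.json maps each path to an object with ISO 8601 timestamps, such a lead-time calculation could be sketched as:

```python
import json
from datetime import datetime
from pathlib import Path

def lead_times(done_json: Path) -> dict[str, float]:
    # Hours from working_started_at to done_at, per path. The per-path
    # object layout and ISO 8601 timestamp format are assumptions about
    # state/done.json; inspect a real file to confirm.
    entries = json.loads(done_json.read_text())
    out: dict[str, float] = {}
    for path, meta in entries.items():
        started = meta.get("working_started_at")
        if started:
            delta = datetime.fromisoformat(meta["done_at"]) - datetime.fromisoformat(started)
            out[path] = delta.total_seconds() / 3600
    return out
```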

Tweet pipeline

  • Queue from likes feed:
python process_documents.py tweets
  • One-time browser state creation:
python utils/create_x_state.py --state-path "$HOME/.secrets/docflow/x_state.json"
  • Daily consolidated tweets helper:
bash bin/build_tweet_consolidated.sh
bash bin/build_tweet_consolidated.sh --day 2026-02-13
bash bin/build_tweet_consolidated.sh --all-days
bash bin/build_tweet_consolidated.sh --all-days --cleanup-existing

By default, daily grouping of tweet source files uses a 03:00 local rollover hour, so downloads made just after midnight are grouped with the previous day. Override with DOCFLOW_TWEET_DAY_ROLLOVER_HOUR (0-23) when needed.
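
The rollover rule can be sketched as follows (illustrative only; the actual logic lives in the consolidation helper):

```python
import os
from datetime import date, datetime, timedelta

def tweet_day(ts: datetime) -> date:
    # Timestamps before the rollover hour count toward the previous day,
    # so just-after-midnight downloads group with the prior evening.
    rollover = int(os.environ.get("DOCFLOW_TWEET_DAY_ROLLOVER_HOUR", "3"))
    if ts.hour < rollover:
        return (ts - timedelta(days=1)).date()
    return ts.date()
```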

--cleanup-existing removes only source tweet .html files for consolidated days and keeps source .md.

  • Daily highlights report helper:
python utils/build_daily_highlights_report.py --day 2026-02-13 --output "/tmp/highlights-2026-02-13.md"
  • Daily highlights report runner (previous day to Obsidian Subrayados):
bash bin/docflow_highlights_daily.sh

Clipboard Markdown helper (bin/mdclip)

Use this helper to clean rich content from the macOS clipboard and convert it to compact Markdown (optimized for Obsidian paste behavior):

bin/mdclip

Behavior:

  • Reads HTML from clipboard when available (pbpaste -Prefer html, then macOS pasteboard fallbacks).
  • Converts to Markdown and removes extra blank lines between list items.
  • Writes cleaned Markdown back to clipboard by default.
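
The blank-line cleanup between list items can be approximated with a small stdlib-only sketch (mdclip's actual implementation may differ):

```python
import re

# Matches a Markdown list item line followed by a blank line, when the
# next line is also a list item; the blank line is dropped so lists
# paste compactly into Obsidian.
_LOOSE_ITEM = re.compile(
    r"(^[ \t]*(?:[-*+]|\d+\.)\s.*)\n\n(?=[ \t]*(?:[-*+]|\d+\.)\s)",
    re.MULTILINE,
)

def tighten_lists(md: str) -> str:
    return _LOOSE_ITEM.sub(r"\1\n", md)
```

Blank lines between a paragraph and the first list item are left alone; only gaps between consecutive items are collapsed.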

Useful flags:

bin/mdclip --print
bin/mdclip --no-copy
bin/mdclip --from-stdin --no-copy --print < /path/to/input.html

Keyboard shortcut bindings (for example cmd+shift+L) are configured outside this repo (Shortcuts/automation tool). The versioned command to invoke is bin/mdclip.

Tests

Run all tests:

pytest -v

Targeted example:

pytest tests/test_docflow_server.py -q

Optional remote access

You can expose the local intranet through a private VPN (for example, Tailscale).
