docflow automates your personal document pipeline (Instapaper posts, podcasts, Markdown notes, PDFs, images, and tweets) and serves everything locally from `BASE_DIR`.
Podcast snippets are typically captured in Snipd and then exported into this pipeline.
- Single local source of truth: `BASE_DIR` (resolved from `DOCFLOW_BASE_DIR`, typically set in `~/.docflow_env`).
- Single local server: `python utils/docflow_server.py`.
- Static site output under `BASE_DIR/_site`.
- Local state under `BASE_DIR/state`.
- No remote deploy flow in this repository.
- `BASE_DIR` is no longer hardcoded in `config.py`; it now comes from the environment variable `DOCFLOW_BASE_DIR`.
- Canonical place to set it: `~/.docflow_env`.
- If `DOCFLOW_BASE_DIR` is missing, importing `config.py` fails with a clear error.
- For direct commands from this repo, load your environment first: `source ~/.docflow_env`.

Recommended `~/.docflow_env` snippet:

```bash
export DOCFLOW_BASE_DIR="/path/to/BASE_DIR"
export INTRANET_BASE_DIR="$DOCFLOW_BASE_DIR"
export HIGHLIGHTS_DAILY_DIR="/path/to/Obsidian/Subrayados"
export DONE_LINKS_FILE="/path/to/Obsidian/Leidos.md"
```

`BASE_DIR` is expected to contain:

```
Incoming/
Posts/Posts <YEAR>/
Tweets/Tweets <YEAR>/
Podcasts/Podcasts <YEAR>/
Pdfs/Pdfs <YEAR>/
Images/Images <YEAR>/
_site/    (generated)
state/    (generated)
```
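The fail-fast environment check described above could look roughly like this (a minimal sketch; the actual resolution logic and error message in `config.py` may differ):

```python
import os
from pathlib import Path


def resolve_base_dir(env: dict) -> Path:
    """Resolve BASE_DIR from DOCFLOW_BASE_DIR, failing fast when unset.

    Hypothetical sketch of the fail-fast check in config.py.
    """
    raw = env.get("DOCFLOW_BASE_DIR")
    if not raw:
        raise RuntimeError(
            "DOCFLOW_BASE_DIR is not set. Define it in ~/.docflow_env "
            "and run `source ~/.docflow_env` before importing config.py."
        )
    return Path(raw).expanduser()
```

At import time, `config.py` would call something like `resolve_base_dir(os.environ)`, so any command run without sourcing `~/.docflow_env` stops immediately with an actionable message instead of failing later on a missing path.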
- Python 3.10+
- Core dependencies:
```bash
pip install requests beautifulsoup4 markdownify openai pillow pytest markdown
```

Optional for X likes queue:

```bash
pip install "playwright>=1.55"
playwright install chromium
```

- Configure environment variables (as needed):
```bash
export OPENAI_API_KEY=...
export INSTAPAPER_USERNAME=...
export INSTAPAPER_PASSWORD=...
export DOCFLOW_BASE_DIR="/path/to/BASE_DIR"
export TWEET_LIKES_STATE="$HOME/.secrets/docflow/x_state.json"
export TWEET_LIKES_URL=https://x.com/<user>/likes
export TWEET_LIKES_MAX=50
export HIGHLIGHTS_DAILY_DIR="/path/to/Obsidian/Subrayados"
export DONE_LINKS_FILE="/path/to/Obsidian/Leidos.md"
```

Keep `TWEET_LIKES_STATE` outside the repo so cleanup operations do not delete it.
- Run the processing pipeline:

```bash
python process_documents.py all --year 2026
```

- Build local intranet pages:

```bash
python utils/build_browse_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_reading_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_working_index.py --base-dir "$DOCFLOW_BASE_DIR"
python utils/build_done_index.py --base-dir "$DOCFLOW_BASE_DIR"
```

- Run the local server:
```bash
source ~/.docflow_env
python utils/docflow_server.py --base-dir "$DOCFLOW_BASE_DIR" --host localhost --port 8080
```

Optional full rebuild at startup:
```bash
source ~/.docflow_env
python utils/docflow_server.py --base-dir "$DOCFLOW_BASE_DIR" --rebuild-on-start
```

Preferred day-to-day usage is the LaunchAgent-managed intranet service:

```bash
launchctl kickstart -k "gui/$(id -u)/com.domingo.docflow.intranet"
```

Useful service commands:
```bash
launchctl print "gui/$(id -u)/com.domingo.docflow.intranet" | rg 'state =|pid =|last exit code ='
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/
```

If the agent is not loaded yet in a new environment:

```bash
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/com.domingo.docflow.intranet.plist
```

Use this command to download/process all document types:
```bash
bash bin/docflow.sh all
```

Behavior:
- Loads `~/.docflow_env` if present.
- Runs `process_documents.py` with your arguments (`all` for full ingestion).
- Rebuilds the intranet browse/reading/working/done pages (`utils/build_browse_index.py`, `utils/build_reading_index.py`, `utils/build_working_index.py`, and `utils/build_done_index.py`) when processing succeeds.
Optional override:

```bash
INTRANET_BASE_DIR="/path/to/base" bash bin/docflow.sh all
```

Use this dedicated process for daily tweet consolidation:
```bash
bash bin/docflow_tweet_daily.sh
```

Behavior:
- Loads `~/.docflow_env` if present.
- Runs `bin/build_tweet_consolidated.sh --yesterday`.
- Rebuilds the intranet browse/reading/working/done pages when consolidation succeeds.
`utils/docflow_server.py` serves:

- Static files from `BASE_DIR/_site`
- Raw files from `BASE_DIR` routes (`/posts/raw/...`, `/tweets/raw/...`, etc.)
- `browse` list default ordering: by file recency (items in Reading/Working/Done are hidden from browse)
- `browse` pages include a top `Highlights first` toggle to prioritize highlighted items
- `reading` list ordering: by `reading_at` (oldest first)
- `working` list ordering: by `working_at` (newest first)
- `done` list ordering: by `done_at` (newest first)
- `to-done` can be triggered from Browse, Reading, or Working, and preserves stage start metadata in `state/done.json` when available (`reading_started_at`, `working_started_at`)
- JSON API actions:
  - `POST /api/to-reading`
  - `POST /api/to-working`
  - `POST /api/to-done`
  - `POST /api/to-browse`
  - `POST /api/reopen`
  - `POST /api/delete`
  - `POST /api/rebuild`
  - `POST /api/rebuild-file`
  - `GET /api/export-pdf?path=<rel_path>`
  - `GET /api/highlights?path=<rel_path>`
  - `PUT /api/highlights?path=<rel_path>`
If `DONE_LINKS_FILE` is set, each `POST /api/to-done` transition appends a Markdown link entry to that file.
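As an illustration, a state transition could be triggered from a script roughly like this. This is a hedged sketch: the JSON payload shape (in particular the `path` field) is an assumption, so check `utils/docflow_server.py` for the actual request contract.

```python
import json
from urllib import request


def post_action(base_url: str, action: str, rel_path: str) -> int:
    """POST a JSON state-transition request to the local docflow server.

    The body shape {"path": <rel_path>} is an assumption made for
    illustration; the real server may expect a different payload.
    """
    req = request.Request(
        f"{base_url}/api/{action}",
        data=json.dumps({"path": rel_path}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


# Example (requires the server to be running):
# post_action("http://127.0.0.1:8080", "to-done", "Posts/Posts 2026/example.html")
```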
All state is stored under `BASE_DIR/state/`:

- `reading.json`: per-path `reading_at` timestamp.
- `working.json`: per-path `working_at` timestamp.
- `done.json`: per-path `done_at` timestamp and optional transition metadata copied on `to-done`:
  - `reading_started_at` (from `reading_at` when moving from Reading to Done)
  - `working_started_at` (from `working_at` when moving from Working to Done)
These fields allow post-hoc lead-time calculations for completed items (for example `done_at - working_started_at`).
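Such a lead-time calculation could be done offline with a short script like the sketch below. It assumes `done.json` maps relative paths to entries with ISO-8601 timestamps; the actual schema is defined by the server, so verify it before relying on this.

```python
import json
from datetime import datetime
from pathlib import Path


def working_lead_times(done_json: Path) -> dict:
    """Compute done_at - working_started_at per completed item.

    Assumes done.json maps relative paths to entries such as
    {"Posts/foo.html": {"done_at": "...", "working_started_at": "..."}}
    with ISO-8601 timestamps -- the exact schema is an assumption.
    """
    entries = json.loads(done_json.read_text())
    leads = {}
    for rel_path, meta in entries.items():
        started = meta.get("working_started_at")
        done = meta.get("done_at")
        if started and done:
            leads[rel_path] = datetime.fromisoformat(done) - datetime.fromisoformat(started)
    return leads
```

Items that went straight to Done without a Working stage simply lack `working_started_at` and are skipped.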
- Queue from likes feed:

```bash
python process_documents.py tweets
```

- One-time browser state creation:

```bash
python utils/create_x_state.py --state-path "$HOME/.secrets/docflow/x_state.json"
```

- Daily consolidated tweets helper:

```bash
bash bin/build_tweet_consolidated.sh
bash bin/build_tweet_consolidated.sh --day 2026-02-13
bash bin/build_tweet_consolidated.sh --all-days
bash bin/build_tweet_consolidated.sh --all-days --cleanup-existing
```

By default, daily grouping for tweet source files uses a local rollover hour of 03:00 to include just-after-midnight downloads in the previous day. Override with `DOCFLOW_TWEET_DAY_ROLLOVER_HOUR` (0-23) when needed.

`--cleanup-existing` removes only source tweet `.html` files for consolidated days and keeps source `.md`.
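The rollover rule above can be sketched as a small day-mapping function (an illustration of the described behavior, not the repo's actual implementation):

```python
from datetime import date, datetime, timedelta


def tweet_day(ts: datetime, rollover_hour: int = 3) -> date:
    """Map a local timestamp to its grouping day.

    Sketch of the rollover rule: anything before rollover_hour
    (03:00 by default) is grouped with the previous day, so
    just-after-midnight downloads land on the prior day's page.
    """
    if not 0 <= rollover_hour <= 23:
        raise ValueError("rollover hour must be in 0-23")
    if ts.hour < rollover_hour:
        return (ts - timedelta(days=1)).date()
    return ts.date()
```

Setting `DOCFLOW_TWEET_DAY_ROLLOVER_HOUR=0` would correspond to plain calendar-day grouping.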
- Daily highlights report helper:

```bash
python utils/build_daily_highlights_report.py --day 2026-02-13 --output "/tmp/highlights-2026-02-13.md"
```

- Daily highlights report runner (previous day, to the Obsidian `Subrayados` folder):

```bash
bash bin/docflow_highlights_daily.sh
```

Use this helper to clean rich content from the macOS clipboard and convert it to compact Markdown (optimized for Obsidian paste behavior):
```bash
bin/mdclip
```

Behavior:
- Reads HTML from the clipboard when available (`pbpaste -Prefer html`, then macOS pasteboard fallbacks).
- Converts it to Markdown and removes extra blank lines between list items.
- Writes the cleaned Markdown back to the clipboard by default.
Useful flags:

```bash
bin/mdclip --print
bin/mdclip --no-copy
bin/mdclip --from-stdin --no-copy --print < /path/to/input.html
```

Keyboard shortcut bindings (for example `cmd+shift+L`) are configured outside this repo (Shortcuts/automation tool). The versioned command to invoke is `bin/mdclip`.
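The blank-line cleanup between list items could look roughly like this (a hedged sketch; `bin/mdclip` may use different rules):

```python
import re

# Matches a bullet ("-", "*", "+") or numbered ("1.", "2)") list item line.
_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s")


def compact_list_items(md: str) -> str:
    """Drop blank lines sandwiched between two Markdown list items.

    Illustrative sketch of the "compact Markdown" cleanup: a blank line
    between two list-item lines is removed so lists paste tightly in
    Obsidian; blank lines elsewhere are preserved.
    """
    lines = md.splitlines()
    out = []
    for i, line in enumerate(lines):
        between_items = (
            line.strip() == ""
            and 0 < i < len(lines) - 1
            and _ITEM.match(lines[i - 1])
            and _ITEM.match(lines[i + 1])
        )
        if not between_items:
            out.append(line)
    return "\n".join(out)
```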
Run all tests:

```bash
pytest -v
```

Targeted example:

```bash
pytest tests/test_docflow_server.py -q
```

You can expose the local intranet through a private VPN (for example, Tailscale).
