Chronicon

A powerful, multi-format archiver for Discourse forums.

Python 3.11+ · License: Unlicense · Tests · Codeberg Mirror · GitGud Mirror

Live Demo: a static archive of Discourse Meta's Theme category (source)

If you find this useful, consider giving it a star on GitHub; it helps others discover the project.

Documentation | FAQ | Troubleshooting | API Reference


Features

  • Three Export Formats: Static HTML, Plain Markdown, GitHub Markdown
  • Dual Storage Backend: SQLite (default) or PostgreSQL for large-scale deployments
  • Category Filtering: Archive specific categories with watch mode support
  • Flexible Search Backends: FTS (server-rendered) or static (client-side) search
  • Incremental Updates: Only fetch new/modified content
  • Continuous Monitoring: Watch mode with automatic updates and git integration
  • Concurrent Processing: Fast archiving with parallel operations
  • Clean Default Theme: Professional, readable styling for HTML exports
  • Mobile-Friendly: Responsive design for all devices
  • Sweep Mode: Exhaustive topic fetching for complete archives

Installation

# Using uv (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install chronicon

# Or with pip
pip install chronicon

# With PostgreSQL support (optional)
pip install chronicon[postgres]
# or
uv tool install chronicon[postgres]

Quick Start

# Archive a forum to all formats
chronicon archive --urls https://meta.discourse.org

# HTML only, no images
chronicon archive \
  --urls https://forum.example.com \
  --formats html \
  --text-only

# Update existing archive
chronicon update --output-dir ./archives

Usage

Archive Command

Archive one or more Discourse forums:

# Archive entire forum to all formats
chronicon archive --urls https://meta.discourse.org

# Archive specific categories only
chronicon archive \
  --urls https://meta.discourse.org \
  --categories 1,2,7

# Archive to HTML format only, skip images
chronicon archive \
  --urls https://meta.discourse.org \
  --formats html \
  --text-only

# Archive posts since a specific date
chronicon archive \
  --urls https://meta.discourse.org \
  --since 2024-01-01

# Archive with custom output directory
chronicon archive \
  --urls https://meta.discourse.org \
  --output-dir ./my-archives

# Archive multiple sites at once
chronicon archive \
  --urls https://meta.discourse.org,https://discuss.example.com

# Sweep mode - exhaustively fetch every topic ID
chronicon archive \
  --urls https://meta.discourse.org \
  --sweep \
  --start-id 10000

# HTML with offline search (no API server needed)
chronicon archive \
  --urls https://meta.discourse.org \
  --formats html \
  --search-backend static

Category Filtering: Use --categories 1,2,7 to archive specific categories. The filter is stored in the database and will be respected by watch mode and backfill commands.

Sweep Mode: Use --sweep to exhaustively fetch every topic ID from --start-id (defaults to max topic ID) down to --end-id (defaults to 1). This is useful for recovering deleted or unlisted topics that don't appear in normal category listings.

Search Backends: HTML exports support two search modes:

  • --search-backend fts (default): Server-rendered full-text search. Requires API server but provides better performance and relevance.
  • --search-backend static: Client-side JavaScript search that works completely offline.

💡 Tip: Sweep mode is slower but more thorough. See PERFORMANCE.md for optimization strategies.

Update Command

Update an existing archive with new/modified content:

# Update all export formats
chronicon update --output-dir ./archives

# Update only HTML export
chronicon update \
  --output-dir ./archives \
  --formats html

# Update multiple formats
chronicon update \
  --output-dir ./archives \
  --formats html,markdown

Validate Command

Validate archive integrity:

# Validate an archive
chronicon validate --output-dir ./archives

This checks:

  • Database file exists and is readable
  • Export directories are present
  • Data integrity (no orphaned posts/topics)
  • Export file structure is correct

Migrate Command

Migrate from legacy JSON-based archives:

# Migrate from JSON to SQLite
chronicon migrate --from ./legacy_json_archive

# Migrate and export to HTML
chronicon migrate \
  --from ./legacy_json_archive \
  --format html

Watch Command

Continuously monitor and automatically update archives (see WATCH_MODE.md for full documentation):

# Start watching in foreground
chronicon watch --output-dir ./archives

# Run as background daemon
chronicon watch --output-dir ./archives --daemon

# Check daemon status
chronicon watch status --output-dir ./archives

# Stop daemon
chronicon watch stop --output-dir ./archives

Watch Mode Features:

  • Automatic polling for new/modified posts (configurable interval, default 10 minutes)
  • Git integration with auto-commit and optional auto-push
  • HTTP health check endpoints (/health, /metrics) for monitoring
  • Process management with PID files and graceful shutdown
  • Exponential backoff on errors with configurable thresholds

📚 See WATCH_MODE.md for:

  • Complete watch mode documentation
  • Systemd and Docker deployment examples
  • Git integration setup
  • Troubleshooting watch mode issues
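For server deployments, watch mode can also run under systemd. The unit below is an illustrative sketch only: the unit name, user, binary path, and output directory are assumptions, not shipped defaults; WATCH_MODE.md contains the maintained deployment examples.

```ini
# /etc/systemd/system/chronicon-watch.service (illustrative sketch;
# adjust user, ExecStart path, and --output-dir for your installation)
[Unit]
Description=Chronicon watch daemon
After=network-online.target
Wants=network-online.target

[Service]
User=chronicon
ExecStart=/usr/local/bin/chronicon watch --output-dir /var/lib/chronicon/archives
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```

Once the paths match your setup, enable it with systemctl enable --now chronicon-watch.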

Configuration

Create .chronicon.toml in your home directory or project root:

[general]
output_dir = "./archives"
default_formats = ["html", "markdown", "github"]

# Optional: Use PostgreSQL instead of SQLite
# Requires: pip install chronicon[postgres]
# database_url = "postgresql://user:password@localhost/chronicon"

[fetching]
rate_limit_seconds = 0.5
max_workers = 8
retry_max = 5
timeout = 15

[export]
include_users = false
text_only = false

[export.html]
theme_adaptation = "simplified"
enable_search = true
responsive = true

[[sites]]
url = "https://meta.discourse.org"
nickname = "meta"
categories = [1, 2, 7]
rate_limit_seconds = 1.0

PostgreSQL Configuration: To use PostgreSQL instead of SQLite, install with pip install chronicon[postgres] and set database_url in the config. PostgreSQL is recommended for multi-million post archives or when multiple processes need concurrent access.

📚 Learn more: see .chronicon.toml.example for full configuration options.

Environment Variables

Chronicon supports the following environment variables for configuration:

Variable               Description                                              Example
DATABASE_URL           PostgreSQL connection string (enables PostgreSQL mode)   postgresql://user:pass@localhost/chronicon
CHRONICON_OUTPUT_DIR   Override default output directory                        /archives
EXPORT_FORMATS         Comma-separated export formats for watch mode            html,markdown
GIT_TOKEN              Git personal access token for HTTPS push                 ghp_xxxx
GIT_USERNAME           Git username for HTTPS authentication                    myuser
GIT_REMOTE_URL         Git remote URL for push operations                       https://github.com/user/repo.git
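As a sketch of how these variables compose, a wrapper script can export them before invoking the CLI. The fallback default below (./archives) mirrors the documented default, but the exact precedence between environment variables and .chronicon.toml is an assumption to verify against the docs.

```shell
# Illustrative wrapper: export configuration, then fall back to a
# default output directory when CHRONICON_OUTPUT_DIR is unset.
export EXPORT_FORMATS=html,markdown
export CHRONICON_OUTPUT_DIR=/archives
OUTPUT_DIR="${CHRONICON_OUTPUT_DIR:-./archives}"
echo "formats=${EXPORT_FORMATS} output=${OUTPUT_DIR}"
# prints: formats=html,markdown output=/archives
```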

PostgreSQL Support

For large-scale deployments (multi-million posts) or when multiple processes need concurrent database access, use PostgreSQL instead of SQLite:

# Install with PostgreSQL support
pip install chronicon[postgres]

# Archive using PostgreSQL
DATABASE_URL="postgresql://user:pass@localhost/chronicon" \
  chronicon archive --urls https://meta.discourse.org

# Watch mode with PostgreSQL
DATABASE_URL="postgresql://user:pass@localhost/chronicon" \
  chronicon watch --output-dir ./archives

Docker Deployment

Chronicon includes production-ready Docker configurations in examples/docker/:

cd examples/docker

# SQLite-based deployment (simple)
docker compose up -d

# PostgreSQL-based deployment (recommended for production)
cp .env.postgres.example .env
# Edit .env with your POSTGRES_PASSWORD

# Start PostgreSQL first
docker compose -f docker-compose.postgres.yml up -d postgres

# Run initial archive (all categories)
docker compose -f docker-compose.postgres.yml run --rm api \
  archive --urls https://example.com --output-dir /archives

# Or archive specific categories only
docker compose -f docker-compose.postgres.yml run --rm api \
  archive --urls https://meta.discourse.org \
  --categories 7 \
  --output-dir /archives \
  --formats html

# Start full stack (API + Watch + Nginx)
docker compose -f docker-compose.postgres.yml up -d

The PostgreSQL stack includes:

  • PostgreSQL 17: Database backend with full-text search (tsvector)
  • API Server: FastAPI REST API with OpenAPI spec
  • Watch Daemon: Continuous monitoring with git integration
  • Nginx: Reverse proxy serving static archives and API

Access points after deployment:

  • http://localhost/ - Static HTML archive
  • http://localhost/api/docs - API documentation (Swagger UI)
  • http://localhost/openapi.json - OpenAPI spec for MCP clients
  • http://localhost:8080/health - Watch daemon health check

Output Structure

HTML Export

output_html/
├── index.html                 # Homepage with category list
├── c/
│   └── category-slug/
│       └── index.html         # Category page with topics
├── t/
│   └── topic-slug/
│       └── 123/
│           ├── index.html     # Topic page (first page)
│           └── page-2.html    # Paginated pages
├── u/
│   └── username/
│       └── index.html         # User profile page
├── assets/
│   ├── css/                   # Default theme CSS
│   ├── js/                    # Search functionality
│   ├── images/                # Downloaded post images
│   ├── emoji/                 # Shared emoji assets
│   └── site/                  # Favicon, logo, banner
└── search_index.json          # Client-side search index

Markdown Export

output_markdown/
├── topics/
│   └── YYYY-MM-Month/
│       └── YYYY-MM-DD-topic-slug-123.md
└── .metadata.json

GitHub Markdown Export

output_github/
├── README.md                  # TOC with all topics
├── topics/
│   └── YYYY-MM-Month/
│       └── topic-slug-123.md
└── assets/
    └── images/
        └── 123/               # Images organized by topic ID

Troubleshooting

📚 See TROUBLESHOOTING.md for the comprehensive troubleshooting guide

Rate Limiting

If you encounter rate limiting errors (HTTP 429):

# Increase rate limit delay
chronicon archive \
  --urls https://example.com \
  --rate-limit 1.0  # Wait 1 second between requests

Large Forums

For forums with 100k+ posts:

# Use category filtering
chronicon archive \
  --urls https://example.com \
  --categories 1,2,3  # Archive specific categories

# Use date filtering
chronicon archive \
  --urls https://example.com \
  --since 2023-01-01  # Archive recent content only

Network Errors

The archiver includes automatic retry logic with exponential backoff. If you still encounter issues:

  • Check your internet connection
  • Verify the forum URL is correct
  • Try increasing the timeout: edit .chronicon.toml and set timeout = 30
  • Some forums may block automated access - check the forum's terms of service

Database Corruption

If you suspect database corruption:

# Validate the archive
chronicon validate --output-dir ./archives

# If validation fails, you may need to re-archive
# Backup your database first!
cp archives/archive.db archives/archive.db.backup

# Then re-run archive command

Frequently Asked Questions

📚 See FAQ.md for the complete FAQ with 50+ questions

Q: How much disk space do I need?
A: It depends on forum size. Budget roughly 2-3x the size of all images. A typical forum with 10k topics and 100k posts might use 1-5 GB with images, or 50-200 MB text-only.

Q: How long does archiving take?
A: It depends on forum size and rate limiting. With default settings (0.5s between requests):

  • Small forum (1k topics): 10-30 minutes
  • Medium forum (10k topics): 2-5 hours
  • Large forum (100k+ topics): 10+ hours

Use --categories to archive incrementally.

Q: Can I archive private/authenticated forums?
A: Currently, the tool only supports public forums. Authentication support is planned for a future release.

Q: What about incremental backups?
A: Use the update command to fetch new content incrementally. It only downloads posts modified since the last archive.

Q: Can I customize the HTML theme?
A: Yes! The HTML export uses a clean default theme that you can modify. Edit the CSS files in your archive's assets/css/ directory to customize colors, fonts, and layout.

Q: How do I search the archived content?
A: HTML exports support two search backends:

  • FTS (default): Server-rendered full-text search using SQLite FTS5 or PostgreSQL tsvector. Requires running the API server.
  • Static: Client-side JavaScript search that works completely offline. Use --search-backend static during export.

For markdown exports, use your editor's search or grep.
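For example, a recursive case-insensitive grep over the topics tree does the job (the directory layout follows the documented markdown export structure; the sample file is created only so the snippet is self-contained):

```shell
# Create a tiny sample so the grep below has something to find;
# against a real archive you would just run the last line.
mkdir -p output_markdown/topics/2024-01-January
echo "Tips for customizing a Discourse theme" \
  > output_markdown/topics/2024-01-January/demo.md
grep -ri "theme" output_markdown/topics   # list matching files and lines
```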

Screenshots

HTML Export

Static HTML archive with clean, professional design and full offline functionality:

  • Homepage with topic listing and navigation
  • Individual topic page with rich formatting and images
  • Client-side search functionality (works offline)
  • Mobile-responsive design

Markdown Exports

Clean, organized markdown files perfect for offline reading and GitHub hosting:

markdown/
├── index.md                    # Main index page
├── latest/                     # Latest topics
│   └── index.md
├── top/                        # Top topics by replies/views
│   ├── replies/
│   └── views/
├── topics/                     # Topics organized by date
│   ├── 2026-01-January/
│   │   ├── 2026-01-21-activation-email-sent-*.md
│   │   └── 2026-01-23-adding-a-static-web-page-*.md
│   └── ...
└── categories/                 # Category indexes

See the live demo archive for a complete sample archive from meta.discourse.org, or browse the source repository.

Development

📚 For a comprehensive development guide, see DEVELOPMENT.md

📚 For API documentation, see API_REFERENCE.md

# Clone repository
git clone https://github.com/19-84/chronicon.git
cd chronicon

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=chronicon --cov-report=html


Running Tests

The project has 520+ passing tests with 80% coverage:

  • CLI command tests (archive, update, validate, migrate, watch)
  • Fetcher integration tests
  • Concurrent processing tests
  • Model serialization and edge case tests
  • Export format tests (HTML, Markdown, GitHub)
  • Watch mode and git integration tests
  • Health monitoring tests
  • Real-world integration tests

Status

Current Version: v1.0.0 (Production/Stable)

All features complete and battle-tested:

  • ✅ Foundation (database, models, API client)
  • ✅ Asset & theme management
  • ✅ Static HTML export with search
  • ✅ Markdown exporters (plain & GitHub)
  • ✅ Incremental updates
  • ✅ CLI interface with rich output
  • ✅ Continuous monitoring (watch mode)
  • ✅ PostgreSQL support
  • ✅ Comprehensive testing (520+ tests, 80% coverage)

License

This software is released into the public domain under The Unlicense. See the LICENSE file for details.

Contributing

See CONTRIBUTING.md for guidelines.

💰 Support the Project

Chronicon was built by one person as a labor of love to preserve internet history before it disappears forever.

This isn't backed by a company or institution; it's just one individual committed to keeping valuable discussions accessible. Your support helps:

  • Continue development and bug fixes
  • Maintain documentation and support
  • Cover infrastructure costs (servers, storage, bandwidth)
  • Preserve more data sources and platforms

Every donation, no matter the size, helps keep this preservation effort alive.

Bitcoin (BTC)

bc1q8wpdldnfqt3n9jh2n9qqmhg9awx20hxtz6qdl7

Bitcoin QR Code
Scan to donate Bitcoin

Monero (XMR)

42zJZJCqxyW8xhhWngXHjhYftaTXhPdXd9iJ2cMp9kiGGhKPmtHV746EknriN4TNqYR2e8hoaDwrMLfv7h1wXzizMzhkeQi

Monero QR Code
Scan to donate Monero

Thank you for supporting internet archival efforts! Every contribution helps maintain and improve this project.


This software is provided "as is" under the Unlicense. See LICENSE for details. Users are responsible for compliance with applicable laws and terms of service when processing data.
