dieter-medium/backup-my-post

backup-my-post

backup-my-post is a proof-of-concept (PoC) AI-powered workflow written in Ruby for backing up your own Medium posts in multiple structured formats.

Why?

Originally, I wanted to experiment with AI workflows and see how to implement them elegantly in Ruby. Plus, I realized I didn’t have a backup of my earliest Medium posts in a structured, future-proof way—something that would make migration or analysis much easier down the road.

Medium doesn’t make it simple to export your content with all the important details intact (think: structured JSON, summaries, images, etc.).

Yes, there’s the “Download your information” option, which gives you a ZIP of your posts as HTML files, but the images are missing and the layout doesn’t match your original posts.

With backup-my-post, you can:

  • extract your own articles,
  • generate AI-powered summaries,
  • store everything as PDF and rich JSON (using a detailed schema), and
  • fetch all the images for safekeeping or migration.

It’s also a great excuse to show that bidi2pdf is more than “just” a PDF printer—it’s the backbone for grabbing, cleaning, and enhancing HTML content.


Features

  • 📝 Extract content from your own Medium post URL
  • 📄 Generate PDF with improved formatting (using Chrome and custom CSS)
  • 🤖 Moderate and summarize content with AI (OpenAI API required)
  • 🔄 Convert HTML to rich, structured JSON that follows a detailed schema (see BackupMyPost::BlogPost.to_json_schema.to_json)
  • 🖼️ Download images and link them to your structured backup
  • 🐳 Run anywhere via Docker

Workflow Overview

  1. Provide the post URL (via command line)
  2. Prepare content using bidi2pdf (needs Chrome installed)
     • Enhance CSS for better PDF rendering
     • Extract and store the main HTML
     • Convert the HTML to PDF
     • Moderate the HTML content using AI (just for demonstration purposes, it's your own post, so what could go wrong?)
  3. AI steps
     • Create a summary (with optional feedback/evaluation loop)
     • Convert HTML to JSON, following a detailed schema
  4. Store the summary and JSON export
  5. Download all images referenced in the post
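
The steps above can be read as a linear pipeline in which each step receives a shared context and adds its own result. The sketch below only illustrates that shape: the step names and lambdas are hypothetical stand-ins, not the project's actual classes, and a real run would call bidi2pdf and the OpenAI API instead of returning stubs.

```ruby
# Hypothetical sketch: each step takes a context hash and returns an enriched copy.
# The lambdas are stubs; the real workflow would call bidi2pdf and the OpenAI API.
STEPS = {
  prepare_content: ->(ctx) { ctx.merge(html: "<article>stub for #{ctx[:url]}</article>") },
  render_pdf:      ->(ctx) { ctx.merge(pdf_path: "post.pdf") },
  moderate:        ->(ctx) { ctx.merge(moderation_flagged: false) },
  summarize:       ->(ctx) { ctx.merge(summary: "stub summary") },
  convert_to_json: ->(ctx) { ctx.merge(json: { "title" => "stub" }) },
  download_images: ->(ctx) { ctx.merge(images: []) }
}.freeze

# Runs the steps in order, threading the context through each one.
def run_workflow(url, steps: STEPS)
  steps.reduce({ url: url }) do |ctx, (name, step)|
    result = step.call(ctx)
    raise ArgumentError, "step #{name} must return a Hash" unless result.is_a?(Hash)

    result
  end
end

result = run_workflow("https://medium.com/@you/your-post")
puts result.keys.inspect
```

Keeping every step as a plain callable makes it easy to swap a stub for a real implementation, or to skip the moderation step entirely.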

Workflow Diagram


Quickstart

1. Prerequisites

  • Docker (recommended)
  • OpenAI API Key
  • Google Chrome
  • A Medium post URL that you own or have the right to back up

2. Get your OpenAI API Key

  • Sign up at platform.openai.com
  • For private use, you can name your organization "Personal"
  • Add some money to your OpenAI account (sorry, no free beer!)
  • Set usage limits or a budget on your account/project, just to be safe
  • If you want to experiment with advanced reasoning models, you’ll need to verify your organization
  • Create an API key and save it in ~/.config/openai/.env:
OPENAI_API_KEY=sk-...

3. Run via Docker

export IMAGE_NAME=backup-my-post
docker build -f docker/Dockerfile -t ${IMAGE_NAME} .
docker run \
  --env-file ~/.config/openai/.env \
  -v ./output:/app/posts \
  --rm -ti ${IMAGE_NAME} \
  backup-my-post https://medium.com/code-and-coffee/%EF%B8%8F-the-trials-of-parallel-ci-and-merging-ruby-coverage-reports-77a2dac84cc6

Output will be in your ./output directory.


Project Status

This project is a proof-of-concept and is evolving.
Contributions, feedback, and suggestions are very welcome!

Limitations & Learnings

Of course, the “best practice” for converting HTML to structured JSON would be to build a dedicated parser. But Medium’s underlying HTML can (and does) change over time, making custom scrapers brittle and high-maintenance. That’s where AI is surprisingly useful: it’s flexible enough to adapt to shifting layouts and “understands” content in a way a hand-coded parser usually doesn’t.

However, there’s a trade-off: The AI-generated JSON isn’t always reliable. Sometimes sections or fields are missing, and the occasional AI hallucination can sneak in. You might get a summary that misses the main point, or a section list with odd gaps.
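
One inexpensive guard against missing fields is to check the parsed JSON for required keys before storing it. The sketch below assumes hypothetical field names (title, summary, sections); the project's actual schema comes from BackupMyPost::BlogPost.to_json_schema and may look different.

```ruby
require "json"

# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = %w[title summary sections].freeze

# Returns a list of human-readable problems; an empty list means the basic checks passed.
def validate_post_json(raw_json, required: REQUIRED_FIELDS)
  data = JSON.parse(raw_json)
  return ["top-level value is not an object"] unless data.is_a?(Hash)

  problems = required.reject { |key| data.key?(key) }.map { |key| "missing field: #{key}" }
  if data.key?("sections")
    sections = data["sections"]
    problems << "sections is empty or not an array" unless sections.is_a?(Array) && !sections.empty?
  end
  problems
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end

puts validate_post_json('{"title": "Hello", "sections": []}').inspect
```

A check like this catches structural gaps only. It cannot tell whether a summary misses the main point or whether a section was hallucinated.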

It’s possible to improve things by adding an evaluation or feedback loop (as done with the summary generation), but even then, the results aren’t perfect. For example, even this “simple” summary task fails the evaluation step roughly 1 out of every 10 times.
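
A feedback loop of this kind can be sketched as generate, evaluate, and retry with the evaluator's feedback. This is a minimal illustration and not the project's implementation: `summarize_with_feedback`, `Evaluation`, and both callables are hypothetical, and the stubs below stand in for what would be LLM calls.

```ruby
# Hypothetical sketch of a generate/evaluate/retry loop.
# `generate` and `evaluate` are injected callables; in a real run they would wrap LLM calls.
Evaluation = Struct.new(:passed, :feedback)

def summarize_with_feedback(text, generate:, evaluate:, max_attempts: 3)
  feedback = nil
  max_attempts.times do |attempt|
    summary = generate.call(text, feedback)
    evaluation = evaluate.call(text, summary)
    return { summary: summary, attempts: attempt + 1 } if evaluation.passed

    feedback = evaluation.feedback # passed to the next generation attempt
  end
  raise "summary failed evaluation after #{max_attempts} attempts"
end

# Stub callables so the sketch runs without network access.
generate = ->(text, feedback) { feedback ? text[0, 40] : text[0, 200] }
evaluate = lambda do |_text, summary|
  summary.length <= 50 ? Evaluation.new(true, nil) : Evaluation.new(false, "Too long; shorten it.")
end

result = summarize_with_feedback("word " * 100, generate: generate, evaluate: evaluate)
puts result[:attempts]
```

Capping the number of attempts matters because every retry is another paid API call, and a summary that keeps failing usually needs a different prompt rather than more retries.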

Context window size matters! Large posts may exceed the model’s input limit (context window). This isn’t handled yet, but for longer posts you’d need to split the content into smaller chunks, process and evaluate each chunk, and then reassemble the results.
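
As a rough illustration of the chunking idea, the sketch below splits text on paragraph boundaries under a character budget, processes each chunk, and collects the results. The method names are hypothetical, and a character budget is only a crude proxy for a token limit; a single paragraph longer than the budget is left as one oversized chunk.

```ruby
# Hypothetical sketch: group paragraphs into chunks that stay under max_chars.
# A paragraph longer than max_chars becomes its own (oversized) chunk.
def chunk_paragraphs(text, max_chars:)
  text.split(/\n{2,}/).each_with_object([]) do |paragraph, chunks|
    if chunks.empty? || chunks.last.length + paragraph.length + 2 > max_chars
      chunks << paragraph.dup
    else
      chunks.last << "\n\n" << paragraph
    end
  end
end

# Yields each chunk to the given block and returns the per-chunk results in order.
def process_in_chunks(text, max_chars:)
  chunk_paragraphs(text, max_chars: max_chars).map { |chunk| yield chunk }
end

post = ["First paragraph.", "Second paragraph.", "Third paragraph."].join("\n\n")
summaries = process_in_chunks(post, max_chars: 40) { |chunk| "summary of #{chunk.length} chars" }
puts summaries.inspect
```

Splitting on paragraph boundaries keeps sentences intact, but reassembling per-chunk summaries into one coherent summary would still need a final merge step.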

Cost is also a consideration. AI models (especially the latest and greatest) aren’t free—API calls add up, and some models are much more expensive than others. Be sure to pick your model wisely and keep an eye on usage and pricing, especially if you’re backing up a lot of posts.
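
For a back-of-the-envelope estimate before backing up many posts, something like the following can help. Everything numeric here is an assumption: the four-characters-per-token ratio is only a rough rule of thumb for English text, and the prices are placeholders rather than real rates for any model.

```ruby
# Hypothetical cost estimate. The characters-per-token ratio and the prices
# are placeholder assumptions; check the provider's current pricing.
CHARS_PER_TOKEN = 4.0 # rough rule of thumb for English text, not exact

def estimate_cost_usd(input_chars:, output_tokens:, input_price_per_million:, output_price_per_million:)
  input_tokens = (input_chars / CHARS_PER_TOKEN).ceil
  total = input_tokens * input_price_per_million + output_tokens * output_price_per_million
  { input_tokens: input_tokens, cost_usd: (total / 1_000_000.0).round(4) }
end

estimate = estimate_cost_usd(
  input_chars: 40_000,
  output_tokens: 2_000,
  input_price_per_million: 1.0,  # placeholder price in USD
  output_price_per_million: 4.0  # placeholder price in USD
)
puts estimate.inspect
```

Multiply the result by the number of posts and by the expected number of retries in any feedback loop to get a more honest upper bound.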

Bottom line:

This workflow is a practical experiment, not a guaranteed bulletproof backup. It’s a fun and flexible approach that works sometimes — and showcases where AI shines and where old-school parsing still wins.


License

MIT License © 2024 Dieter S.


This project is not affiliated with Medium or OpenAI. Always respect content copyright when using this tool!
