This repository contains a set of scripts designed to prepare and optimise HTML files exported from Adobe InDesign for single-asset web publication.

Nordln/indesign-html-merge

InDesign HTML5 Merging and Optimisation Workflow

This repository contains a set of scripts designed to prepare and optimise HTML files exported from Adobe InDesign for single-asset web publication, for both present use and future archival.

Use Case

This workflow prepares Adobe InDesign interactive exports for single-file web publication and archival. PDF is a good output format for static design publication and archival where the exact formatting needs to be preserved. Unfortunately, the behaviour of interactive PDF content is becoming harder to guarantee across both readers and time. The (X)HTML5 standard is a good complement to the PDF/A format where cross-device, future-proof interactive use is required. The HTML5 production workflow outlined here significantly reduces file size while maintaining content quality, making it suitable for sharing interactive learning content with the general public. This workflow mimics some of the functionality of Ajar Productions' in5 InDesign plugin.

NOTE: Only in-page interactions are supported at present. This means button-triggered popups and button-triggered audio playback. Navigation between different pages from in-page buttons is not supported.

Example input and output files are provided, together with an animated GIF preview. You can also view the HTML output as an interactive website, i.e. served with the correct MIME type, via this link.

Prerequisites

  • Python 3.6+
  • Required Python packages (install via pip install -r requirements.txt):
    • BeautifulSoup4
    • Pillow (PIL)
  • External tools:
    • Monolith v2.10.1 or higher - A command-line tool for saving web pages as a single HTML file
    • FFmpeg v5.1.6 or higher - A command-line tool for converting media files into different formats and bitrates

Workflow Overview

The optimisation process consists of four sequential steps:

  1. Merge multiple HTML files into a single scrollable document
  2. Embed all resources (CSS, images, etc.) into a single self-contained HTML file
  3. Optimise base64-encoded content (images and audio) to reduce file size
  4. Optionally, convert PNG images to JPEG for further size reduction
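
As a rough sketch, the four steps could be chained from a small Python driver. The commands and flags are the ones used in the Step-by-Step Process section of this README; the helper names `build_pipeline` and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys


def build_pipeline(include_jpeg_step=False, excluded_prefixes=()):
    """Return the workflow commands, in order, as argument lists.

    Hypothetical helper: the commands mirror this README, but the
    function itself is not part of the repository.
    """
    commands = [
        # 1. Merge the publication*.html pages into merged-publication.html
        [sys.executable, "merge_all_publications.py"],
        # 2. Embed CSS, JavaScript and images with Monolith
        ["monolith", "merged-publication.html",
         "-o", "merged-embedded.html", "--no-frames"],
        # 3. Optimise embedded images and audio
        [sys.executable, "optimise_base64_image_audio.py",
         "merged-embedded.html", "-i", "75", "-a", "32", "-w", "-v"],
    ]
    if include_jpeg_step:
        # 4. Optional PNG to JPEG conversion
        cmd = [sys.executable, "png_to_jpeg_optimiser.py",
               "merged-embedded-optimised.html", "-j", "25"]
        if excluded_prefixes:
            cmd += ["-e", *excluded_prefixes]
        commands.append(cmd)
    return commands


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline())` from the publication-web-resources/html/ folder would run steps 1 to 3. Building the commands separately from running them makes it easy to print and inspect them first.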

Step-by-Step Process

This workflow assumes you have created a multipage interactive design in Adobe InDesign and exported it with File -> Export -> HTML5. The scripts are intended to be run from the publication-web-resources/html/ folder of the export, where the publication*.html files are placed. The files in the parent folders must remain in place, since the HTML files reference them heavily.

Note: As of v2.1, the script automatically detects publication.html (InDesign's default first page name) and treats it as page 0, so manual renaming is no longer required.

1. Merge Publications

The script supports two modes of operation:

Original Mode (Single Directory)

python merge_all_publications.py

This script:

  • Finds all HTML files in the current directory matching the pattern publication-[number].html
  • Sorts them by page number
  • Merges them into a single scrollable HTML page with navigation between sections
  • Outputs the merged file as merged-publication.html
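
The file discovery and ordering described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `find_publication_pages` and the regular expression are assumptions. It treats publication.html as page 0 and prefers a numbered file when both exist, as this README describes for v2.1.

```python
import re
from pathlib import Path

# publication.html (page 0) or publication-<number>.html
PAGE_PATTERN = re.compile(r"^publication(?:-(\d+))?\.html$")


def find_publication_pages(directory="."):
    """Return publication pages in `directory`, sorted by page number."""
    pages = {}
    for path in Path(directory).iterdir():
        match = PAGE_PATTERN.match(path.name)
        if not match:
            continue
        numbered = match.group(1) is not None
        number = int(match.group(1)) if numbered else 0
        # If both publication.html and publication-0.html exist,
        # the numbered file wins.
        if number not in pages or numbered:
            pages[number] = path
    return [pages[n] for n in sorted(pages)]
```

Sorting on the parsed integer rather than the file name keeps publication-10.html after publication-2.html.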

Collections Mode (Multiple Directories)

To merge publications from multiple collections, create a collections.txt file in the same directory as the script:

# collections.txt example
collection1
collection2
collection3

Each collection should have the structure: collection_name/InDesign_master/publication-web-resources/html/

Then run:

python merge_all_publications.py

The script will automatically detect collections.txt and:

  • Process each collection directory in the order listed
  • Find all publication files in each collection's html directory
  • Merge all pages from all collections into a single HTML file with continuous page numbering
  • Output the merged file as merged-publication.html

Note: Lines starting with # in collections.txt are treated as comments and ignored.
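
For illustration, collections.txt could be parsed along these lines. The function names are hypothetical, and skipping blank lines is an assumption; the README only states that lines starting with # are ignored.

```python
from pathlib import Path

# Sub-path inside each collection, as given in this README
HTML_SUBDIR = Path("InDesign_master") / "publication-web-resources" / "html"


def read_collections(text):
    """Return collection names in listed order, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line)
    return names


def collection_html_dirs(text, root="."):
    """Map each listed collection to its html export directory."""
    return [Path(root) / name / HTML_SUBDIR for name in read_collections(text)]
```

For example, `collection_html_dirs(Path("collections.txt").read_text())` would give the directories to scan, in the order the collections are listed.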

2. Embed Resources with Monolith

monolith merged-publication.html -o merged-embedded.html --no-frames

This command:

  • Uses the Monolith tool to process the merged HTML file
  • Embeds all external resources (CSS, JavaScript, images) directly into the HTML
  • Creates a self-contained HTML file with no external dependencies
  • The --no-frames option removes frames and iframes from the output

3. Optimise Base64 Content (Images and Audio)

python optimise_base64_image_audio.py merged-embedded.html -i 75 -a 32 -w -v

This script:

  • Processes the embedded HTML file to optimise all base64-encoded content
  • Optimises images with 75% quality and converts to WebP when beneficial
  • Optimises audio files with 32kbps bitrate
  • Uses more efficient encoding techniques
  • Provides verbose output with detailed statistics
  • Outputs the optimised file as merged-embedded-optimised.html
  • Typically reduces file size by around 50%, through zip compression of the base64 data blocks and audio downsampling
  • Injects JavaScript into the output HTML that decompresses the data blocks after the document loads
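
Before optimising, it can help to see how much embedded data a file carries. The sketch below lists base64 data URIs by MIME type using only the standard library. It is an assumption about how such content might be located, not the script's own logic, and the simplified pattern ignores data URIs that carry extra parameters such as a charset.

```python
import base64
import re
from collections import defaultdict

# Matches data URIs such as data:image/png;base64,iVBOR...
DATA_URI = re.compile(r"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def summarise_data_uris(html):
    """Return {mime_type: (count, decoded_bytes)} for base64 data URIs."""
    totals = defaultdict(lambda: [0, 0])
    for mime, payload in DATA_URI.findall(html):
        try:
            size = len(base64.b64decode(payload))
        except ValueError:
            # Malformed payloads (e.g. bad padding) are skipped
            continue
        totals[mime][0] += 1
        totals[mime][1] += size
    return {mime: tuple(v) for mime, v in totals.items()}
```

Running it on merged-embedded.html before step 3 and on the optimised output afterwards gives a quick per-type comparison.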

4. [Optional] Convert PNGs to JPEGs

python png_to_jpeg_optimiser.py merged-embedded-optimised.html -j 25 -e iVBORw0KGgoAAAANSUhEUgAABG iVBORw0KGgoAAAANSUhEUgAACO iVBORw0KGgoAAAANSUhEUgAAAC iVBORw0KGgoAAAANSUhEUgAAAY

This script:

  • Further optimises the HTML by converting PNG images to JPEG format
  • Sets JPEG quality to 25% (adjustable via the -j parameter)
  • Excludes specific PNG images from conversion using the base64 prefix (-e parameter)
  • The example excludes PNGs whose base64 data starts with any of the four listed prefixes, such as iVBORw0KGgoAAAANSUhEUgAABG and iVBORw0KGgoAAAANSUhEUgAACO. This is useful for retaining images with transparency.
  • Outputs the final optimised file as merged-embedded-optimised-jpeg_converted.html
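
To find the prefix for a PNG you want to keep, you can base64-encode the start of the file and compare. The helpers below are hypothetical and only illustrate the prefix matching that -e implies. Note that the leading characters are shared by every PNG, because they encode the fixed PNG signature and IHDR chunk header, so a 26-character prefix mainly distinguishes images by their width field.

```python
import base64


def base64_prefix(data, length=26):
    """Return the first `length` base64 characters of `data` (bytes)."""
    return base64.b64encode(data).decode("ascii")[:length]


def is_excluded(payload, excluded_prefixes):
    """True if a base64 payload starts with any excluded prefix."""
    return payload.startswith(tuple(excluded_prefixes))
```

For a file on disk, something like `base64_prefix(open("logo.png", "rb").read(24))` (with logo.png as a placeholder name) yields a value to pass to -e. Two images with the same width may share a prefix, in which case a longer prefix is needed to tell them apart.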

Script Details

merge_all_publications.py

Merges multiple HTML files exported from Adobe InDesign into a single scrollable document with navigation between pages.

Features:

  • Automatic File Detection: Recognises both publication.html and publication-X.html formats (manual renaming is no longer required)
  • Dual Mode Operation: Automatically detects collections.txt and switches between single directory mode and collections mode
  • Collections Support: Process multiple collections in a specified order, merging all pages with continuous numbering
  • Smart Conflict Resolution: If both publication.html and publication-0.html exist, prioritises the numbered version
  • Automatically finds and sorts publication files by page number
  • Creates a clean, navigable interface between pages
  • Preserves original content and styling
  • Adds JavaScript for smooth scrolling between sections

Collections Mode Structure: Each collection directory should follow this structure:

collection_name/
└── InDesign_master/
    └── publication-web-resources/
        └── html/
            ├── publication.html (or publication-0.html)
            ├── publication-1.html
            └── ...

Version History:

  • v2.1 (11/21/24): Auto-detect publication.html as page 0 (no manual renaming required)
  • v2.0 (11/21/24): Added support for collections.txt to merge multiple collections
  • v1.1 (10/20/24): Bar colors, font-size, audio stopall, lang (EN->DE), print bg alert

optimise_base64_image_audio.py

Optimises all base64-encoded content in HTML files, combining image and audio optimisation in a single script.

Features:

  • Optimises JPG/PNG images by reducing quality and converting to WebP when beneficial
  • Handles SVG files with text-based optimisation
  • Optimises audio files by reducing bitrate while maintaining compatibility
  • Uses Base85 encoding for images (more efficient than Base64)
  • Maintains Base64 encoding for audio files to ensure compatibility
  • Adds client-side JavaScript for handling optimised content
  • Provides detailed statistics for both image and audio optimisation
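
The size advantage of Base85 over Base64 is easy to check: Base64 emits 4 characters per 3 bytes, while Base85 emits 5 characters per 4 bytes. The snippet uses Python's `base64.b85encode` for illustration; which Base85 variant the script uses is not stated here, so treat that choice as an assumption.

```python
import base64

data = bytes(range(256)) * 12  # 3072 bytes of sample binary data

b64 = base64.b64encode(data)   # 4 output characters per 3 input bytes
b85 = base64.b85encode(data)   # 5 output characters per 4 input bytes

print(len(data), len(b64), len(b85))  # prints: 3072 4096 3840
```

That is roughly 6% fewer characters than Base64 for the same payload. Browsers do not decode Base85 natively, which is presumably why the script adds client-side JavaScript to handle the optimised content.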

Options:

  • -i/--image-quality: Image quality (1-100, default: 80)
  • -a/--audio-bitrate: Audio bitrate in kbps (default: 128) - requires FFmpeg
  • -w/--webp: Convert images to WebP format when beneficial
  • -d/--max-dimension: Maximum image dimension for resizing
  • -m/--min-size: Minimum size to consider for optimisation
  • -r/--min-ratio: Minimum compression ratio to apply changes
  • -85/--base85: Use Base85 encoding for images
  • -c/--chunks: Process file in chunks (for very large files)
  • -v/--verbose: Print detailed output for debugging

Example Use Case:

python optimise_base64_image_audio.py merged-embedded.html -i 75 -a 32 -w -v

This command will:

  • Process merged-embedded.html and create merged-embedded-optimised.html
  • Optimise images with 75% quality and convert to WebP when beneficial
  • Optimise audio files with 32kbps bitrate
  • Print verbose output with detailed statistics

Requirements:

  1. For image optimisation: Pillow library must be installed
  2. For audio optimisation: FFmpeg must be installed and available in the PATH
  3. The HTML must contain elements with base64-encoded data URIs
  4. The optimised content must be at least 5% smaller than the original (configurable with the -r option)
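
A quick pre-flight check for these requirements might look like the sketch below. Both function names are hypothetical, and expressing the threshold as a fraction (0.05 for 5%) is an assumption about how -r is interpreted.

```python
import importlib.util
import shutil


def missing_requirements():
    """Return a list of requirements that are not available locally."""
    missing = []
    if importlib.util.find_spec("PIL") is None:
        missing.append("Pillow")
    if shutil.which("ffmpeg") is None:
        missing.append("FFmpeg")
    return missing


def meets_min_ratio(original_size, optimised_size, min_ratio=0.05):
    """True if the optimised data is at least `min_ratio` smaller."""
    if original_size <= 0:
        return False
    saving = (original_size - optimised_size) / original_size
    return saving >= min_ratio
```

With the default threshold, a 1000-byte block is only replaced if the optimised version is 950 bytes or smaller.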

png_to_jpeg_optimiser.py

Specifically focuses on converting PNG images to JPEG format for further size reduction.

Features:

  • Identifies PNG images in base64-encoded data URIs
  • Converts PNGs to JPEGs with configurable quality
  • Handles transparency by replacing it with a white background
  • Allows excluding specific PNGs from conversion
  • Adds client-side JavaScript for handling optimised content
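
JPEG has no alpha channel, so transparent pixels have to be composited onto a solid colour before conversion. The per-pixel arithmetic for a white background is sketched below in plain Python. Whether the script computes it exactly this way is an assumption; in practice an imaging library such as Pillow would do this for a whole image at once.

```python
def flatten_on_white(rgba):
    """Composite one (r, g, b, a) pixel, 0-255 each, onto white."""
    r, g, b, a = rgba

    def blend(channel):
        # Integer form of channel*alpha + 255*(1 - alpha), rounded
        return (channel * a + 255 * (255 - a) + 127) // 255

    return blend(r), blend(g), blend(b)
```

A fully transparent pixel becomes pure white and a fully opaque pixel is unchanged, which is why logos or overlays that rely on transparency are better excluded with -e.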

Options:

  • -j/--jpeg-quality: JPEG quality for PNG conversion (1-100, default: 75)
  • -m/--min-size: Minimum size to consider for optimisation
  • -r/--min-ratio: Minimum compression ratio to apply changes
  • -c/--chunks: Process file in chunks (for very large files)
  • -e/--exclude: List of base64 prefixes to exclude from conversion
