This repository contains a set of scripts that prepare and optimise HTML files exported from Adobe InDesign for single-asset web publication, for both present use and future archival.
The workflow targets Adobe InDesign interactive exports. PDFs are a good output format for static design publication and archival where the exact formatting needs to be preserved. However, the behaviour of interactive PDF content is becoming harder to guarantee across both readers and time. The (X)HTML5 standard is a good complement to the PDF/A format where cross-device, future-proof interactive use is required. The HTML5 production workflow outlined here significantly reduces file size while maintaining content quality, making it suitable for sharing interactive learning content for consumption by the general public. This workflow mimics some of the functionality of Ajar Productions' in5 InDesign plugin.
NOTE: Only in-page interactions are supported at present. This means button-triggered popups and button-triggered audio playback. Navigation between different pages from in-page buttons is not supported.
Example input and output files are provided. A gif is below. You can view the HTML as an interactive website i.e. in the proper MIME format, via this link.
- Python 3.6+
- Required Python packages (install via `pip install -r requirements.txt`):
  - BeautifulSoup4
  - Pillow (PIL)
- External tools: Monolith (embeds resources into a single HTML file) and FFmpeg (required for audio optimisation)
The optimisation process consists of four sequential steps:
- Merge multiple HTML files into a single scrollable document
- Embed all resources (CSS, images, etc.) into a single self-contained HTML file
- Optimise base64-encoded content (images and audio) to reduce file size
- Optionally, convert PNG images to JPEG for further size reduction
This workflow assumes you have created a multipage interactive design in Adobe InDesign and exported it with the File -> Export -> HTML5 option. The scripts are intended to be run from the publication-web-resources/html/ folder of the export, where the publication*.html files are placed. The files in the parent folders must also be present, since the HTML files reference them heavily.
Note: As of v2.1, the script automatically detects publication.html (InDesign's default first page name) and treats it as page 0, so manual renaming is no longer required.
The script supports two modes of operation:

    python merge_all_publications.py

This script:
- Finds all HTML files in the current directory matching the pattern `publication-[number].html`
- Sorts them by page number
- Merges them into a single scrollable HTML page with navigation between sections
- Outputs the merged file as `merged-publication.html`
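
The discovery and ordering of page files can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `sort_publication_files` and the regular expression are assumptions. Treating `publication.html` as page 0, and preferring `publication-0.html` when both exist, follows the behaviour this README describes for v2.1.

```python
import re

# Hypothetical pattern: "publication.html" or "publication-<number>.html".
PAGE_PATTERN = re.compile(r"^publication(?:-(\d+))?\.html$")


def sort_publication_files(filenames):
    """Return publication files ordered by page number.

    Assumption: an unnumbered "publication.html" counts as page 0, and a
    numbered "publication-0.html" takes priority if both are present.
    """
    pages = {}
    for name in filenames:
        match = PAGE_PATTERN.match(name)
        if match is None:
            continue  # not a page file, e.g. merged-publication.html
        numbered = match.group(1) is not None
        page = int(match.group(1)) if numbered else 0
        # Keep the numbered file when two names map to the same page.
        if page not in pages or numbered:
            pages[page] = name
    return [pages[page] for page in sorted(pages)]


if __name__ == "__main__":
    example = ["publication-10.html", "publication.html", "publication-2.html"]
    print(sort_publication_files(example))
```

Sorting on the parsed integer rather than on the filename keeps page 10 after page 2, which a plain alphabetical sort would not.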
To merge publications from multiple collections, create a collections.txt file in the same directory as the script:

    # collections.txt example
    collection1
    collection2
    collection3

Each collection should have the structure: `collection_name/InDesign_master/publication-web-resources/html/`
Then run:

    python merge_all_publications.py

The script will automatically detect `collections.txt` and:
- Process each collection directory in the order listed
- Find all publication files in each collection's html directory
- Merge all pages from all collections into a single HTML file with continuous page numbering
- Output the merged file as `merged-publication.html`
Note: Lines starting with # in collections.txt are treated as comments and ignored.
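
As a rough sketch of how such a file could be read, the snippet below parses the list and maps each collection to its expected html directory. The function names are hypothetical and skipping blank lines is an assumption; only the comment rule and the directory structure come from this README.

```python
from pathlib import Path

# Sub-path inside each collection, as given in the structure described here.
HTML_SUBDIR = Path("InDesign_master") / "publication-web-resources" / "html"


def parse_collections(text):
    """Return collection names in listed order, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        # Lines starting with "#" are comments; blank lines are skipped too
        # (the latter is an assumption, not documented behaviour).
        if not stripped or stripped.startswith("#"):
            continue
        names.append(stripped)
    return names


def collection_html_dirs(text, root="."):
    """Map each listed collection to its expected html directory."""
    return [Path(root) / name / HTML_SUBDIR for name in parse_collections(text)]


if __name__ == "__main__":
    sample = "# collections.txt example\ncollection1\ncollection2\n"
    for directory in collection_html_dirs(sample):
        print(directory.as_posix())
```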

    monolith merged-publication.html -o merged-embedded.html --no-frames

This command:
- Uses the Monolith tool to process the merged HTML file
- Embeds all external resources (CSS, JavaScript, images) directly into the HTML
- Creates a self-contained HTML file with no external dependencies
- Passes the `--no-frames` option, which leaves frames and iframes out of the output

    python optimise_base64_image_audio.py merged-embedded.html -i 75 -a 32 -w -v

This script:
- Processes the embedded HTML file to optimise all base64-encoded content
- Optimises images at 75% quality and converts them to WebP when beneficial
- Re-encodes audio files at a 32 kbps bitrate
- Uses more efficient encoding techniques
- Provides verbose output with detailed statistics
- Outputs the optimised file as `merged-embedded-optimised.html`
- Generally reduces file size by about 50% through zip compression of the base64 data blocks and audio downsampling
- Injects JavaScript into the output HTML that decompresses the data blocks after the document loads
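
Before any optimisation can happen, the base64 data URIs have to be located in the embedded HTML. The sketch below shows one way to list them with their MIME type and decoded size. The regular expression and the function name `list_embedded_resources` are illustrative assumptions; the script's own parsing may differ.

```python
import base64
import re

# Matches data URIs such as data:image/png;base64,iVBORw0...
DATA_URI = re.compile(r"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/]+={0,2})")


def list_embedded_resources(html):
    """Return (mime_type, decoded_size_in_bytes) for each base64 data URI."""
    resources = []
    for match in DATA_URI.finditer(html):
        mime_type, payload = match.groups()
        try:
            raw = base64.b64decode(payload, validate=True)
        except ValueError:
            continue  # skip malformed payloads rather than guessing
        resources.append((mime_type, len(raw)))
    return resources


if __name__ == "__main__":
    payload = base64.b64encode(b"example bytes").decode("ascii")
    page = '<img src="data:image/png;base64,' + payload + '">'
    for mime_type, size in list_embedded_resources(page):
        print(mime_type, size)
```

A listing like this is also a quick way to see which resources dominate the file size before choosing quality and bitrate settings.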

    python png_to_jpeg_optimiser.py merged-embedded-optimised.html -j 25 -e iVBORw0KGgoAAAANSUhEUgAABG iVBORw0KGgoAAAANSUhEUgAACO iVBORw0KGgoAAAANSUhEUgAAAC iVBORw0KGgoAAAANSUhEUgAAAY

This script:
- Further optimises the HTML by converting PNG images to JPEG format
- Sets JPEG quality to 25% (adjustable via the `-j` parameter)
- Excludes specific PNG images from conversion using their base64 prefix (`-e` parameter)
- The example excludes PNGs starting with iVBORw0KGgoAAAANSUhEUgAABG and iVBORw0KGgoAAAANSUhEUgAACO, among others. This is useful for retaining images that rely on transparency.
- Outputs the final optimised file as `merged-embedded-optimised-jpeg_converted.html`
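
The prefix-based exclusion can be pictured as a simple string check. The helper below is a hypothetical sketch of that idea, not the script's code. Every PNG starts with the same signature and IHDR header, so these short prefixes differ only in the characters that encode the start of the image dimensions, and one prefix may match more than one image.

```python
def should_convert(png_base64, excluded_prefixes):
    """Return False when the base64 payload starts with an excluded prefix."""
    return not png_base64.startswith(tuple(excluded_prefixes))


if __name__ == "__main__":
    # Two of the prefixes used in the example command.
    excluded = ["iVBORw0KGgoAAAANSUhEUgAABG", "iVBORw0KGgoAAAANSUhEUgAACO"]
    # Made-up payload, for illustration only.
    payload = "iVBORw0KGgoAAAANSUhEUgAABGexample"
    print(should_convert(payload, excluded))
```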
Merges multiple HTML files exported from Adobe InDesign into a single scrollable document with navigation between pages.
Features:
- Automatic File Detection: Recognises both `publication.html` and `publication-X.html` formats (no more manual renaming required)
- Dual Mode Operation: Automatically detects `collections.txt` and switches between single directory mode and collections mode
- Collections Support: Process multiple collections in a specified order, merging all pages with continuous numbering
- Smart Conflict Resolution: If both `publication.html` and `publication-0.html` exist, prioritises the numbered version
- Automatically finds and sorts publication files by page number
- Creates a clean, navigable interface between pages
- Preserves original content and styling
- Adds JavaScript for smooth scrolling between sections
Collections Mode Structure: Each collection directory should follow this structure:

    collection_name/
    └── InDesign_master/
        └── publication-web-resources/
            └── html/
                ├── publication.html (or publication-0.html)
                ├── publication-1.html
                └── ...

Version History:
- v2.1 (11/21/24): Auto-detect `publication.html` as page 0 (no manual renaming required)
- v2.0 (11/21/24): Added support for collections.txt to merge multiple collections
- v1.1 (10/20/24): Bar colors, font-size, audio stopall, lang (EN->DE), print bg alert
Optimises all base64-encoded content in HTML files, combining image and audio optimisation in a single script.
Features:
- Optimises JPG/PNG images by reducing quality and converting to WebP when beneficial
- Handles SVG files with text-based optimisation
- Optimises audio files by reducing bitrate while maintaining compatibility
- Uses Base85 encoding for images (more efficient than Base64)
- Maintains Base64 encoding for audio files to ensure compatibility
- Adds client-side JavaScript for handling optimised content
- Provides detailed statistics for both image and audio optimisation
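
To see why Base85 is more compact than Base64, compare the encoded length of the same bytes: Base64 uses 4 characters per 3 bytes, while Base85 uses 5 characters per 4 bytes. The sketch below uses `base64.b85encode` from Python's standard library. Which Base85 variant the script actually emits is not stated here, so treat the choice of alphabet as an assumption.

```python
import base64


def encoded_sizes(raw):
    """Return the Base64 and Base85 text lengths for the same bytes."""
    return {
        "base64": len(base64.b64encode(raw)),
        "base85": len(base64.b85encode(raw)),
    }


if __name__ == "__main__":
    sizes = encoded_sizes(bytes(1200))
    saving = 1 - sizes["base85"] / sizes["base64"]
    print(sizes, "saving: {:.1%}".format(saving))
```

The saving is modest, roughly 6% of the encoded text, but it applies to every embedded image.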
Options:
- `-i`/`--image-quality`: Image quality (1-100, default: 80)
- `-a`/`--audio-bitrate`: Audio bitrate in kbps (default: 128); requires FFmpeg
- `-w`/`--webp`: Convert images to WebP format when beneficial
- `-d`/`--max-dimension`: Maximum image dimension for resizing
- `-m`/`--min-size`: Minimum size to consider for optimisation
- `-r`/`--min-ratio`: Minimum compression ratio to apply changes
- `-85`/`--base85`: Use Base85 encoding for images
- `-c`/`--chunks`: Process file in chunks (for very large files)
- `-v`/`--verbose`: Print detailed output for debugging
Example Use Case:

    python optimise_base64_image_audio.py merged-embedded.html -i 75 -a 32 -w -v

This command will:
- Process `merged-embedded.html` and create `merged-embedded-optimised.html`
- Optimise images with 75% quality and convert to WebP when beneficial
- Optimise audio files with 32kbps bitrate
- Print verbose output with detailed statistics
Requirements:
- For image optimisation: Pillow library must be installed
- For audio optimisation: FFmpeg must be installed and available in the PATH
- The HTML must contain elements with base64-encoded data URIs
- The optimised content must be at least 5% smaller than the original (configurable with the `-r` option)
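
The size threshold in the last requirement amounts to a simple comparison. The sketch below assumes the ratio is expressed as a fraction (0.05 for 5%); how the script's `-r` option is actually scaled is not documented here, and the function name is hypothetical.

```python
def is_worth_replacing(original_size, optimised_size, min_ratio=0.05):
    """Return True when the optimised data saves at least min_ratio of the original."""
    if original_size <= 0:
        return False  # nothing to compare against
    saving = (original_size - optimised_size) / original_size
    return saving >= min_ratio


if __name__ == "__main__":
    print(is_worth_replacing(1000, 940))  # saves 6% of the original
    print(is_worth_replacing(1000, 960))  # saves 4% of the original
```

A check like this avoids re-encoding content for a negligible gain, which would cost quality without a meaningful size benefit.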
Specifically focuses on converting PNG images to JPEG format for further size reduction.
Features:
- Identifies PNG images in base64-encoded data URIs
- Converts PNGs to JPEGs with configurable quality
- Handles transparency by replacing with white background
- Allows excluding specific PNGs from conversion
- Adds client-side JavaScript for handling optimised content
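
JPEG has no alpha channel, so transparent pixels have to be blended with a background colour before conversion. The per-pixel arithmetic for a white background can be sketched as below. This is an illustration of the blending formula only, with a hypothetical function name; the script presumably relies on Pillow to do this for whole images.

```python
def flatten_on_white(rgba):
    """Composite one 8-bit RGBA pixel over a white background, returning RGB."""
    r, g, b, a = rgba
    alpha = a / 255
    # Standard "over" blend: foreground weighted by alpha, white by the rest.
    return tuple(round(channel * alpha + 255 * (1 - alpha)) for channel in (r, g, b))


if __name__ == "__main__":
    print(flatten_on_white((0, 0, 0, 0)))      # fully transparent pixel
    print(flatten_on_white((10, 20, 30, 255)))  # fully opaque pixel
```

Fully transparent pixels become white and fully opaque pixels are unchanged, which is why images that rely on transparency over a coloured page are better excluded from conversion.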
Options:
- `-j`/`--jpeg-quality`: JPEG quality for PNG conversion (1-100, default: 75)
- `-m`/`--min-size`: Minimum size to consider for optimisation
- `-r`/`--min-ratio`: Minimum compression ratio to apply changes
- `-c`/`--chunks`: Process file in chunks (for very large files)
- `-e`/`--exclude`: List of base64 prefixes to exclude from conversion
