@ascheman
Member

Summary

This PR adds a first‑class, Docker‑based GitHub Action for HTML Sanity Check (HSC) and a reusable container image published to GHCR. It also introduces CI jobs to build, test, and publish multi‑arch Docker images and a scheduled workflow that cleans up old, timestamped image versions from GHCR.

What’s included

  • New GitHub Action backed by Docker image
    • action.yml runs using docker://ghcr.io/aim42/hsc:v2 with entrypoint '/hsc.sh'.
    • Single input args to pass through to the CLI (HscCommand).
  • Reusable Docker image for independent, container‑based execution of HSC
    • Dockerfile based on eclipse-temurin:21-jre-alpine.
    • Copies shaded CLI JAR and lightweight hsc.sh entrypoint.
    • Multi‑arch builds (linux/amd64, linux/arm64).
  • CI workflow for build, test, and publish
    • .github/workflows/gradle-build.yml now:
      • Builds artifacts and runs existing tests.
      • Pulls the branch‑tagged image, re‑tags as v2, and executes the new Action end‑to‑end as part of CI.
      • Publishes Docker images to GHCR via Gradle’s Docker plugin, with optional extra tags via workflow_dispatch input additional_tags.
  • GHCR cleanup workflow
    • .github/workflows/cleanup-ghcr.yml removes old timestamped or stale SHA‑only image versions (default retention 14 days), while protecting latest and v* tags.

Implementation Details

  • action.yml
    • runs.using: 'docker' with fixed image ghcr.io/aim42/hsc:v2 and entrypoint: '/hsc.sh'.
    • Note: If the action’s image tag changes (e.g., to v3), the test in gradle-build.yml must be updated accordingly (inline comment present).
  • htmlSanityCheck-cli/Dockerfile
    • Labels include org.opencontainers.image.description and version (ARG‑driven) for provenance.
    • Uses hsc.sh to exec java -jar /hsc.jar with passed arguments.
  • htmlSanityCheck-cli/build.gradle
    • Uses com.fussionlabs.gradle.docker-plugin for Buildx.
    • Tagging strategy (dockerTags):
      • Always: timestamp yyyyMMddHHmmss and sanitized branch name.
      • On main: also push v<major> (for the Action) and latest.
      • Supports additional tags via -Ddocker.image.additional.tags.
    • Multi‑arch push wired via dockerBuildMulti; dockerPush depends on it.
  • .github/workflows/gradle-build.yml
    • test-gh-action pulls the branch image, tags it locally as ghcr.io/aim42/hsc:v2, and runs the Action from this repo with sample args to validate end‑to‑end behavior.
    • publish-docker-images uses the Gradle task dockerPush with Buildx multi‑arch and GitHub Packages auth via GITHUB_TOKEN.
  • .github/workflows/cleanup-ghcr.yml
    • Nightly (02:00 UTC) or manual.
    • Deletes versions older than retention days when:
      • Any timestamp tag (yyyyMMddHHmmss) is older than cutoff, or
      • Only SHA‑like tags exist and version created_at/updated_at is older than cutoff, or
      • No tags at all and older than cutoff.
    • Skips versions that contain latest or v\d+ tags.
    • Defaults to dry‑run on manual dispatch; can be overridden.
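The deletion rules listed for cleanup-ghcr.yml can be sketched as a small shell function that decides the fate of one image version. This is a minimal sketch, not the workflow's actual script. The names `matches` and `should_delete`, the argument layout, the anchored `^v[0-9]+$` pattern, and the reading of "SHA-like" as 7 to 40 lowercase hex characters are all assumptions made for illustration. Times are passed as yyyyMMddHHmmss numbers so they can be compared as integers.

```shell
#!/bin/sh
# Sketch of the cleanup decision for ONE image version (illustrative only).

# matches TEXT REGEX: succeeds if TEXT matches the extended regular expression.
matches() {
  printf '%s\n' "$1" | grep -Eq "$2"
}

# should_delete TAGS CREATED CUTOFF
#   TAGS    space-separated tags of the version (may be empty)
#   CREATED creation/update time of the version as yyyyMMddHHmmss
#   CUTOFF  retention cutoff as yyyyMMddHHmmss
# Returns 0 if the version should be deleted, 1 if it should be kept.
should_delete() {
  tags=$1 created=$2 cutoff=$3
  old_timestamp=0
  only_sha_or_untagged=1

  for tag in $tags; do
    if [ "$tag" = latest ] || matches "$tag" '^v[0-9]+$'; then
      return 1                      # protected tag: always keep
    elif matches "$tag" '^[0-9]{14}$'; then
      only_sha_or_untagged=0        # timestamp tag
      if [ "$tag" -lt "$cutoff" ]; then old_timestamp=1; fi
    elif ! matches "$tag" '^[0-9a-f]{7,40}$'; then
      only_sha_or_untagged=0        # some other tag, e.g. a branch name
    fi
  done

  # Rule 1: any timestamp tag older than the cutoff.
  if [ "$old_timestamp" -eq 1 ]; then return 0; fi
  # Rules 2 and 3: only SHA-like tags (or none) and the version itself is old.
  if [ "$only_sha_or_untagged" -eq 1 ] && [ "$created" -lt "$cutoff" ]; then
    return 0
  fi
  return 1
}
```

For example, `should_delete "20250901120000 abc1234" 20250901120000 20251001000000` succeeds (delete), while the same call with `latest` among the tags fails (keep). A dry run would only print the decision instead of calling the delete API.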

Usage

  • In GitHub Actions (recommended)

    - name: HTML Sanity Check
      uses: aim42/htmlSanityCheck@<ref>
      with:
        args: >-
          -r build/gh-action-test-report integration-test/common/build/docs \
          --exclude 'https://www\.baeldung\.com/.*' \
          --fail-on-errors
    • Replace <ref> with a tag, branch, or commit SHA of this repository.
    • The Action will run the Docker image ghcr.io/aim42/hsc:v2 under the hood.
  • As a standalone Docker image

    docker run --rm \
      -v "$PWD:/work" \
      -w /work \
      ghcr.io/aim42/hsc:v2 \
      -r build/hsc-report path/to/site \
      --fail-on-errors
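
Inside the container, everything after the image name is handed to the entrypoint, which passes it on to the CLI JAR. A minimal sketch of such an entrypoint, written as a function so it can be tried without Java installed. The function name `hsc_entrypoint` and the `JAVA` and `HSC_JAR` overrides are hypothetical and not part of this PR; the real hsc.sh may differ.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch; JAVA and HSC_JAR are illustrative overrides.
hsc_entrypoint() {
  # "$@" forwards every argument unchanged, preserving quoting.
  "${JAVA:-java}" -jar "${HSC_JAR:-/hsc.jar}" "$@"
}
```

Calling it with `JAVA=echo` prints the command line that would be executed. A real entrypoint script would typically use `exec` for the final call, so that the Java process receives signals directly and its exit code becomes the container's exit code.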

Why this change

  • Provides a simple, consistent way to run HSC in CI without managing JDKs or local installs.
  • Enables reproducible, containerized execution for local use and other CI systems.
  • Keeps GHCR tidy by removing ephemeral, timestamped images after a retention period, saving storage and reducing clutter.

Backward compatibility

  • No breaking changes to the core CLI.
  • GitHub Action is new; consumers can adopt it incrementally.

Security and Permissions

  • Docker images are published to GHCR using GITHUB_TOKEN with packages: write.
  • Cleanup workflow only runs on schedule for main (or via manual dispatch) and skips protected tags (latest, v*).

Testing

  • CI job test-gh-action validates the Action end‑to‑end using a locally retagged image matching v2.
  • Artifacts and a simple report are uploaded for inspection.

Follow‑ups

  • When releasing a new major, bump the Action image tag (e.g., v3) in action.yml and the test workflow.
  • Optionally document the full CLI options in the project docs and link them from action.yml.

Issues

to prepare Docker build and GH action as it contains HSC and all
dependencies.
The primary goal is to provide a GitHub action.
Additionally, we create and publish a multi-platform Docker image for
usage in other scenarios (Standalone, GitLab CI, ...).
Execute the Docker integration test only if Docker is available.
Will only be pushed on successful build and test by GitHub workflow.
Switching off Docker push leads to
ERROR: failed to build: docker exporter does not currently support exporting manifest lists
The integration test must run with a local image first
Locally we build for the respective platform and test with it.
Then we build a multi-platform image and push that to the Registry.
Allow a push of the Docker image with additional tags to override or
extend given images.
We only need the timestamped Docker images for some time to enable
testing of certain builds.
and enable optional dry-run (default true)
@ascheman ascheman self-assigned this Oct 10, 2025
Copilot AI review requested due to automatic review settings October 10, 2025 20:53
Copilot AI left a comment

Pull Request Overview

This PR introduces a container-based GitHub Action for HTML Sanity Check, enabling users to run HSC in CI/CD workflows through a published Docker image on GHCR. The implementation includes Docker multi-architecture builds, automated publishing, and GHCR cleanup workflows.

  • Adds a new GitHub Action (action.yml) that uses a Docker image ghcr.io/aim42/hsc:v2
  • Implements Docker build infrastructure with multi-arch support (amd64/arm64) and automated publishing to GHCR
  • Introduces automated cleanup workflow to manage old timestamped Docker images in GHCR

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| action.yml | New GitHub Action definition using Docker image with HSC entrypoint |
| htmlSanityCheck-cli/Dockerfile | Multi-stage Docker image based on Eclipse Temurin JRE Alpine |
| htmlSanityCheck-cli/build.gradle | Docker plugin configuration with multi-arch builds and tagging strategy |
| htmlSanityCheck-cli/hsc.sh | Shell entrypoint script for Docker container |
| htmlSanityCheck-cli/src/main/groovy/org/aim42/htmlsanitycheck/cli/HscCommand.groovy | Added --fail-on-errors CLI option |
| .github/workflows/gradle-build.yml | Added Docker image publishing and GitHub Action testing jobs |
| .github/workflows/cleanup-ghcr.yml | New workflow for cleaning up old GHCR image versions |
| Multiple files | Updated regex patterns to properly escape dots in URL exclusions |

@github-actions

github-actions bot commented Oct 10, 2025

Test Results

123 files  ±0  123 suites  ±0   10m 4s ⏱️ - 2m 32s
487 tests ±0  485 ✅ ±0  0 💤 ±0   2 ❌ ±0 
627 runs  ±0  562 ✅ ±0  0 💤 ±0  65 ❌ ±0 

For more details on these failures, see this check.

Results for commit 689d304. ± Comparison against base commit 9efca5e.

This pull request removes 60 and adds 23 tests. Note that renamed tests count towards both.
org.aim42.htmlsanitycheck.check.ImageMapsCheckerSpec ‑ find image map issues [nrOfFindings: 1, imageMapStr: <img src="image1.jpg" usemap="#map1"><map name="map1">
    <area shape="rect" coords="0,0,1,1" href="#id1" >
</map>
<h2 id="foo" >bad header</h2>, msg: ImageMap "map1" refers to missing link "id1"., #4]
org.aim42.htmlsanitycheck.check.ImageMapsCheckerSpec ‑ find image map issues [nrOfFindings: 1, imageMapStr: <img src="image1.jpg" usemap="#map1"><map name="map1">
    <area shape="rect" coords="0,0,1,1" href="#id1" >
</map>
<map name="map1">
    <area shape="rect" coords="0,0,1,1" href="#id1" >
</map>
<h2 id="id1">aim42 header</h2>, msg: 2 imagemaps with identical name "map1" exist., #1]
org.aim42.htmlsanitycheck.check.ImageMapsCheckerSpec ‑ find image map issues [nrOfFindings: 1, imageMapStr: <img src="image1.jpg" usemap="#map1"><map name="map1">
</map>
, msg: ImageMap "map1" has no area tags., #3]
org.aim42.htmlsanitycheck.check.ImageMapsCheckerSpec ‑ find image map issues [nrOfFindings: 1, imageMapStr: <map name="map1">
    <area shape="rect" coords="0,0,1,1" href="#id1" >
</map>
<h2 id="id1">aim42 header</h2>, msg: ImageMap "map1" not referenced by any image., #2]
org.aim42.htmlsanitycheck.html.HtmlPageSpec ‑ can extract alt attributes from imageTag ' <img alt="1" >
                                <img src="" alt="2">
                                <img src="t.doc" alt="r"> '
org.aim42.htmlsanitycheck.html.HtmlPageSpec ‑ detect correct number of external http links in anchors '<a href="http://arc42.org">arc42</a> and some text
                                <a href="http://arc42.de">arc42.de</a> and some more text
                                <a href="https://arc42.org">arc42 over https</a> even more
                                <a href="local-file.jpg">local file</a> again, text
                                <a href="http://aim.org">improve</a>' 
org.aim42.htmlsanitycheck.html.HtmlPageSpec ‑ detect missing alt attributes in imageTag ' <img alt="1" >
                                <img src="" alt="2">
                                <img src="t.doc" alt="r"> '
org.aim42.htmlsanitycheck.html.ImageMapParserSpec ‑ find all areas within map [nrOfAreas: 1, mapName: mymap, htmlBody: <img src="image.gif" usemap="#mymap">
<map name="mymap">
    <area shape="rect" coords="0,0,1,1" href="#test1" >
</map> , #1]
org.aim42.htmlsanitycheck.html.ImageMapParserSpec ‑ find all areas within map [nrOfAreas: 2, mapName: mymap, htmlBody: <img src="image.gif" usemap="#mymap">
<map name="mymap">
    <area shape="rect" coords="0,0,1,1" href="#test1" >
    <area shape="circle" coords="0,1,1" href="#test2">
</map> , #0]
org.aim42.htmlsanitycheck.html.ImageMapParserSpec ‑ find all hrefs within map [nrOfHrefs: 1, mapName: mymap, htmlBody: <img src="image.gif" usemap="#mymap">
<map name="mymap">
    <area shape="rect" coords="0,0,1,1" href="#test1" >
</map> , hrefs: [#test1], #1]
…

♻️ This comment has been updated with latest results.

ascheman and others added 2 commits October 12, 2025 08:57
Building a multi-platform Docker image only works in cooperation with
a remote registry, which implies a push. The image is not directly
available in the local image store. A pull is necessary to make it
locally available.
For the GitHub Action test, it was necessary to use the Git SHA as the
unique identifier for the system under test (it's testing the Docker
image which is used as the GitHub Action). Other identifiers provided no
clear distinction, as there could be other or older images with the same
tag in the remote registry.
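
The pull and local re-tag described above can be sketched as follows. This is a hedged illustration rather than the workflow's actual steps: the function name `retag_for_action_test` and the `DOCKER` override (used here for a dry run) are hypothetical, and whether the SHA tag is the full or an abbreviated commit hash is not stated in this PR, so it is simply passed in as a parameter.

```shell
#!/bin/sh
# Pull the SHA-tagged image pushed by the multi-platform build and tag it
# locally as v2, the tag that action.yml refers to.
# DOCKER is a hypothetical override: set DOCKER=echo to print the commands.
retag_for_action_test() {
  sha_tag=$1
  image=ghcr.io/aim42/hsc
  ${DOCKER:-docker} pull "$image:$sha_tag" || return 1
  ${DOCKER:-docker} tag "$image:$sha_tag" "$image:v2"
}
```

Running `DOCKER=echo retag_for_action_test 675d6d7` prints the two Docker commands without executing them.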
@ascheman ascheman force-pushed the feature/369-gh-action branch from a51b419 to 675d6d7 Compare October 12, 2025 07:22
On GH pull requests, `git branch --show-current` does not return the
branch name. In this case we use only the SHA tag for Docker tagging.
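
The fallback can be sketched in shell. On a pull request the checkout is a detached HEAD, so `git branch --show-current` prints an empty string. The function name `docker_ref_tag` and the sanitizing rule (replace every character that is not valid in a Docker tag with `-`) are assumptions for illustration; the actual logic lives in build.gradle and may differ.

```shell
#!/bin/sh
# docker_ref_tag BRANCH SHA
# Prints the sanitized branch name, or the SHA if no branch name is available.
docker_ref_tag() {
  branch=$1 sha=$2
  if [ -n "$branch" ]; then
    # Docker tags may only contain letters, digits, '_', '.' and '-'.
    printf '%s\n' "$branch" | sed 's/[^A-Za-z0-9_.-]/-/g'
  else
    printf '%s\n' "$sha"
  fi
}
```

A typical call would be `docker_ref_tag "$(git branch --show-current)" "$(git rev-parse HEAD)"`.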
@sonarqubecloud

Development

Successfully merging this pull request may close these issues.

make htmlSC available as GitHub Action
