feat: Adding Dockerfile for stats and CI for building container images #24

Merged: malteos merged 17 commits into commoncrawl:master from malteos:feat/docker-ci on Dec 3, 2025
Conversation

@malteos (Contributor) commented Nov 28, 2025

This PR adds a Dockerfile for the stats workflow and a GitHub action that runs the unit tests, builds the images (stats and site), and pushes them to the GitHub registry (fix for #20).

The new plots were generated with the container run command from the README but with the image from the feature branch:

docker pull ghcr.io/malteos/cc-crawl-statistics/stats:feat-docker-ci
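
The image tag above (feat-docker-ci) matches the branch name (feat/docker-ci) with the slash replaced. A minimal sketch of such a mapping, assuming the tag is derived from the branch name; the helpers `branch_to_tag` and `image_ref` are hypothetical and not taken from this PR's workflow:

```python
import re


def branch_to_tag(branch: str) -> str:
    """Hypothetical helper: turn a Git branch name into a valid Docker tag."""
    # Docker tags may only contain [A-Za-z0-9_.-]; replace everything else with "-".
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag must not start with "." or "-" and is limited to 128 characters.
    return tag.lstrip(".-")[:128]


def image_ref(registry: str, owner: str, repo: str, image: str, branch: str) -> str:
    """Hypothetical helper: build a full image reference for a branch build."""
    return f"{registry}/{owner}/{repo}/{image}:{branch_to_tag(branch)}"


print(image_ref("ghcr.io", "malteos", "cc-crawl-statistics", "stats", "feat/docker-ci"))
```

With the values from this PR, the sketch yields the same reference that is used in the pull command above.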

@malteos requested a review from damian0815 on November 28, 2025 10:31
@damian0815 left a comment

I notice that the background color of the images is transparent - is that intended?

@damian0815 left a comment

Otherwise, very nice.

@malteos (Contributor, Author) commented Dec 1, 2025

The transparent background comes from the ggplot minimal theme (probably due to a newer version).

See #23 (comment)

Since the background on the website (https://commoncrawl.github.io/cc-crawl-statistics/plots/tld/by-year-and-continent.html) is white, it doesn't matter.
I have no explanation for it other than a different ggplot2 version having been used to generate the plot.

Or see tidyverse/ggplot2#4919 (comment)

I tried to set the background explicitly, but it doesn't work :(

@malteos (Contributor, Author) commented Dec 1, 2025

There are still minor differences (font size and colors in URL status by year).

@malteos (Contributor, Author) commented Dec 1, 2025

Font size, line widths, and colors of URL status by year are now adjusted. Only plots/tld/groups.png has a different color assignment, but this should not be relevant (probably due to inconsistent ordering of dict keys).
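
To illustrate the suspected cause: if colors are handed out in dict iteration order, two runs that insert the same groups in a different order get different color assignments. The sketch below is an assumption about the mechanism, not the project's plotting code; the group names, the palette, and the helper `assign_colors` are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical palette; the real plots may use different colors.
PALETTE: List[str] = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def assign_colors(groups: Iterable[str], palette: List[str] = PALETTE) -> Dict[str, str]:
    """Hypothetical helper: map each group to a color, independent of input order."""
    # Sorting the keys first makes the mapping deterministic across runs.
    ordered = sorted(set(groups))
    return {group: palette[i % len(palette)] for i, group in enumerate(ordered)}


# Same groups, different insertion order (hypothetical group names).
run_a = {"gTLD": 10, "ccTLD": 7, "other": 1}
run_b = {"other": 1, "gTLD": 10, "ccTLD": 7}

# Naive assignment follows dict iteration order, so the two runs disagree.
naive_a = dict(zip(run_a, PALETTE))
naive_b = dict(zip(run_b, PALETTE))
print(naive_a == naive_b)  # False

# Sorted assignment gives both runs the same colors.
print(assign_colors(run_a) == assign_colors(run_b))  # True
```

Sorting the keys before mapping them to colors is one way to keep the assignment stable between runs, should the difference ever need to be removed.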

@malteos (Contributor, Author) commented Dec 1, 2025

> Only plots/tld/groups.png has a different color assignment but this should not be relevant (probably due to inconsistent ordering of dict keys).

@sebastian-nagel is this critical or can we just merge this?

@sebastian-nagel (Contributor) commented

> is this critical or can we just merge this?

No. It's not critical.

@malteos merged commit 23ecf3e into commoncrawl:master on Dec 3, 2025
2 checks passed
@malteos deleted the feat/docker-ci branch on December 3, 2025 14:48