[kaizen] CI tweaks by sharder996 · Pull Request #4739 · canonical/multipass

sharder996 · 2026-03-17T18:21:45Z

Some tweaks to things that are running on an automated basis:

Added a rule for merging conflicts when converting coverage reports for TICS
Added retry functionality for when we prune cached vcpkg packages. The growing list of cached packages sometimes causes rate limiting errors.
Stopped the distro-scraper from removing existing data when merging its output with an existing file.
Edited the fedora scraper to get architecture specific images from different mirrors (they are not all hosted in the same place). Factored out a bit of common code while I was at it.

jimporter

This looks good overall to me. Regarding the Fedora scraper changes, it's a shame we have to scrap the Apache-generated HTML directory listings. (I couldn't find any alternatives after a bit of poking around.) That feels like it could be brittle, but I don't know enough about the issue to have a better idea.

Just a couple suggestions below about making things a bit more Pythonic.

jimporter · 2026-03-18T01:19:59Z

tools/distro-scraper/scraper/scrapers/fedora.py

+PRIMARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora/linux/releases"
+SECONDARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora-secondary/releases"


If you added trailing slashes to these, then some of the remaining code would (arguably) be simpler:

Suggested change

PRIMARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora/linux/releases"

SECONDARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora-secondary/releases"

PRIMARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora/linux/releases/"

SECONDARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora-secondary/releases/"

jimporter · 2026-03-18T01:20:30Z

tools/distro-scraper/scraper/scrapers/fedora.py

+        url = f"{PRIMARY_RELEASES_URL}/"
+        text = await self._fetch_text(session, url)


... then you could do this:

Suggested change

url = f"{PRIMARY_RELEASES_URL}/"

text = await self._fetch_text(session, url)

text = await self._fetch_text(session, PRIMARY_RELEASES_URL)

This could also use _fetch_dir_listing to avoid needing to use a regex to parse the HTML.

jimporter · 2026-03-18T01:22:54Z

tools/distro-scraper/scraper/scrapers/fedora.py

-        self.logger.info("Sending HEAD request to %s", url)
+        fedora_arch = ARCH_MAP.get(label, label)
+        base = SECONDARY_RELEASES_URL if label in SECONDARY_ARCHES else PRIMARY_RELEASES_URL
+        images_url = f"{base}/{version}/Cloud/{fedora_arch}/images/"


... and finally this (after adding from urllib.parse import urljoin near the top):

Suggested change

images_url = f"{base}/{version}/Cloud/{fedora_arch}/images/"

images_url = urljoin(base, f"{version}/Cloud/{fedora_arch}/images/")

(urljoin is a bit picky and you need the trailing slash in the base value for this to work.)

jimporter · 2026-03-18T01:52:09Z

tools/distro-scraper/scraper/scrapers/fedora.py

-            raise RuntimeError("No images to determine latest version")
+        text = await self._fetch_text(session, url)
+        # Match href values that are plain filenames (no path separators or query strings)
+        return re.findall(r'href="([^"/?][^"/]*)"', text)


Rather than relying on regexes here, maybe use a real HTML parser? This would require adding bs4 as a dependency and then from bs4 import BeautifulSoup.

Suggested change

return re.findall(r'href="([^"/?][^"/]*)"', text)

doc = BeautifulSoup(text)

# Directory entries are links inside the main <pre> block immediately

# following the "Parent Directory".

entries = (doc.find("pre").find(string="Parent Directory").parent

.find_next_siblings("a"))

return [i.text for i in entries]

This probably then would need some extra filtering for directories vs non-directories (by the callers?). I'm not 100% sure this is worth the effort, but it would help prevent potential future issues if something changes on the Fedora end.

sharder996 · 2026-03-18T03:44:45Z

@jimporter Agree with you on all points. I don't view this code and mission critical or production code so my resiliency standards are lower. FWIW the image source I decided to pull from comes from Fedora's release tooling which I would expect to be a bit more stable.

jimporter

@sharder996 LGTM.

I think there's a delicate balance between making this code bulletproof, since it runs automatically versus keeping it simple. Even though it's not a crisis if this code breaks once in a while, it's never fun (IMO) to deal with flaky automation. On the other hand, lots of extra complexity makes maintenance harder.

There might be things we can do to simplify this code (though I know some of my suggestions added to the boilerplate, if nothing else). It's a tough question though, since simplicity plus robustness probably means we'd need to spend more time thinking about the best way to get both. In any case, these changes are good to go, and I'll spend some time thinking about how to get the right balance (including whether my previous reviews of this code nudged us towards unnecessary complexity).

sharder996 added 5 commits March 17, 2026 12:42

[tics] Resolve gcov merge conflicts

12fa88e

[ci] Add retry action to rate limited endpoint

60eb53b

[scraper] Don't remove existing data

ecf4b3d

[scraper] Fetch arches from separate mirrors

f5f7798

[scraper] Extract out common impl into super class

5576fbe

sharder996 requested review from a team and jimporter and removed request for a team March 17, 2026 20:48

jimporter requested changes Mar 18, 2026

View reviewed changes

[scraper] Apply reviewer suggestions

74d9c9a

jimporter approved these changes Mar 19, 2026

View reviewed changes

sharder996 added this pull request to the merge queue Mar 19, 2026

Merged via the queue into main with commit 5850771 Mar 19, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[kaizen] CI tweaks#4739

[kaizen] CI tweaks#4739
sharder996 merged 6 commits intomainfrom
kaizen/ci-tweaks

sharder996 commented Mar 17, 2026

Uh oh!

jimporter left a comment •

edited

Loading

Uh oh!

jimporter Mar 18, 2026

Uh oh!

jimporter Mar 18, 2026

Uh oh!

jimporter Mar 18, 2026

Uh oh!

jimporter Mar 18, 2026 •

edited

Loading

Uh oh!

jimporter Mar 18, 2026

Uh oh!

sharder996 commented Mar 18, 2026

Uh oh!

jimporter left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		PRIMARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora/linux/releases"
		SECONDARY_RELEASES_URL = "https://dl.fedoraproject.org/pub/fedora-secondary/releases"

		url = f"{PRIMARY_RELEASES_URL}/"
		text = await self._fetch_text(session, url)

	url = f"{PRIMARY_RELEASES_URL}/"
	text = await self._fetch_text(session, url)
	text = await self._fetch_text(session, PRIMARY_RELEASES_URL)

	images_url = f"{base}/{version}/Cloud/{fedora_arch}/images/"
	images_url = urljoin(base, f"{version}/Cloud/{fedora_arch}/images/")

-        return re.findall(r'href="([^"/?][^"/]*)"', text)
+        doc = BeautifulSoup(text)
+        # Directory entries are links inside the main <pre> block immediately
+        # following the "Parent Directory".
+        entries = (doc.find("pre").find(string="Parent Directory").parent
+                      .find_next_siblings("a"))
+        return [i.text for i in entries]

Conversation

sharder996 commented Mar 17, 2026

Uh oh!

jimporter left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jimporter Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

jimporter Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

jimporter Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

jimporter Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jimporter Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

sharder996 commented Mar 18, 2026

Uh oh!

jimporter left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jimporter left a comment •

edited

Loading

jimporter Mar 18, 2026 •

edited

Loading