
michael-kerscher
Collaborator

Make the mdbook build process parallel. This is a simple approach that just clones the entire repository into new directories and runs the build processes in parallel. This is a demonstration for #2767.

The process appears to be mostly CPU bound rather than memory or disk heavy, so the number of cores on the CI runner matters. mdbook build does not seem to use more than one core, and a typical GitHub CI machine has 4 cores, so the expected speedup would be up to 4x.

Currently the publish workflow takes 30 minutes, 25 minutes of which is the build process. Parallelizing would reduce this to 5 + 25/4 = 11.25, so roughly 12 minutes, more than halving the total run time.
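The parallel split described here can be sketched as a GitHub Actions matrix job. This is only a sketch: the job name is hypothetical and the language list is abbreviated; it assumes a per-language build.sh like the one mentioned later in this thread.

```yaml
# Sketch only: one job per language, fanned out via a matrix.
jobs:
  build-translation:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        language: ["en", "de", "ja"]  # abbreviated; the real list has 20+ entries
    steps:
      - uses: actions/checkout@v4
      - name: Build one translation
        run: ./build.sh ${{ matrix.language }}  # assumed per-language build script
      - uses: actions/upload-artifact@v4
        with:
          name: comprehensive-rust-${{ matrix.language }}
          path: book/
```

With 4 runner cores per machine but separate runners per matrix job, each language builds on its own machine, which is where the wall-clock speedup comes from.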

@michael-kerscher
Collaborator Author

Down from 25 minutes to 11 minutes. That's less speedup than expected, but still less than half the previous run time.

@djmitche
Collaborator

djmitche commented Jun 4, 2025

Nice!!

@michael-kerscher
Collaborator Author

This now finished (even with missing Rust caches!) in 21 minutes: https://github.com/google/comprehensive-rust/actions/runs/15612132409. With caching this would go down quite a lot.

@mgeisler, can you review the comprehensive-rust-all artifact to check whether it has the correct structure? That would show that uploading the intermediate steps and downloading the artifacts with merge works as expected.

The workflow still needs to be refactored a little bit, as there is quite some duplication now, but for the moment this only serves as a POC.

@michael-kerscher
Collaborator Author

@djmitche @mgeisler any thoughts on the latest approach?

Collaborator

@djmitche left a comment

I like this approach, but I admit I don't know much about GitHub Actions, so if there are any subtle gotchas I would not be the person to detect them.

Comment on lines 90 to 97

      - name: Update Rust
        run: rustup update

      - name: Setup Rust cache
        uses: ./.github/workflows/setup-rust-cache

      - name: Install Gettext
        run: |
          sudo apt update
          sudo apt install gettext

      - name: Install mdbook
        uses: ./.github/workflows/install-mdbook
Collaborator

Small question: do we need to install Rust, mdbook, and the Gettext tools here? I would guess not since everything is built already, no? We're just putting the pieces together and uploading it all?

Collaborator Author

No... but we do ;)

We actually only need i18n-report, which is installed in the install-mdbook step: cd922c4

Collaborator Author

The code changed a bit over time, but essentially cargo xtask install-tools installs the necessary tools at the moment. For simplicity, and since this is a POC, everything is just installed as is. This still needs to be refactored.

Collaborator Author

I don't know exactly whether i18n-report requires gettext.

I could upload the .cargo/bin/i18n-report binary in the previous job and then download and use it in all jobs. This has the potential to be a bit more obscure in the future, but it is probably already pretty obscure where this binary comes from at this point.

Collaborator Author

While doing this, and since it already makes things a bit more obscure, I could create a tool-building phase that uploads the entire .cargo/bin/ directory, so we can just use those tools in every translation job.
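A tool-building phase like that could look roughly as follows. This is a sketch under assumptions: the job names are hypothetical, and it assumes `cargo xtask install-tools` populates `.cargo/bin/` as described earlier in the thread.

```yaml
# Sketch only: build all Rust tools once, then share them via an artifact.
jobs:
  build-tools:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install all tools once
        run: cargo xtask install-tools
      - uses: actions/upload-artifact@v4
        with:
          name: rust-tools
          path: ~/.cargo/bin/

  create-translation:
    needs: build-tools
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: rust-tools
          path: ~/.cargo/bin/
      # Artifacts do not preserve file permissions, so restore the executable bit.
      - run: chmod +x ~/.cargo/bin/*
```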

Collaborator

uploads all rust tools so we can upload the entire .cargo/bin/ directory and just use this in every translation

Yes, that would work! Alternatively and perhaps better: we don't need to build things like mdbook, we can download a binary from https://github.com/rust-lang/mdBook/releases.

There is cargo-binstall (https://github.com/cargo-bins/cargo-binstall) which I think can help?

The cool way would then be to publish a little GitHub action so that it becomes very convenient to install mdbook. We can publish binaries for mdbook-i18n-helpers as well and install them the same way.
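An installation step along those lines might look like this. A sketch only: the release version and asset name are assumptions to verify against the mdBook releases page, and cargo-binstall would achieve the same with `cargo binstall mdbook`.

```yaml
      # Sketch only: fetch a prebuilt mdbook instead of compiling it.
      - name: Install mdbook from a release binary
        run: |
          mkdir -p "$HOME/.cargo/bin"
          curl -sSL https://github.com/rust-lang/mdBook/releases/download/v0.4.40/mdbook-v0.4.40-x86_64-unknown-linux-gnu.tar.gz \
            | tar -xz -C "$HOME/.cargo/bin"
```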

Collaborator Author

I noticed we need gettext because msgmerge is used in line 69

      - name: Build synced translation report
        run: |
          cp -r po synced-po
          MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "granularity": 0}}' mdbook build -d synced-po
          for file in synced-po/*.po; do msgmerge --update $file synced-po/messages.pot ; done
          i18n-report report book/html/synced-translation-report.html synced-po/*.po

Collaborator

I see, thanks for the research!

Collaborator

Seeing this, I would guess we could get a speed-up if i18n-report was moved to a job running in parallel with all the translations.

The final publish job would then become truly simple: it would gather all the HTML from the translations, plus the HTML from i18n-report, and publish it all.
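That split could be sketched roughly as follows. Job names are hypothetical; the sketch assumes i18n-report is already installed on the runner and that the per-language translation jobs upload their HTML as artifacts.

```yaml
# Sketch only: the report job runs in parallel with the translation jobs,
# and publish merely gathers HTML.
jobs:
  i18n-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build translation report
        run: i18n-report report translation-report.html po/*.po
      - uses: actions/upload-artifact@v4
        with:
          name: i18n-report
          path: translation-report.html

  publish:
    needs: [i18n-report]  # plus every per-language translation job
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4  # gathers all uploaded HTML
        with:
          path: book/
          merge-multiple: true
      - uses: actions/upload-pages-artifact@v4
        with:
          path: book/
```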

Collaborator

Feel free to resolve this and then we can potentially move i18n-report later!

@mgeisler
Collaborator

mgeisler commented Sep 1, 2025

I like this approach, but I admit I don't know much about GitHub Actions, so if there are any subtle gotchas I would not be the person to detect them.

I think this is a great approach, well done, @michael-kerscher! Originally, we only had a simple mdbook build, which is very fast. But now, I guess the PDF generation is what slows things down dramatically? So it makes sense to do this in parallel.

As for the caching, if building mdbook (and other binaries) takes a long time, we could build them just once, cache them, and reuse them in each job going forward. I forget if we're able to run 20+ jobs in parallel, or if they run, say, 4 or 8 at a time? If it's the latter, then building mdbook once could help here.

@michael-kerscher michael-kerscher force-pushed the parallel-build branch 3 times, most recently from f2efc52 to 6499784 Compare September 5, 2025 08:29
@michael-kerscher michael-kerscher force-pushed the parallel-build branch 2 times, most recently from 0ff1f44 to 8499f3b Compare September 19, 2025 11:30
# See also TRANSLATIONS.md.

on:
  pull_request:
Collaborator Author

This needs to be removed before merging.

      - main
  workflow_dispatch:

permissions:
Collaborator Author

This needs to be re-added before merging.

@michael-kerscher michael-kerscher force-pushed the parallel-build branch 2 times, most recently from 9a54440 to 149216f Compare September 19, 2025 11:37
@michael-kerscher
Collaborator Author

With #2915 merged, I was able to remove the extra build step that created and downloaded artifacts again. The change is now reduced to a split into:

  • create-translation: produces a language artifact with the build.sh
  • publish: loads all language artifacts, builds the report, and publishes everything.

The previous sequential build of the languages now uses individual jobs to run in parallel, and it looks way better.

Current state: for maximum efficiency we still (according to the current builds) need binaries for

  • mdbook-svgbob

  • i18n-report

  • mdbook-linkcheck2, which requires some additional thinking: e.g. build a binary in this repository so it is available for cargo binstall.

We could also shave some minutes off the process by caching the installation, which is pretty slow, e.g. with https://github.com/marketplace/actions/install-and-cache-apt-tools

@mgeisler
Collaborator

and we could shave some minutes of the process by caching the installation that is pretty slow e.g. with https://github.com/marketplace/actions/install-and-cache-apt-tools

Ah, fun, I had a similar idea today and put #2916 up for review. Looking over the code, I don't see how this can be quicker than downloading things from the (internal) Apt mirrors GitHub maintains.

My thinking with regards to caching is that it should be used when the thing being cached takes a significant amount of time to create. Unpacking binaries from the GitHub cache or from an Apt mirror ought to be comparable in time.

In #2916, I found that most time was spent on updating the man page database... So telling dpkg to not install documentation helped a ton. My guess is that this could be why the GH Action you link to exists.
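The dpkg tweak referred to here is typically a path-exclude configuration; a sketch (the exact file name and exclusions used in #2916 may differ):

```
# /etc/dpkg/dpkg.cfg.d/01-nodoc (illustrative file name)
path-exclude /usr/share/doc/*
path-exclude /usr/share/man/*
path-exclude /usr/share/info/*
```

With this in place, apt installs skip unpacking documentation files entirely, which also avoids the slow man-page database update.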

Comment on lines +20 to +42
language:
- "en"
- "ar"
- "bn"
- "da"
- "de"
- "el"
- "es"
- "fa"
- "fr"
- "id"
- "it"
- "ja"
- "ko"
- "pl"
- "pt-BR"
- "ro"
- "ru"
- "tr"
- "uk"
- "vi"
- "zh-CN"
- "zh-TW"
Collaborator

This is fine (and easy to sort!), but just in case you hadn't thought of it,

language: ["en", "ar", ...]

would be equivalent.

@michael-kerscher
Collaborator Author

and we could shave some minutes of the process by caching the installation that is pretty slow e.g. with https://github.com/marketplace/actions/install-and-cache-apt-tools

Ah, fun, I had a similar idea today and put #2916 up for review. Looking over the code, I don't see how this can be quicker than downloading things from the (internal) Apt mirrors GitHub maintains.

My thinking with regards to caching is that it should be used when the thing being cached takes a significant amount of time to create. Unpacking binaries from the GitHub cache or from an Apt mirror ought to be comparable in time.

In #2916, I found that most time was spent on updating the man page database... So telling dpkg to not install documentation helped a ton. My guess is that this could be why the GH Action you link to exists.

Exactly, that was my thought: installing the packages did something with the content of the packages, processing the documentation. Your approach is even better, as there is no use for documentation in the CI environment (at least not yet; AI is not embedded deeply enough to require that documentation :D)

Comment on lines +92 to +94
        run: |
          sudo apt update
          sudo apt install gettext
Collaborator

This should now use the new apt-get-install action 🎉


- name: Upload artifact
uses: actions/upload-pages-artifact@v4
# TODO: remove this before merging into main. This is just for verification of the build result while developing this workflow.
Collaborator

Just a comment here so we don't forget.

uses: actions/deploy-pages@v4
name: comprehensive-rust-all
path: book/
# TODO: uncomment the following lines before merging into main. This is removed for development purposes.
Collaborator

Another comment to ensure we don't forget.

Collaborator

@mgeisler left a comment

This looks great to me and I'm happy for it to merge when you feel confident with it too.
