Skip to content

Conversation

@marc-hb
Copy link
Contributor

@marc-hb marc-hb commented Mar 11, 2023

The incremental doc build has become too slow again, unusable for interactive use and "drive-by" doc fixes. Add a new minimal sphinx tag that groups and optionally disables all features unusable for interactive use.

On a 6 years old server with 72 cores and 300G of RAM, this new minimal tag reduces html-fast build times like this:

  • Build from scratch:

    From 5 minutes down to 1 minute

  • Incremental build, one-character .rst change:

    From 35 seconds down to 9 seconds.

@marc-hb
Copy link
Contributor Author

marc-hb commented Mar 11, 2023

This is a recurring regression, see 5284231 (2019) and #36033 (2021).

Maybe there could be a new CI check to prevent this? Something like:

# build from scratch
export SPHINXOPTS='-j auto -t minimal'
make html-fast

# make sure the incremental build stays usable; under 10 seconds
sed -i -e 's/Zephyr/Xephyr/' doc/index.rst
timeout 10 make html-fast

EDIT, submitted in:

doc/conf.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if numfig increases build time, let's just drop its usage and so the option. I think it's used in just a few locations. Regarding the extensions:

  • kconfig: we should introduce tracking of origin Kconfig files, and to mtime comparison on incremental builds. Not immediate as kconfiglib doesn't expose this AFAIK.
  • breathe: it's always been slow and clunky, I doubt it will ever be fixed. IMO, we should have moved to Doxygen alone a long time ago, Sphinx+breathe doesn't scale to big projects like Zephyr. FYI a proposal I made a long time ago [RFC] doc: replace breathe with Sphinx-Doxygen bridge #39702

minimal build should also disable DT bindings docs, so we have a single way of building "fast" docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on getting rid of numfig. In fact, IIRC there are more places where people have manually numbered and referenced their figures than there are occurrences of numfig being used. I would be happy to implement the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my "benchmarking" I found numfig to be the most guilty by far. @kartben if there is consensus (please thumb up or down @kartben's comment) then it would be awesome if you can get rid of it entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breathe: it's always been slow and clunky, I doubt it will ever be fixed.

I once tried to debug an incremental build performance issue in breathe: breathe-doc/breathe#420

I had to give up and I'll never try again. I found the codebase to be one of the most object-obfuscated I ever tried to make sense of - without the corresponding C++ speed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@marc-hb marc-hb Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @kartben you got 3 thumbs up and 0 thumbs down. I would hate you doing and submitting the numfig cleanup and get rejected but... go? Give it a try?

@marc-hb
Copy link
Contributor Author

marc-hb commented Apr 6, 2023

Draft of numfig removal in #56631

@marc-hb marc-hb marked this pull request as draft April 6, 2023 22:13
@marc-hb
Copy link
Contributor Author

marc-hb commented Apr 12, 2023

I upgraded everything and re-measured and the numbers in the description have not changed.

zephyr commit ae23da81f2f5304c7ecc378465b1e264f3ad6f7f
sphinx 6.1.3
breathe 4.35.0

This time I also disabled numfig alone while leaving breathe enabled as an additional datapoint: building from scratch is barely faster but the incremental build time is 14 seconds = somewhere in the middle.

@gmarull
Copy link
Member

gmarull commented Apr 17, 2023

I upgraded everything and re-measured and the numbers in the description have not changed.

zephyr commit ae23da81f2f5304c7ecc378465b1e264f3ad6f7f
sphinx 6.1.3
breathe 4.35.0

This time I also disabled numfig alone while leaving breathe enabled as an additional datapoint: building from scratch is barely faster but the incremental build time is 14 seconds = somewhere in the middle.

Feel free to take this task. Please provide some numbers, a Sphinx issue, etc.

@marc-hb
Copy link
Contributor Author

marc-hb commented May 5, 2023

marc-hb added a commit to marc-hb/zephyr that referenced this pull request May 8, 2023
This reverts commit 4516117.

A git bisect showed that the duration of an incremental build doubled
after this commit enabled `numfig=True`. Measurements shared and
discussed in zephyrproject-rtos#37572, zephyrproject-rtos#55708 and zephyrproject-rtos#56631 confirmed this. Here are yet more
measurements below in two different system configurations building docs
for very recent Zephyr commit b10817b + all `:numref:` removed by
the previous commit. In other words these numbers show the cost of
`numfig=True` _without_ even using `:numref:`.

* Ubuntu 22, 8 cores
  sphinx-build --version 4.3.2

                           numfig=True   numfig=False

- from scratch             7 min 15 s       6 min 40 s

- one-line .rst change           48 s             24s

* Current Arch Linux, 72 cores
  sphinx-build --version 6.2.1

                           numfig=True   numfig=False

- from scratch              5 min 0 s      4 min 50 s

- one-line .rst change           37 s            18 s

Signed-off-by: Marc Herbert <[email protected]>
carlescufi pushed a commit that referenced this pull request May 22, 2023
This reverts commit 4516117.

A git bisect showed that the duration of an incremental build doubled
after this commit enabled `numfig=True`. Measurements shared and
discussed in #37572, #55708 and #56631 confirmed this. Here are yet more
measurements below in two different system configurations building docs
for very recent Zephyr commit b10817b + all `:numref:` removed by
the previous commit. In other words these numbers show the cost of
`numfig=True` _without_ even using `:numref:`.

* Ubuntu 22, 8 cores
  sphinx-build --version 4.3.2

                           numfig=True   numfig=False

- from scratch             7 min 15 s       6 min 40 s

- one-line .rst change           48 s             24s

* Current Arch Linux, 72 cores
  sphinx-build --version 6.2.1

                           numfig=True   numfig=False

- from scratch              5 min 0 s      4 min 50 s

- one-line .rst change           37 s            18 s

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Copy link
Contributor Author

marc-hb commented May 22, 2023

PR updated and rebased.

minimal build should also disable DT bindings docs,...

Would this be as simple as disabling some sphinx extension?

minimal build should also disable DT bindings docs, so we have a single way of building "fast" docs.

Currently, html-fast == DT_TURBO_MODE=1 == "temporarily stub-out the auto-generated Devicetree bindings documentation" in gen_devicetree_rest.py.

This PR is "faster" than html-fast because it... "breaks" doxygen references, printing a lot of warnings.

I don't how we could unify "fast without any warnings" and "faster with a lot of warnings".

Note the many warnings pollute only the build from scratch. The normal one-line .rst change, compile cycle is not polluted.

qipengzha pushed a commit to qipengzha/zephyr that referenced this pull request May 24, 2023
This reverts commit 4516117.

A git bisect showed that the duration of an incremental build doubled
after this commit enabled `numfig=True`. Measurements shared and
discussed in zephyrproject-rtos#37572, zephyrproject-rtos#55708 and zephyrproject-rtos#56631 confirmed this. Here are yet more
measurements below in two different system configurations building docs
for very recent Zephyr commit b10817b + all `:numref:` removed by
the previous commit. In other words these numbers show the cost of
`numfig=True` _without_ even using `:numref:`.

* Ubuntu 22, 8 cores
  sphinx-build --version 4.3.2

                           numfig=True   numfig=False

- from scratch             7 min 15 s       6 min 40 s

- one-line .rst change           48 s             24s

* Current Arch Linux, 72 cores
  sphinx-build --version 6.2.1

                           numfig=True   numfig=False

- from scratch              5 min 0 s      4 min 50 s

- one-line .rst change           37 s            18 s

Signed-off-by: Marc Herbert <[email protected]>
Copy link
Contributor

@mbolivar-nordic mbolivar-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me but I think a new target would be even better.

The incremental doc build has become too slow again, unusable for
interactive use and "drive-by" doc fixes. Add a new `minimal` sphinx tag
that groups and optionally disables all features unusable for
interactive use.

On a 6 years old server with 72 cores and 300G of RAM, this new
`minimal` tag reduces `html-fast` build times like this:

- Build from scratch:

  From 4.5 minutes down to 1 minute

- Incremental build, one-character .rst change:

  From 16 seconds down to 9 seconds.

Signed-off-by: Marc Herbert <[email protected]>
(cherry picked from commit 2f675be6383b4925f4cf74ca86a3c0d8663482f8)
@marc-hb marc-hb marked this pull request as ready for review May 25, 2023 05:52
Comment on lines +22 to +24
# doc/_build/ first. WARNING: doxygen integration (breathe) will be
# missing, so building from scratch will print a LOT of warnings!
# However this is very useful for small, drive-by .rst fixes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a build with lots of warnings is a good idea. All fast modes we've had until now have produced clean builds, so I think we should do the same here.

Copy link
Contributor Author

@marc-hb marc-hb May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fast modes we've had

Any other mode besides html-fast? Which is... too slow for interactive use.

This new "minimal" mode does not have to be official or documented. I can remove the new target and go back to just the sphinx tag. We just need something usable that does not require a code change.

What alternative(s) do you recommend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fast modes we've had

Any other mode besides html-fast? Which is... too slow for interactive use.

This new "minimal" mode does not have to be official or documented. I can remove the new target and go back to just the sphinx tag. We just need something usable that does not require a code change.

Why not official or documented?

What alternative(s) do you recommend?

Solve the warnings problem, or, try to contribute upstream to improve the extensions that are causing this issue. I also think that if an extension requires us to do such hackery, we should probably reconsider its usage unless it can be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not official or documented?

Because disabling breathe triggers a lot of breathe warnings (which are harmless when you don't care about breathe and just want to make some .rst changes)

try to contribute upstream to improve the extensions that are causing this issue. I also think that if an extension requires us to do such hackery, we should probably reconsider its usage unless it can be fixed.

This PR provides a massive improvement with just one if conditional. It's well tested and working now. These alternatives represent at least months of work for uncertain results. They're not really "alternatives".

... or solve the warnings problem

I could try to filter them out, would that work for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with filtering the doxygen and kconfig warnings. The documentation should clearly state that the "minimal" option bypasses all doxygen and kconfig reference generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with filtering the doxygen and kconfig warnings.

Thanks, I will try to find some time and do that in July.

@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Aug 28, 2023
@github-actions github-actions bot closed this Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants