Skip to content

Conversation

@nordic-krch
Copy link
Contributor

Apparently, iterative macros (FOR_EACH_x, GET_ARG_N, GET_ARGS_LESS_N) requires a lot of preprocessing and when widely used (especially recursively) it can significantly increase compilation time. There are 2 limitations:

  • number of arguments limited to 32 (however it can be increased)
  • FOR_EACH_IDX_x macros are called with reversed, descending order of indexes.

If macros are widely used then compile time is significantly shorter (order of magnitude). For example unit test tests/unit/cbprintf in #31102 compiles in 0.8s with fast version, > 7seconds with the standard one.

@github-actions github-actions bot added area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test labels Feb 3, 2021
@jukkar
Copy link
Member

jukkar commented Feb 3, 2021

Could the fast version be the default, and user could use the slow version if really needed?

@nordic-krch nordic-krch force-pushed the util_fast_for_each branch 3 times, most recently from 52a7e13 to 6547389 Compare February 3, 2021 10:22
@nordic-krch
Copy link
Contributor Author

nordic-krch commented Feb 3, 2021

@jukkar yes, i guess that would be ok except for FOR_EACH_IDX.. where indexes comes in reverse order, that is not one to one and in many cases user may rely on the assumption that first element from the list gets 0 index.

I will try to do the switch and check if CI accepts. Not sure if there are lists longer than 32 used somewhere (DT?).

@zephyrbot zephyrbot added area: Base OS Base OS Library (lib/os) area: Testsuite Testsuite labels Feb 3, 2021
@mbolivar-nordic
Copy link
Contributor

Not sure if there are lists longer than 32 used somewhere (DT?).

DT doesn't use any FOR_EACH or GET_ARG macros.

@de-nordic
Copy link
Contributor

Could the fast version be the default, and user could use the slow version if really needed?

I am also in favor in making this default and actually renaming macros, this way code that somehow can not compile with FAST macros could still use the old but all other would be forced to use the FAST. Additionally Kconfig options could be added to roll back to using "old" macros in all places.

@pabigot
Copy link
Contributor

pabigot commented Feb 15, 2021

I am also in favor in making this default and actually renaming macros

Agreed.

Additionally Kconfig options could be added to roll back to using "old" macros in all places.

IMO this should not be necessary; I'd be concerned that the non-default behavior wouldn't be adequately tested over all places the macros are used.

@de-nordic
Copy link
Contributor

IMO this should not be necessary; I'd be concerned that the non-default behavior wouldn't be adequately tested over all places the macros are used.

I am concerned that some user code that is not available to us, for testing, may get broken, for some reason; adding such option, to rollback to original macros, would give us time to gather information on such instances and users to adapt new macros.
The original macros may get later deprecated together with the Kconfig option.
Does that make sense?

@pabigot
Copy link
Contributor

pabigot commented Feb 15, 2021

The original macros may get later deprecated together with the Kconfig option.

IMO the renamed original macros should remain available for the cases where we don't have a known upper bound on the number of iterations, as may be the case in future uses in devictree.

Does that make sense?

It does, but I think the cost is too high. We can't promise perfect backwards compatibility, and can't make every case where that's so support opt-out. I think it's fine to require out of tree users to modify their code to use the renamed generic versions if the new ones don't work for them because they need an unbounded limit.

@mbolivar-nordic
Copy link
Contributor

as may be the case in future uses in devictree.

I wouldn't make any decisions that assume DT might use these later. DT actually deliberately avoids using the util_macro.h FOR_EACH macros.

DT's for-each macros use helpers that are generated at build time in devicetree_unfixed.h, like this:

/* in devicetree_unfixed.h */
#define <node_id>_FOREACH_CHILD(fn) fn(<child 0 node_id1>) fn(<child 1 node_id>) [...] fn(<child n-1 node_id>)

/* in devicetree.h */
#define DT_FOREACH_CHILD(node_id, fn) DT_CAT(node_id, _FOREACH_CHILD)(fn)

Similarly, whenever the DT APIs need to index into something, we avoid using GET_ARG and instead just expand each array element in the generated header. For array type properties, that looks like this:

#define <node_id>_P_<property>_IDX_0 1342177280
#define <node_id>_P_<property>_IDX_1 512
#define <node_id>_P_<property>_IDX_2 1342178560
#define <node_id>_P_<property>_IDX_3 768

Then macros like DT_PROP_BY_IDX(node_id, prop, idx) do their job by concatenation and recursive macro expansion.

TL;DR I think it is safe to say the devicetree does not now and will not ever need the FOR_EACH or GET_ARG macros.

The only macro that devicetree.h really relies on that takes a variable number of arguments is MACRO_MAP_CAT. (Remaining uses of UTIL_CAT could easily be switched to the internal DT_CAT macro instead.) Looking around, devicetree.h is the only user of MACRO_MAP_CAT in the tree, so we can always make that a private helper for DT if you need to change the util.h implementation in an incompatible way.

@pabigot
Copy link
Contributor

pabigot commented Feb 16, 2021

If nobody thinks we need the old ones, we can just replace the implementation and beg forgiveness when it breaks somebody.

@nordic-krch
Copy link
Contributor Author

good news is that i've managed to resolve issue with reverse indexing and increased support to 64 arguments. Currently i don't see any behavioral differences. I will replace the implementation then.

@nordic-krch nordic-krch force-pushed the util_fast_for_each branch 2 times, most recently from 869b406 to 741a83a Compare February 17, 2021 10:39
@nordic-krch nordic-krch force-pushed the util_fast_for_each branch 6 times, most recently from ec516b2 to 91b2e25 Compare February 18, 2021 13:34
Recursive macros are more generic but they are very depending for
preprocessor. When they are used extensively they can prolong
compilation even ten times. Replaced them with brute force
implementation for:
- FOR_EACH macros
- GET_N_ARG
- GET_ARGS_LESS_N
- UTIL_LISTIFY
- UTIL_REPEAT

New implementation provides same functionality but it is limited to 64
input arguments. This is not a hard limitation and can be increased
in the future.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
Added test for new macro.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
Extended test to check that multiple arguments are passed in
UTIL_LISTIFY macro.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
Copy link
Contributor

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One optional comment because I think what UTIL_LISTIFY does given arguments is a little opaque.

UTIL_LISTIFY(4, INC, 1, 2)

zassert_equal(i, 0 + 1 + 2 + 3, NULL);
zassert_equal(i, 0 + 1 + 2 + 3 + 4 * (1 + 2), NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a little more clear expressed as:

zassert_equal(i, (0 + 1 + 2) + (1 + 1 + 2) + (2 + 1 + 2) + (3 + 1 + 2), NULL);

since that represents the effect of the four calls.

@carlescufi carlescufi merged commit 6f5f0b2 into zephyrproject-rtos:master Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: Base OS Base OS Library (lib/os) area: Tests Issues related to a particular existing or missing test area: Testsuite Testsuite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants