Conversation

@fhahn
Contributor

@fhahn fhahn commented Jan 7, 2026

No description provided.

#include "benchmark/benchmark.h"

static std::mt19937 rng;
uint64_t Sum = 0;
Member

Why is Sum a global?

Contributor Author

This may just be a bit paranoid; it is there to ensure the results of the loops are used.

Member

benchmark::DoNotOptimize(Sum); does this already, no?
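
The point raised here can be sketched without the benchmark library. The example below is a minimal sketch, assuming GCC or Clang inline assembly: `doNotOptimize` is a hypothetical stand-in for `benchmark::DoNotOptimize` (not the library's implementation), and `sumArray` is an illustrative kernel that does not appear in the PR. It shows a local accumulator being kept alive by the barrier, which suggests a global is not needed just to force the result to be used.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for benchmark::DoNotOptimize (GCC/Clang only).
// The empty asm statement claims to read and modify Value in memory, so the
// compiler has to materialize it and cannot drop the computation feeding it.
template <typename T> inline void doNotOptimize(T &Value) {
  asm volatile("" : "+m"(Value) : : "memory");
}

// Illustrative kernel (an assumption, not code from the PR): sums N elements
// into a local accumulator instead of a global one.
std::uint64_t sumArray(const std::uint64_t *Data, std::size_t N) {
  std::uint64_t Sum = 0; // local, not a global
  for (std::size_t I = 0; I < N; ++I)
    Sum += Data[I];
  doNotOptimize(Sum); // the result is now treated as used
  return Sum;
}
```

In a real benchmark the call would be `benchmark::DoNotOptimize(Sum)` inside the `for (auto _ : state)` loop; the stand-in above only mirrors the idea.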

Comment on lines 96 to 98
__builtin_assume_dereferenceable(AlignedA, Iterations * sizeof(Ty));
__builtin_assume_dereferenceable(AlignedB, Iterations * sizeof(Ty));
__builtin_assume_dereferenceable(AlignedC, Iterations * sizeof(Ty));
Member

Are the __builtin_assume_aligned and __builtin_assume_dereferenceable necessary? User code almost never uses them, so this takes away from the representativeness of the test.

Contributor Author

Yes, they are currently necessary for Clang to vectorize this, as the current implementation is limited to cases where we can prove that all accesses in the loop are dereferenceable. This will likely be improved in the future, but for now I updated the code to at least check whether the builtins are available before using them.
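
One way such an availability check could look is sketched below. This is a hedged illustration, not the code from the PR: `firstMismatch`, the fixed trip count `N`, and the 64-byte alignment are all assumptions chosen for the example. It guards both builtins with `__has_builtin`, so compilers that lack `__builtin_assume_dereferenceable` still build the same early-exit loop, just without the extra hints.

```cpp
#include <cstddef>
#include <cstdint>

// Fallback for compilers that do not provide __has_builtin at all.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Assumed fixed trip count for this sketch.
constexpr std::size_t N = 1024;

// Illustrative early-exit loop: returns the index of the first element where
// A and B differ, or N if they match. Callers must pass 64-byte aligned
// buffers holding at least N elements, otherwise the assumptions are invalid.
std::size_t firstMismatch(const std::uint32_t *A, const std::uint32_t *B) {
#if __has_builtin(__builtin_assume_aligned)
  A = static_cast<const std::uint32_t *>(__builtin_assume_aligned(A, 64));
  B = static_cast<const std::uint32_t *>(__builtin_assume_aligned(B, 64));
#endif
#if __has_builtin(__builtin_assume_dereferenceable)
  // Tells the compiler that all N elements can be loaded, even if the loop
  // would have exited earlier.
  __builtin_assume_dereferenceable(A, N * sizeof(std::uint32_t));
  __builtin_assume_dereferenceable(B, N * sizeof(std::uint32_t));
#endif
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] != B[I])
      return I; // early exit
  return N;
}
```

The size passed to `__builtin_assume_dereferenceable` is kept a compile-time constant here because some Clang releases may only accept a constant size; that restriction is an assumption worth checking against the compiler in use.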

Comment on lines 1 to 2
// This program tests performance impact of Epilogue Vectorization
// with varying epilogue lengths, and vector widths.
Member

@Meinersbur Meinersbur Jan 7, 2026

Have you considered Doxygen comments? They might not be applicable to the test-suite, though, since we do not run Doxygen on it.

Contributor Author

Updated to use a Doxygen comment, as it is a bit more descriptive, thanks!

BENCHMARK_TEMPLATE(BenchFunc, uint64_t) BENCHMARK_ARGS;

REGISTER_BENCHMARK_FOR_TYPES(autovec_no_early_exit_single_load)
REGISTER_BENCHMARK_FOR_TYPES(autovec_early_exit_taken_first_single_load)
Contributor

Any reason why there's no autovec_early_exit_taken_mid_single_load or autovec_no_early_exit_taken_two_loads?

Contributor Author

Just an oversight; I registered autovec_no_early_exit_taken_two_loads and added a definition for autovec_early_exit_taken_mid_single_load.

Contributor

@lukel97 lukel97 left a comment

LGTM

3 participants