Improve hc_matchfinder_longest_match() performance on Apple Silicon #284

@andrews05

I was comparing performance on two MacBooks and was surprised at some of the results.
I used the included benchmark program to compare the following hardware:
- 2015 MacBook Pro: 2.2 GHz Quad-Core Intel Core i7
- 2021 MacBook Pro: 8-core Apple M1 Pro

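| Level | Intel | M1 Pro | Difference |
|-------|-------|--------|------------|
| 1 | 70ms | 43ms | 63% |
| 2 | 107ms | 59ms | 81% |
| 3 | 103ms | 60ms | 72% |
| 4 | 108ms | 61ms | 77% |
| 5 | 112ms | 64ms | 75% |
| 6 | 120ms | 74ms | 62% |
| 7 | 157ms | 110ms | 43% |
| 8 | 316ms | 270ms | 17% |
| 9 | 443ms | 429ms | 3% |
| 10 | 2062ms | 1340ms | 54% |
| 11 | 4481ms | 3142ms | 43% |
| 12 | 9192ms | 7309ms | 26% |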
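For anyone checking the numbers: the Difference column appears to be the Intel time divided by the M1 Pro time, minus one, truncated to a whole percent. A minimal sketch of that arithmetic follows; the function name `percent_faster` is made up for illustration and is not part of the benchmark program.

```c
#include <assert.h>

/*
 * Hypothetical helper: how much faster the quicker machine is, as a
 * whole percentage of its own running time, i.e.
 * floor(100 * (slow_ms / fast_ms - 1)).
 */
static unsigned percent_faster(unsigned slow_ms, unsigned fast_ms)
{
    assert(fast_ms > 0 && slow_ms >= fast_ms);
    return (100u * (slow_ms - fast_ms)) / fast_ms;
}
```

With the level 1 timings, `percent_faster(70, 43)` gives 62 by truncation, while the table shows 63, so the table may round to nearest instead; every other row agrees either way, including level 9, where `percent_faster(443, 429)` gives 3.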

At level 9, the M1 Pro is only 3% faster than an Intel machine six years older!
I tried profiling it with xctrace and as best I can tell the performance hit comes from load_u32_unaligned (I can attach the trace output if that would be helpful). I can confirm that UNALIGNED_ACCESS_IS_FAST is set, but beyond that I haven't been able to work out why there's an issue.
Do you have any ideas, or is it really the case that the Intel hardware is simply better at this?
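For context, here is roughly the kind of code I mean. This is a minimal sketch written for this issue, not the library's actual source: `load_u32_sketch` and `match_len_sketch` are hypothetical names, and the real `load_u32_unaligned` and match finder may be implemented differently. It assumes an unaligned 32-bit load done through `memcpy` (which compilers typically turn into a single load on targets where unaligned access is cheap) and a loop that compares two positions four bytes at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an unaligned 32-bit load. */
static inline uint32_t load_u32_sketch(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v)); /* well-defined for any alignment */
    return v;
}

/*
 * Hypothetical match-length loop: count how many leading bytes of a and b
 * are equal, up to max_len, comparing four bytes per step where possible.
 */
static size_t match_len_sketch(const uint8_t *a, const uint8_t *b,
                               size_t max_len)
{
    size_t len = 0;

    assert(a != NULL && b != NULL);

    /* Word-at-a-time comparison through unaligned loads. */
    while (len + 4 <= max_len &&
           load_u32_sketch(a + len) == load_u32_sketch(b + len))
        len += 4;

    /* Finish byte by byte to find the exact mismatch position. */
    while (len < max_len && a[len] == b[len])
        len++;

    return len;
}
```

In a loop like this the loads hit arbitrary, mostly unaligned addresses and each one feeds a hard-to-predict branch, so time attributed to the load in a profile could just as well reflect cache misses or branch mispredictions as the cost of the load itself.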
