
Faster decode and find_max_char implementations. #120196

@rhpvorderman

Description

Feature or enhancement

Proposal:

I have spotted a few inefficiencies in the stringlib implementations that hinder the compiler's ability to optimize the code. These could be fixed.

  • find_max_char, the 1-byte version. This unrolls the check over 4- or 8-byte chunks (one size_t at a time). Alignment (which does not matter for x86-64 but may be important on other platforms) is handled by checking one character at a time. This can be sped up by simply bitwise OR-ing the unaligned leading characters together and checking them all with a single test. Furthermore, the loop can be unrolled using 32-byte chunks (4 size_t integers). By doing so, the compiler needs only a few extra instructions for the bitwise OR and can use 16-byte vectors. These are available on both x86-64 and ARM64, so the compiler will optimize this easily. The remainder of fewer than 32 bytes can then be handled by bitwise OR-ing those characters together and performing the check once.
  • find_max_char, the 2-byte and 4-byte versions. These currently unroll by 4 characters, which means an 8-byte load per iteration for the 2-byte version and a 16-byte load for the 4-byte version. Increasing the unroll to 8 gives 16-byte and 32-byte loads respectively, which the compiler can vectorize.
  • Stringlib codecs.h, utf8_decode: the statements following the "fast unrolled copy" comment on line 47 copy the size_t word one byte at a time. They can be replaced by `memcpy(_p, _s, SIZEOF_SIZE_T);`. Using `restrict`, the compiler should understand that the data does not need to be read twice, and a memcpy with a fixed size is reliably compiled into plain loads and stores rather than a library call.
  • ascii_decode: same as find_max_char. This can be optimized using larger chunks.
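A minimal standalone sketch of the 1-byte find_max_char idea, assuming plain C99 outside of CPython: `find_max_char_ucs1` and `ASCII_MASK` are hypothetical names, and the function only mirrors the existing contract of returning 127 for pure ASCII and 255 otherwise. The unaligned head and the tail are each OR-ed together and tested once, and the main loop tests four size_t words (32 bytes on 64-bit targets) with a single branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* High bit of every byte set: 0x8080...80 for the width of size_t. */
#define ASCII_MASK ((size_t)-1 / 0xFF * 0x80)

/* Hypothetical standalone version of the 1-byte find_max_char:
   returns 127 if every byte is ASCII, otherwise 255. */
static uint32_t find_max_char_ucs1(const unsigned char *begin,
                                   const unsigned char *end)
{
    enum { CHUNK = 4 * sizeof(size_t) };   /* 32 bytes on 64-bit targets */
    const unsigned char *p = begin;

    /* Unaligned head: OR the bytes together and test them once. */
    unsigned char head = 0;
    while (p < end && (uintptr_t)p % sizeof(size_t) != 0)
        head |= *p++;
    if (head & 0x80)
        return 255;

    /* Main loop: four word loads, three ORs and a single branch. */
    while ((size_t)(end - p) >= CHUNK) {
        size_t w0, w1, w2, w3;
        memcpy(&w0, p, sizeof(size_t));
        memcpy(&w1, p + sizeof(size_t), sizeof(size_t));
        memcpy(&w2, p + 2 * sizeof(size_t), sizeof(size_t));
        memcpy(&w3, p + 3 * sizeof(size_t), sizeof(size_t));
        if ((w0 | w1 | w2 | w3) & ASCII_MASK)
            return 255;
        p += CHUNK;
    }

    /* Tail of fewer than CHUNK bytes: OR together and test once. */
    unsigned char tail = 0;
    while (p < end)
        tail |= *p++;
    return (tail & 0x80) ? 255 : 127;
}
```

Loading through memcpy rather than casting to `size_t *` sidesteps strict-aliasing and alignment concerns; GCC and Clang turn these fixed-size copies into ordinary loads.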
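A sketch of the 2-byte find_max_char with an unroll of 8, assuming `uint16_t` code units in place of the stringlib macros. `find_max_char_ucs2` is a hypothetical name; it keeps the existing return values (0x7f for ASCII, 0xff for Latin-1, 0xffff otherwise) and the mask-widening logic, but ORs eight code units (16 bytes) per iteration and ORs the remainder together before a single check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone version of the 2-byte find_max_char.
   Returns 0x7f (all ASCII), 0xff (fits Latin-1) or 0xffff. */
static uint32_t find_max_char_ucs2(const uint16_t *begin, const uint16_t *end)
{
    const uint16_t *p = begin;
    /* Round the length down to a multiple of 8 code units (16 bytes). */
    const uint16_t *unrolled_end = begin + ((size_t)(end - begin) & ~(size_t)7);
    uint16_t mask = 0xFF80;     /* bits that must be zero for ASCII */
    uint32_t max_char = 0x7F;

    while (p < unrolled_end) {
        /* Eight code units OR-ed together: one 16-byte vector load. */
        uint16_t bits = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
        if (bits & mask) {
            if (mask == 0xFF00)  /* already at Latin-1: result is 0xffff */
                return 0xFFFF;
            mask = 0xFF00;       /* widen from ASCII to Latin-1 ... */
            max_char = 0xFF;
            continue;            /* ... and recheck the same chunk */
        }
        p += 8;
    }

    /* Remainder of fewer than 8 code units: OR together, check once. */
    uint16_t bits = 0;
    while (p < end)
        bits |= *p++;
    if (bits & 0xFF00)
        return 0xFFFF;
    if (bits & 0xFF80)
        return 0xFF;
    return max_char;
}
```

The 4-byte version would follow the same pattern with `uint32_t` units, where eight units make a 32-byte load.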
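A sketch of the memcpy replacement for the utf8_decode fast path, assuming a 1-byte output buffer. `copy_ascii_words` is a hypothetical helper, not the actual codecs.h code: it copies whole size_t words while they contain only ASCII and returns how many bytes it copied, leaving the caller to handle the rest on the slow path.

```c
#include <stddef.h>
#include <string.h>

/* High bit of every byte set: 0x8080...80 for the width of size_t. */
#define ASCII_CHAR_MASK ((size_t)-1 / 0xFF * 0x80)

/* Hypothetical helper: copy pure-ASCII size_t words from src to dst and
   return the number of bytes copied (a multiple of sizeof(size_t)).
   restrict promises dst and src do not overlap, so stores through dst
   cannot change src and the compiler need not re-read it. */
static size_t copy_ascii_words(unsigned char *restrict dst,
                               const unsigned char *restrict src,
                               size_t n)
{
    size_t i = 0;
    while (i + sizeof(size_t) <= n) {
        size_t value;
        memcpy(&value, src + i, sizeof(size_t));   /* one word load */
        if (value & ASCII_CHAR_MASK)
            break;                                 /* non-ASCII: stop here */
        /* Replaces the per-byte shift-and-mask stores with a single
           fixed-size copy, which compiles to one word store. */
        memcpy(dst + i, src + i, sizeof(size_t));
        i += sizeof(size_t);
    }
    return i;
}
```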
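For ascii_decode, the same 32-byte chunking can be sketched as follows, assuming the function's job is to copy the leading ASCII prefix of the input into a 1-byte buffer and return its length. `ascii_decode_chunked` is a hypothetical name and this is not CPython's code; a whole chunk is tested with one OR-ed check and copied with one fixed-size memcpy, and only the final partial chunk (or the chunk containing a non-ASCII byte) is walked byte by byte.

```c
#include <stddef.h>
#include <string.h>

/* High bit of every byte set: 0x8080...80 for the width of size_t. */
#define ASCII_MASK ((size_t)-1 / 0xFF * 0x80)

/* Hypothetical sketch: copy the ASCII prefix of [start, end) into dest
   and return its length in bytes. */
static size_t ascii_decode_chunked(const char *restrict start,
                                   const char *end,
                                   unsigned char *restrict dest)
{
    enum { CHUNK = 4 * sizeof(size_t) };   /* 32 bytes on 64-bit targets */
    const char *p = start;

    while ((size_t)(end - p) >= CHUNK) {
        size_t w[4];
        memcpy(w, p, CHUNK);                   /* load one whole chunk */
        if ((w[0] | w[1] | w[2] | w[3]) & ASCII_MASK)
            break;                             /* locate the byte below */
        memcpy(dest + (p - start), p, CHUNK);  /* store the chunk at once */
        p += CHUNK;
    }

    /* Partial chunk, or a chunk with a non-ASCII byte: go byte by byte
       and stop at the first non-ASCII byte. */
    while (p < end && (unsigned char)*p < 0x80) {
        dest[p - start] = (unsigned char)*p;
        p++;
    }
    return (size_t)(p - start);
}
```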

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

Metadata

Assignees

No one assigned

    Labels

    performance (Performance or resource usage), type-feature (A feature request or enhancement)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests