Skip to content

Comments

Implement hermetic symbol renaming for hyperscan fat runtime#3675

Closed
Copilot wants to merge 9 commits intocopilot/implement-fat-runtime-hyperscanfrom
copilot/make-hyperscan-symbol-renaming-hermetic
Closed

Implement hermetic symbol renaming for hyperscan fat runtime#3675
Copilot wants to merge 9 commits intocopilot/implement-fat-runtime-hyperscanfrom
copilot/make-hyperscan-symbol-renaming-hermetic

Conversation

Copy link
Contributor

Copilot AI commented Jan 31, 2026

Hyperscan's fat runtime builds 5 architecture-specific libraries (core2, corei7, avx2, avx512, avx512vbmi) that must coexist without symbol conflicts. The existing genrule used non-hermetic system tools and failed in remote execution.

Changes

Starlark rule using cc_toolchain

  • New hs_rename_symbols rule extracts LLVM tools via find_cc_toolchain() and cc_common.get_tool_for_action()
  • Accepts cc_library targets through CcInfo provider, extracts static library from linking context
  • Derives llvm-nm/llvm-objcopy paths from toolchain's llvm-ar location

Script execution fixes

  • Tool/file paths passed as arguments via ctx.actions.args(), converted to absolute with realpath before cd
  • Bash variables: ${VAR} not ${{VAR}}
  • llvm-nm: --format=posix not -f p
  • Symbol map: awk pipeline ensures proper old_symbol new_symbol format

Symbol preservation

  • Keep list includes hyperscan allocation hooks, libc symbols (malloc, pthread_*), and critically ^mmbit_
  • mmbit_* symbols defined in un-renamed hs_common library but referenced from renamed variant libraries - must not be renamed to avoid undefined symbols at link time
# Script accepts absolute tool paths, prevents renaming cross-library symbols
AR="$(realpath "$1")"
"${NM}" --format=posix -g "${obj}" | awk '{print $1}' | grep -v -f "${KEEPSYMS}" | \
    awk -v prefix="${PREFIX}" '{print $1 " " prefix "_" $1}' > "${SYMSFILE}"

Removes rename_symbols.sh, updates source.json overlay hashes.

Original prompt

Make hyperscan fat runtime symbol renaming hermetic

Problem

The current implementation in PR #3672 uses ar, nm, and objcopy directly from the host system in rename_symbols.sh. This is not hermetic - it should use the tools from the configured C++ toolchain (e.g., llvm-ar, llvm-nm, llvm-objcopy from envoy's LLVM toolchain).

Additionally, the upstream CMake build has a clever trick that's missing: it queries the compiler for libc's location and excludes all libc symbols from renaming to avoid conflicts.

Solution

1. Replace the shell script + genrule with a proper Starlark rule

Create hs_rename_symbols rule in fat_runtime.bzl that:

  • Uses find_cpp_toolchain(ctx) to get the configured toolchain
  • Gets tool paths via cc_toolchain._tool_paths for ar, nm, objcopy
  • Passes toolchain files via cc_toolchain.all_files as inputs
  • This ensures hermetic builds using the same LLVM tools envoy uses

2. Add comprehensive symbol keep list

Since querying the sysroot's libc isn't portable (assumes glibc, requires sysroot), instead add common libc/compiler symbols to the keep list as static patterns. This is more portable and predictable.

Expand the keep.syms patterns to include:

  • The existing upstream patterns (hs_alloc, hsfree, ^)
  • Common libc symbols that could conflict (malloc, free, memcpy, memset, etc.)
  • Compiler intrinsics patterns

3. Implementation Details

fat_runtime.bzl changes:

load("@rules_cc//cc:find_cc_toolchain.bzl", "find_cpp_toolchain", "use_cc_toolchain")

# Keep symbols - from upstream keep.syms.in plus common libc symbols
KEEP_SYMBOLS = """
# Hyperscan allocation hooks (from upstream keep.syms.in)
hs_misc_alloc
hs_misc_free
hs_free_scratch
hs_stream_alloc
hs_stream_free
hs_scratch_alloc
hs_scratch_free
hs_database_alloc
hs_database_free
# Internal/compiler symbols starting with underscore
^_
# Common libc symbols to avoid renaming
^malloc$
^free$
^calloc$
^realloc$
^memcpy$
^memmove$
^memset$
^memcmp$
^strlen$
^strcmp$
^strncmp$
^strcpy$
^strncpy$
^printf$
^fprintf$
^sprintf$
^snprintf$
^abort$
^exit$
^pthread_
"""

def _hs_rename_symbols_impl(ctx):
    cc_toolchain = find_cpp_toolchain(ctx)
    
    # Get tool paths from toolchain
    ar_path = cc_toolchain._tool_paths.get("ar")
    nm_path = cc_toolchain._tool_paths.get("nm")
    objcopy_path = cc_toolchain._tool_paths.get("objcopy")
    
    if not ar_path or not nm_path or not objcopy_path:
        fail("Toolchain must provide ar, nm, and objcopy tools")
    
    # ... implementation that uses these tools ...

hs_rename_symbols = rule(
    implementation = _hs_rename_symbols_impl,
    attrs = {
        "archive": attr.label(mandatory = True, allow_single_file = [".a"]),
        "prefix": attr.string(mandatory = True),
        "_cc_toolchain": attr.label(default = "@bazel_tools//tools/cpp:current_cc_toolchain"),
    },
    toolchains = use_cc_toolchain(),
    fragments = ["cpp"],
)

Update hs_exec_variant macro to use the new rule instead of genrule + shell script.

Remove rename_symbols.sh - the logic moves into the Starlark rule.

Files to modify

  • bazel-registry/modules/hyperscan/5.4.2.envoy/overlay/fat_runtime.bzl - Replace genrule with proper Starlark rule using cc_toolchain
  • bazel-registry/modules/hyperscan/5.4.2.envoy/overlay/rename_symbols.sh - Remove (logic moves to Starlark)
  • bazel-registry/modules/hyperscan/5.4.2.envoy/overlay/BUILD.bazel - Remove exports_files for rename_symbols.sh
  • bazel-registry/modules/hyperscan/5.4.2.envoy/source.json - Update hashes

Testing

After changes, the build should:

  1. Use llvm-ar, llvm-nm, llvm-objcopy from the configured toolchain
  2. Work in hermetic remote build environments
  3. Properly rename symbols while preserving allocation hooks and libc symbols

This pull request was created from Copilot chat.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits January 31, 2026 18:40
- Replace genrule with Starlark hs_rename_symbols rule using cc_toolchain
- Add comprehensive symbol keep list including libc symbols
- Remove rename_symbols.sh shell script
- Update BUILD.bazel and source.json hashes

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
- Derive llvm-nm and llvm-objcopy paths from llvm-ar path
- Fix sed command to generate proper rename map for objcopy
- Update source.json hash for fat_runtime.bzl

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title [WIP] Make hyperscan fat runtime symbol renaming hermetic Make hyperscan fat runtime symbol renaming hermetic Jan 31, 2026
Copilot AI requested a review from phlax January 31, 2026 18:44
- Change archive attribute from allow_single_file to providers = [CcInfo]
- Extract static library from CcInfo linking context
- Update source.json with correct hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI and others added 2 commits January 31, 2026 18:55
- Change script to accept tool paths as arguments instead of hardcoding
- Pass tool paths via ctx.actions.args() at runtime
- This ensures tools are resolved correctly in remote execution
- Update source.json hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
- Use realpath to convert all paths to absolute before cd
- This ensures tools and files can be found after changing directory
- Fixes: "No such file or directory" errors in remote execution
- Update source.json hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title Make hyperscan fat runtime symbol renaming hermetic Make hyperscan fat runtime symbol renaming hermetic and fix path resolution Jan 31, 2026
- Replace all double curly braces with single in bash variables
- Fixes: "bad substitution" error
- Double braces are for Python string escaping, not bash syntax
- Update source.json hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title Make hyperscan fat runtime symbol renaming hermetic and fix path resolution Make hyperscan fat runtime symbol renaming hermetic Jan 31, 2026
- Change -f p to --format=posix for llvm-nm compatibility
- Use awk instead of cut/sed for more reliable symbol extraction
- Properly format symbol rename map as "old_symbol new_symbol"
- Fixes: "llvm-nm: error: --format value should be one of..."
- Fixes: "llvm-objcopy: error: missing new symbol name"
- Update source.json hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
- Add ^mmbit_ to KEEP_SYMBOLS list
- These symbols are defined in hs_common (not renamed)
- Referenced from variant libraries (which get renamed)
- Prevents renaming mmbit_ references in variant libs
- Fixes undefined symbol errors for mmbit_* at link time
- Update source.json hash

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title Make hyperscan fat runtime symbol renaming hermetic Implement hermetic symbol renaming for hyperscan fat runtime Jan 31, 2026
@phlax phlax closed this Jan 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants