C API Artifacts getting too large for redistribution #11476

@alexcrichton

The current motivation for this issue is that the wasmtime-go repository cannot make a 36.0.0 release because the C API artifacts have gotten too large. A rough overview of what's happening there is that the only way I know of to distribute the C API in Go is to literally check the binaries themselves (for all architectures no less) into the repository at the published tags. GitHub has a 100M file size limit for files in git repositories. Wasmtime has been steadily growing larger and larger over time (GitHub prints warnings when I push the tag on the CLI) and the x86_64 linux C API is now above this threshold.
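As a rough illustration of the constraint, here is a minimal sketch that scans a checkout for files over the limit before a tag is pushed. It assumes the "100M" above means 100 MiB; the function name `find_oversized` is hypothetical and not part of wasmtime-go.

```python
import os

# Assumption: the "100M" per-file limit is read as 100 MiB.
LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized(root, limit=LIMIT_BYTES):
    """Return sorted (path, size) pairs for files under `root` larger than `limit` bytes."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; only working-tree files matter here.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size))
    return sorted(hits)
```

Calling `find_oversized(".")` from the repository root would list any checked-in artifact that is over the threshold.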

On one hand this is expected: we slowly add things over time and binaries are expected to get larger. This is why we have conditional compile-time features to reduce the size of artifacts. The wasmtime-go repository, though, is trying to reuse the exact artifacts that Wasmtime publishes, as I'd really rather avoid duplicating the build process just for a slightly reduced feature set. Thus, this issue.

If we accept this as an issue that should be addressed, then the question remains of what to do about it. We've got a well-documented series of steps of how to build a minimal embedding of Wasmtime which gives a whole bunch of knobs of how to shrink the size of the compiled artifact, and it's a question of how best to apply them.

Why is libwasmtime.a big?

The current state of the world is that wasmtime-go checks in libwasmtime.a static archives. It avoids *.so or dynamic library artifacts to avoid adding new dependencies to executables produced (I don't even know how we'd communicate to users how to copy around the library). The libwasmtime.a artifact for x86_64-linux is 104M. The libwasmtime.so artifact, however, is only 26M. I would consider the libwasmtime.so artifact the "lower bound" of what we would ideally be able to achieve.

So first off: why is libwasmtime.so so much smaller? The main reason for this is that *.a is just a huge collection of *.o files internally. These object files are not all 100% used in the final artifact (e.g. --gc-sections would remove a lot) and they're just raw copied from rlibs. The *.so is much smaller as it's a fully linked artifact that has all dead code removed. There's also "linking metadata" such as relocation sections, big symbol tables, LLVM bitcode, etc., which is all present in *.a but not in *.so.
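To see which object files account for the bulk of a *.a, here is a minimal sketch that walks the common `ar` container format (an 8-byte magic followed by 60-byte member headers) and reports member sizes. The helper names are hypothetical, and names are reported raw: GNU long-name references such as `/123` and the special `/` and `//` tables are not resolved.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end(2)


def list_members(data):
    """Return (raw_name, size_in_bytes) for each member of an ar archive held in `data`."""
    if data[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    offset = 8
    while offset + HEADER_LEN <= len(data):
        header = data[offset:offset + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % offset)
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even length.
        offset += HEADER_LEN + size + (size & 1)
    return members


def largest(data, count=10):
    """Return the `count` largest members, biggest first."""
    return sorted(list_members(data), key=lambda m: m[1], reverse=True)[:count]
```

Reading the archive with `open("libwasmtime.a", "rb").read()` and passing the bytes to `largest` gives a quick view of where the space goes.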

Shrinking libwasmtime.a

Ok that's enough intro, what to do about this?

  • Turn on LTO. This cuts the *.a size in half because it's effectively DCE at compile time. The optimization side of things ends up making it really slow though. Testing in #11475 ("Test out building the C API artifacts with LTO"), the Windows release builds clock in at 20m where they were previously 15m, which would make them the slowest builder. (not a problem for Linux/macOS builds though...)
  • Turn on ThinLTO. Similar to above but faster as it's more parallelizable. Only shaves ~10-15M locally though so not really enough to buy much runway.
  • Turn off some features. This is here for completeness but I don't think this is really viable since the point of these artifacts is to be the "default featureful artifacts".
  • Turn on panic=abort. Alas this is already done.
  • Turn on opt-level=s or z. Measured locally to basically have no effect (this is good for micro-optimizing, but bad for macro-optimizing in my experience)
  • Use CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1. Shaves 20-30M off locally. (possible!)
  • Somehow use the -r flag to ld. This is in theory "apply --gc-sections ahead of time" AFAIK, but I've never used this option successfully nor have I seen anyone else use it successfully. Trying naively locally requires knowledge of symbols which we do not have.
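For comparing the profile knobs above locally, here is a minimal sketch that maps each variant to Cargo's `CARGO_PROFILE_RELEASE_*` environment overrides. The variant names and helper functions are hypothetical, and the package name `wasmtime-c-api` is an assumption about how the C API crate is selected; the published artifacts may be built through a different entry point.

```python
import os

# Hypothetical variant names mapped to Cargo profile overrides.
VARIANTS = {
    "baseline": {},
    "lto": {"CARGO_PROFILE_RELEASE_LTO": "fat"},
    "thinlto": {"CARGO_PROFILE_RELEASE_LTO": "thin"},
    "opt-s": {"CARGO_PROFILE_RELEASE_OPT_LEVEL": "s"},
    "cgu1": {"CARGO_PROFILE_RELEASE_CODEGEN_UNITS": "1"},
}


def build_env(variant, base=None):
    """Return a copy of the base environment with the variant's overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(VARIANTS[variant])
    return env


def build_command(package="wasmtime-c-api"):
    """Return the cargo invocation; the package name is an assumption."""
    return ["cargo", "build", "--release", "-p", package]
```

Each variant could then be built with `subprocess.run(build_command(), env=build_env("cgu1"), check=True)` and the size of the resulting archive recorded for comparison.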

Well I at least wanted to open an issue about this. I don't think much will come of opening the issue here, but maybe someone can swoop in with That One Easy Trick which makes the binary tiny.
