Feedback on utility of a linker as a library #333

indygreg · 2025-01-25T21:32:04Z

indygreg
Jan 25, 2025

Thank you for starting this project!

For years I've harbored a desire to implement a performant linker in pure Rust. I'm super excited someone is doing this work. And wild so far looks very promising.

I wanted to send unsolicited feedback about an aspect of linker design that often gets overlooked: the utility of having a linker as a library or linking as an API.

Most modern linkers (including mold and currently libwild) are designed around a command line interface. This is fine: this is the common way linkers are invoked, of course.

But I also believe that constraining linkers to a CLI-first interface limits their utility. I believe there's a whole world of alternative development workflows and ways of patching and assembling executable objects that could be unlocked if low-level linker primitives were usable as an API and could be embedded in a larger component.

As an example, I started the PyOxidizer project (implemented in Rust) to make it simpler to assemble self-contained and distributable Python projects. The Python distributions (https://github.com/astral-sh/python-build-standalone) I built shipped the raw object files constituting libpython and PyOxidizer supported cherry picking the subset of object files your application needed so the final executable asset wouldn't waste space on unused functionality. To produce the final executable we had to invoke a linker. This was somewhat brittle because we relied on whatever was installed on the system. I would have much preferred to embed a Rust crate providing a linker so we could perform the link in-process with no dependencies on system software.

As another example, I've implemented a pure Rust implementation of Apple's code signing mechanism. Signing Mach-O binaries entails manipulating Mach-O binaries to embed the code signature. I've implemented a very crude specialized linker to do this. But, again, I would have much preferred if I could have embedded a Rust linker crate and performed the Mach-O mutations with it instead of having to invent this wheel.

I have a handful of projects that make use of the https://github.com/NixOS/patchelf tool for rewriting ELF binaries. I've stumbled across several bugs in patchelf over the years where it corrupts ELF binaries. This type of corruption is easily avoidable when you solve ELF rewriting as "relinking" instead of "patching." Again, having a linker as a library where I can reach for the low-level APIs to reconstitute an object file would facilitate implementing this functionality robustly.

I've long harbored ideas of stealing Zig's excellent ideas around glibc symbol version targeting to make it easier to produce ELF binaries that are compatible with more systems. Having a generic linker library can provide building blocks to make solving this problem easier.

The Rust core development loop is often maligned as being bottlenecked on compiling and linking. There are no doubt optimizations that can be performed if the coupling between the compiler and linker is tighter and more cohesive. For example, the linker can reserve extra padding to facilitate swapping in new code without rewriting the entire file. In the same vein is hot code swapping on a running executable. These optimizations are vastly easier to implement when you can embed a linker and use it as a deeply customizable library.

I could go on.

The libwild crate today has a very limited API. Most functions are marked pub(crate). The core linker interface is based on parsing arguments to a struct.

I'd like to encourage opening up that API. Make as much as possible pub. Break API compatibility as much as you want. I don't care about the churn: having the ability to embed a powerful and performant linker in Rust opens up a world of possibilities. It encourages people to rethink the paradigm of linking. The industry is still using linkers as a discrete post-compile step because that's how linkers have been designed for decades. I'm willing to bet that if someone is able to build a linker as a library a ton of people will find new and creative ways to employ linking to solve various problems. I think wild and the Rust ecosystem are in position to unlock this potential.

davidlattimore · 2025-01-26T00:38:24Z

davidlattimore
Jan 26, 2025
Maintainer

Thanks for the feedback!

I'm definitely open to exposing more bits of libwild as public, but I'd kind of like that to be driven by actual use-cases, so that we can come up with APIs that make sense for those use-cases.

The idea behind using wild as a library but interfacing via a list of command-line arguments is that, were wild to be built into a compiler, that compiler would likely need the ability to pass arbitrary linker arguments through to the linker. For example, C compilers support -Wl,--some-linker-arg and rustc supports -Clink-arg=--some-linker-arg. The only way it can really do this is if the interface is via a list of arguments.
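As a rough illustration of why an argument list is the natural interface here, the sketch below shows how a compiler driver might collect the arguments it has to forward. The function name `forwarded_linker_args` is hypothetical and is not part of libwild or any compiler; it only handles the two flag spellings mentioned above.

```rust
/// Collects the linker arguments that a compiler driver would forward.
/// Handles only the two forms mentioned in the text:
///   -Wl,--foo,--bar    (C compiler drivers: comma-separated list)
///   -Clink-arg=--foo   (rustc: one linker argument per flag)
/// Any other argument is assumed to belong to the compiler and is skipped.
/// Hypothetical helper for illustration; not an API of libwild or rustc.
fn forwarded_linker_args(driver_args: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for arg in driver_args {
        if let Some(list) = arg.strip_prefix("-Wl,") {
            // Each comma-separated piece becomes its own linker argument.
            out.extend(list.split(',').filter(|s| !s.is_empty()).map(String::from));
        } else if let Some(one) = arg.strip_prefix("-Clink-arg=") {
            out.push(one.to_string());
        }
    }
    out
}

fn main() {
    let args = forwarded_linker_args(&[
        "-O2",
        "-Wl,--gc-sections,-z,now",
        "-Clink-arg=--as-needed",
    ]);
    // The resulting list could be handed to any linker that accepts
    // its configuration as a list of command-line style arguments.
    println!("{args:?}");
}
```

Because the driver does not need to understand the forwarded arguments, any linker option keeps working without the driver having to model it in a typed API.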

The use-case of patching an arbitrary ELF executable or shared object is very difficult. You can't just pass it to the linker and relink it, since lots of the information from the original object files is lost. The post-link optimiser BOLT does this, but it is incredibly complex and requires a bunch of heuristics to try to recover information that was lost when linking.

The main thing that is lost is the original relocations. The linker decides on a layout for all the input sections, then when writing, it applies the relocations from the original object files. Once applied, those relocations are discarded. This makes it very hard to move stuff around. Even if you disassemble the executable code, if you see an instruction that's loading an IP-relative address, there could be multiple symbols at that address - e.g. you can have one symbol that points to the end of a section that just happens to be the same address as another symbol that points to the start of the next section. If you've moved one of these two sections, you pretty much can't know whether you need to update that instruction or not. I'm not sure how BOLT handles this kind of situation. My guess / hope is that when it detects ambiguity, it just refuses to move either of the sections. Or maybe it's less of a problem for BOLT, since presumably it's mostly moving functions around, which shouldn't suffer from that problem.

Patching of non-position-independent executables is even more error-prone, since you can't really tell the difference between a pointer and a value without tracing through subsequent instructions to see how it's used.

The kinds of problems that need to be solved in order to do this are pretty distinct from the problems that a linker needs to solve, so I feel like a library like patchelf is best placed to solve them. The fact that it can in some cases result in a bad binary is either a bug or just a downside of this being a pretty much impossible problem to get 100% right in all cases - although ideally the library could detect when it can't figure something out and fail with a helpful error message rather than making a bad edit.
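The ambiguity described above can be shown with a small, simplified sketch. The `Reloc` type and both functions are hypothetical and written only for this illustration; the formula is the usual PC-relative one (symbol + addend - place), and the decoder assumes the x86-64 convention of a -4 addend. The addresses are made up.

```rust
/// A simplified PC-relative relocation record (hypothetical type).
struct Reloc {
    offset: usize,        // position of the 4-byte field within the section
    symbol: &'static str, // name that the object file referred to
    addend: i64,
}

/// Applies S + A - P and stores the result as a little-endian
/// 32-bit displacement. After this, the symbol name is gone.
fn apply_pc32(section: &mut [u8], section_addr: u64, reloc: &Reloc, symbol_addr: u64) {
    let place = section_addr + reloc.offset as u64;
    let value = symbol_addr as i64 + reloc.addend - place as i64;
    section[reloc.offset..reloc.offset + 4].copy_from_slice(&(value as i32).to_le_bytes());
}

/// What a patching tool sees after linking: only a displacement, from
/// which it can recover a target address (assuming an addend of -4).
fn decode_target(section: &[u8], section_addr: u64, offset: usize) -> u64 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&section[offset..offset + 4]);
    let disp = i32::from_le_bytes(raw) as i64;
    (section_addr as i64 + offset as i64 + 4 + disp) as u64
}

fn main() {
    // Section A ends at 0x1010 and section B starts at 0x1010, so the two
    // symbols below are distinct in the object files but share an address.
    let text_addr = 0x2000u64;
    let mut text = [0u8; 8];
    let relocs = [
        (Reloc { offset: 0, symbol: "end_of_a", addend: -4 }, 0x1010u64),
        (Reloc { offset: 4, symbol: "start_of_b", addend: -4 }, 0x1010u64),
    ];
    for (reloc, symbol_addr) in &relocs {
        apply_pc32(&mut text, text_addr, reloc, *symbol_addr);
        let target = decode_target(&text, text_addr, reloc.offset);
        println!("{} decodes to {:#x}", reloc.symbol, target);
    }
}
```

Both fields decode to the same address, so a tool working only from the linked output cannot tell which of them should follow section B if B is moved.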

Incremental linking and hot code reloading are definitely on the roadmap for Wild, but the plan is that they'd be core functions in Wild, rather than things that lived in a separate crate that just used Wild as a library. They require a lot of bookkeeping - keeping track of where stuff came from, where it was put, what references what etc. It's only with all this information that we'd be able to reliably and quickly make incremental updates to the output binary.
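To make the bookkeeping idea concrete, here is a minimal sketch of the kind of records an incremental linker might keep. Everything in it (`Placement`, `LinkState`, `plan_update`, the section names and sizes) is hypothetical and is not taken from Wild's design; it only illustrates "where it was put" and "what references what", and assumes sections are given some reserved padding.

```rust
use std::collections::HashMap;

/// Where one input section ended up in the output file (hypothetical record).
struct Placement {
    output_offset: u64,
    size: u64,
    capacity: u64, // size plus padding reserved for future growth
}

#[derive(Debug, PartialEq)]
enum Update {
    /// New contents fit in the reserved space. The listed sections refer to
    /// the updated one and may need their relocations re-applied.
    PatchInPlace { offset: u64, rerelocate: Vec<String> },
    /// Unknown section or not enough room: fall back to a full relink.
    FullRelink,
}

struct LinkState {
    placements: HashMap<String, Placement>,
    /// For each section, the sections that contain references to it.
    referenced_by: HashMap<String, Vec<String>>,
}

impl LinkState {
    fn plan_update(&mut self, section: &str, new_size: u64) -> Update {
        match self.placements.get_mut(section) {
            Some(p) if new_size <= p.capacity => {
                p.size = new_size;
                let rerelocate = self.referenced_by.get(section).cloned().unwrap_or_default();
                Update::PatchInPlace { offset: p.output_offset, rerelocate }
            }
            _ => Update::FullRelink,
        }
    }
}

fn example_state() -> LinkState {
    let mut placements = HashMap::new();
    placements.insert(
        "foo.o:.text.foo".to_string(),
        Placement { output_offset: 0x400, size: 96, capacity: 128 },
    );
    let mut referenced_by = HashMap::new();
    referenced_by.insert(
        "foo.o:.text.foo".to_string(),
        vec!["main.o:.text.main".to_string()],
    );
    LinkState { placements, referenced_by }
}

fn main() {
    let mut state = example_state();
    println!("{:?}", state.plan_update("foo.o:.text.foo", 120));
    println!("{:?}", state.plan_update("foo.o:.text.foo", 200));
    println!("recorded size: {}", state.placements["foo.o:.text.foo"].size);
}
```

A real implementation would need far more than this (symbols, relocation records, input file identities), which is part of why such state is easier to maintain inside the linker than in a separate crate layered on top of it.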

0 replies

mati865 · 2025-04-11T17:54:16Z

mati865
Apr 11, 2025
Collaborator Sponsor

When integrating Wild with Rust for benchmarking purposes, I've found it annoying that dependabot also bumps the versions in Cargo.toml. This is a complicating factor for projects with higher inertia in terms of dependency upgrades.

5 replies

davidlattimore Apr 12, 2025
Maintainer

Are you saying that you'd like to stick to particular older versions of certain crates that match what, say, rustc is using, or just that you'd like less regular dependency updates?

I don't think it's essential that we're on the latest versions of packages, so I don't object to adjustments to the way dependabot is configured, or even turning it off if need be.

davidlattimore Apr 12, 2025
Maintainer

Thinking about this some more, perhaps what we want is to set versioning-strategy.

If our versions in Cargo.toml then reflect our minimum supported versions then we may at some point want to have a CI job that builds and tests with those minimum versions.

mati865 Apr 12, 2025
Collaborator Sponsor

> Are you saying that you'd like to stick to particular older versions of certain crates that match what, say, rustc is using, or just that you'd like less regular dependency updates?

I'd like to be able to stick with older versions. I think libraries should not force you to use the latest dependencies unless necessary.

> I don't think it's essential that we're on the latest versions of packages, so I don't object to adjustments to the way dependabot is configured, or even turning it off if need be.

> Thinking about this some more, perhaps what we want is to set versioning-strategy.
>
> If our versions in Cargo.toml then reflect our minimum supported versions then we may at some point want to have a CI job that builds and tests with those minimum versions.

I'm in favour of keeping the latest versions in Cargo.lock and performing the development and benchmarking with them. I'm just less keen on writing them into Cargo.toml, which forces all downstream users to upgrade everything even when they only want to upgrade a single dependency.

There is a tool for testing that: https://crates.io/crates/cargo-minimal-versions

davidlattimore Apr 15, 2025
Maintainer

I've configured dependabot to only update Cargo.lock and I've also reduced the minimum versions on most of our packages. I used cargo-minimal-versions to help figure out what would work, but haven't as yet integrated it into our CI. It's possible that some of the packages could have even lower minimal versions, since in some cases I just found an older version that worked rather than bisecting to find the absolute oldest. So if there's a specific dependency that you'd like to see on an older version, that might be possible.

mati865 Apr 16, 2025
Collaborator Sponsor

Thanks. For dependencies with a relatively stable API, like itertools, it's common to use a version range, like here: https://github.com/tokio-rs/prost/blob/fcf610edf53826eacd7010a667b7026d5560060f/prost-build/Cargo.toml#L19, but it's not a big deal. This is already much more convenient to use.

teburd · 2025-04-12T11:40:16Z

teburd
Apr 12, 2025
Sponsor

A linker as a library would be amazing, particularly if you could get it with no_std for embedded ELF loading in the context of a microcontroller or OS. Linux has what amounts to its own linker in the kernel, as do many RTOSes, like the one I added to Zephyr. Having a production linker as a library would be amazing.

3 replies

davidlattimore Apr 13, 2025
Maintainer

What would you imagine to be the use-case? I'm guessing the output would bypass ELF and effectively do the equivalent of elf2bin? So as in a library that takes in ELF object files, does layout work, then writes the result directly to flash to be executed.

I'm definitely open to exploring such possibilities, but would probably want them to be driven by a use-case and a user.

I suspect that before Wild would be useful for embedded development, we'd probably need to add 32-bit support. That's something I'd be happy to do without an actual user in mind.

teburd Apr 16, 2025
Sponsor

The no_std use case I had in mind is like loadable modules for the Linux kernel: basically linking relocatable ELFs together with the running program, dealing with all the relocations and matching symbols up.

davidlattimore Apr 16, 2025
Maintainer

This sounds like a runtime linker. While there's a little bit of overlap between what a runtime linker does and what a compile-time linker does, the overlap isn't perfect. In general, a compile-time linker does a lot more than a runtime linker. So I suspect, especially for embedded purposes, you wouldn't want to pay the cost of having all the extra complexity needed by a compile-time linker when all you're doing is runtime linking.

One difference is that a runtime linker (or loader) loads the shared object as a whole and doesn't move bits around, whereas a compile-time linker takes the individual sections and moves them around, placing them according to alignment and other constraints.
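The "placing them according to alignment" step can be sketched as follows. This is a deliberately simplified illustration with hypothetical names (`InputSection`, `layout`); a real compile-time linker also groups sections by name and permissions, handles segments and page alignment, and much more.

```rust
/// An input section as a compile-time linker might see it (hypothetical type).
struct InputSection {
    name: &'static str,
    size: u64,
    align: u64, // must be a power of two
}

/// Rounds `addr` up to the next multiple of `align` (a power of two).
fn align_up(addr: u64, align: u64) -> u64 {
    (addr + align - 1) & !(align - 1)
}

/// Assigns each section an address, in order, starting from `base`.
/// Padding is inserted wherever the next section's alignment requires it.
fn layout(base: u64, sections: &[InputSection]) -> Vec<(&'static str, u64)> {
    let mut cursor = base;
    let mut placed = Vec::new();
    for s in sections {
        cursor = align_up(cursor, s.align);
        placed.push((s.name, cursor));
        cursor += s.size;
    }
    placed
}

fn main() {
    let sections = [
        InputSection { name: "a.o:.text", size: 5, align: 1 },
        InputSection { name: "b.o:.text", size: 8, align: 16 },
        InputSection { name: "c.o:.text", size: 3, align: 4 },
    ];
    for (name, addr) in layout(0x1000, &sections) {
        println!("{name} at {addr:#x}");
    }
}
```

A runtime linker has no equivalent of this step: the addresses within a shared object were fixed when it was linked, and the loader only chooses where the object as a whole is mapped.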

A compile-time linker needs to both read and write relocations. A runtime linker only reads relocations and applies them.

A compile-time linker needs to generate quite a bit of stuff - e.g. GOT, PLT, symbol tables etc. A runtime linker doesn't generate anything, it just loads what the compile-time linker produced.

Compile-time linkers perform various optimisations (aka relaxations) that transform relocations and machine code. A runtime linker doesn't do any of these.

I agree that a no-std runtime linker would be useful, but it's probably better if it doesn't carry around the baggage of a whole compile-time linker, most of which would be of no use to it.
