-
Thanks for the feedback! I'm definitely open to exposing more bits of […]
The idea of using wild as a library, but interfacing via a list of command-line arguments, is that, were wild to be built into a compiler, that compiler would likely need the ability to pass arbitrary linker arguments through to the linker. For example, C compilers support the […]
The use-case of patching an arbitrary ELF executable or shared object is very difficult. You can't just pass it to the linker and relink it, since lots of the information from the original object files is lost. BOLT, the post-link optimiser, does this, but it is incredibly complex and requires a bunch of heuristics to try to recover information that was lost when linking.
The main thing that is lost is the original relocations. The linker decides on a layout for all the input sections, then, when writing, it applies the relocations from the original object files. Once applied, those relocations are discarded. This makes it very hard to move stuff around. Even if you disassemble the executable code and see an instruction that loads an IP-relative address, there could be multiple symbols at that address. For example, you can have one symbol that points to the end of a section that just happens to be at the same address as another symbol that points to the start of the next section. If you've moved one of these two sections, you pretty much can't know whether you need to update that instruction or not.
I'm not sure how BOLT handles this kind of situation. My guess / hope is that when it detects ambiguity, it just refuses to move either of the sections. Or maybe it's less of a problem for BOLT, since presumably it's mostly moving functions around, which shouldn't suffer from that problem. Patching non-position-independent executables is even more error-prone, since you can't really tell the difference between a pointer and a plain value without tracing through subsequent instructions to see how it's used.
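To make the point about discarded relocations concrete, here's a minimal sketch (not Wild's code) of an x86-64 `R_X86_64_PC32`-style relocation being applied as S + A - P, followed by what a post-link tool is left with afterwards. It assumes the 4-byte displacement is the last field of the instruction (hence the usual addend of -4); the symbol names `a_end` and `b_start`, the section names and the addresses are all made up for illustration.

```rust
/// A symbol in the linked output: its name, the section it came from, and its final address.
struct Symbol {
    name: &'static str,
    section: &'static str,
    address: u64,
}

/// Apply an R_X86_64_PC32-style relocation: write S + A - P as a little-endian i32
/// at `offset` in `code`, where `place` (P) is the address those 4 bytes will have.
fn apply_pc32(code: &mut [u8], offset: usize, sym_addr: u64, addend: i64, place: u64) -> i32 {
    let value = (sym_addr as i64 + addend - place as i64) as i32;
    code[offset..offset + 4].copy_from_slice(&value.to_le_bytes());
    value
}

/// After linking, all a post-link tool has is the displacement. Assuming the displacement
/// is the last field of the instruction, the target is P + 4 + disp, and every symbol at
/// that address is a possible original reference.
fn candidates(code: &[u8], offset: usize, place: u64, symbols: &[Symbol]) -> Vec<(&'static str, &'static str)> {
    let disp = i32::from_le_bytes(code[offset..offset + 4].try_into().unwrap());
    let target = (place as i64 + 4 + disp as i64) as u64;
    symbols
        .iter()
        .filter(|s| s.address == target)
        .map(|s| (s.name, s.section))
        .collect()
}
```

If `a_end` (end of `.data.a`) and `b_start` (start of `.data.b`) both sit at 0x2010, a reference that was originally applied against `b_start` comes back from `candidates` as two equally plausible symbols, and nothing in the output says which one it was.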
The kinds of problems that need to be solved in order to do this are pretty distinct from the problems that a linker needs to solve, so I feel like something like patchelf is the best place to solve them. The fact that it can, in some cases, result in a bad binary is either a bug or just a downside of this being a pretty much impossible problem to get 100% right in all cases, although ideally the tool could detect when it can't figure something out and fail with a helpful error message rather than making a bad edit.
Incremental linking and hot code reloading are definitely on the roadmap for Wild, but the plan is for them to be core functions of Wild, rather than things that live in a separate crate that just uses Wild as a library. They require a lot of bookkeeping: keeping track of where stuff came from, where it was put, what references what, etc. Only with all this information would we be able to reliably and quickly make incremental updates to the output binary.
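As a rough illustration of the kind of bookkeeping involved, here's a hypothetical sketch (none of these types exist in Wild) that remembers where each input section was placed and which relocations connect sections. It assumes a rebuilt section keeps the same relocations and is simply appended to the end of the output when it outgrows its slot.

```rust
use std::collections::HashMap;

/// Where an input section was placed: its address and the size of the slot reserved for it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Placement {
    address: u64,
    slot_size: u64,
}

/// A relocation remembered after linking: the section containing it, the offset within
/// that section, and the section it refers to.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Reloc {
    from: &'static str,
    offset: u64,
    to: &'static str,
}

/// Hypothetical state kept between links.
struct LinkState {
    placements: HashMap<&'static str, Placement>,
    relocs: Vec<Reloc>,
    /// First unused address at the end of the output.
    next_free: u64,
}

impl LinkState {
    /// Record that `section` was rebuilt with `new_size` bytes (and the same relocations),
    /// and return the relocations that must be re-applied.
    fn update_section(&mut self, section: &'static str, new_size: u64) -> Vec<Reloc> {
        let old = self.placements[section];
        let moved = new_size > old.slot_size;
        if moved {
            // Doesn't fit in its old slot: append it at the end of the output.
            self.placements.insert(section, Placement { address: self.next_free, slot_size: new_size });
            self.next_free += new_size;
        }
        self.relocs
            .iter()
            .copied()
            // Relocations inside the section are always re-applied, since its bytes were
            // rewritten; relocations pointing at it only matter if it moved.
            .filter(|r| r.from == section || (moved && r.to == section))
            .collect()
    }
}
```

The point of the sketch is that without the `placements` and `relocs` tables there is no cheap way to answer "what has to be patched if this section changes?".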
-
When integrating Wild with Rust for benchmarking purposes, I've found it annoying that dependabot also bumps the versions in […]
-
A linker as a library would be amazing, particularly if you could get it with no_std for embedded ELF loading on a microcontroller or in an OS. Linux has what amounts to its own linker in the kernel, as do many RTOSes, like the one I added to Zephyr. Having a production-quality linker available as a library would be great.
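As a tiny example of what a no_std-friendly building block could look like, here's a sketch of validating the ELF identification bytes (`e_ident`) using only `core`. It isn't taken from Wild, Linux or Zephyr; the error type and function name are made up, and a real loader would go on to parse the headers, sections, symbols and relocations.

```rust
/// EI_CLASS values from the ELF specification.
const ELFCLASS32: u8 = 1;
const ELFCLASS64: u8 = 2;
/// EI_DATA value for little-endian data.
const ELFDATA2LSB: u8 = 1;
/// EI_VERSION value for the current ELF version.
const EV_CURRENT: u8 = 1;

/// Reasons a hypothetical loader might reject a file.
#[derive(Debug, PartialEq)]
enum IdentError {
    TooShort,
    BadMagic,
    WrongClass,
    NotLittleEndian,
    BadVersion,
}

/// Validate the 16-byte `e_ident` array at the start of an ELF file. Only `core`
/// features are used, so this would also compile in a `#![no_std]` crate.
fn check_ident(bytes: &[u8], expected_class: u8) -> Result<(), IdentError> {
    if bytes.len() < 16 {
        return Err(IdentError::TooShort);
    }
    if bytes[0..4] != [0x7f, b'E', b'L', b'F'] {
        return Err(IdentError::BadMagic);
    }
    if bytes[4] != expected_class {
        return Err(IdentError::WrongClass);
    }
    if bytes[5] != ELFDATA2LSB {
        return Err(IdentError::NotLittleEndian);
    }
    if bytes[6] != EV_CURRENT {
        return Err(IdentError::BadVersion);
    }
    Ok(())
}
```

The expected class is a parameter because many microcontroller targets use 32-bit ELF (`ELFCLASS32`) while most desktop and server targets use `ELFCLASS64`.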
-
Thank you for starting this project!
For years I've harbored a desire to implement a performant linker in pure Rust. I'm super excited someone is doing this work. And wild so far looks very promising.
I wanted to send unsolicited feedback about an aspect of linker design that often gets overlooked: the utility of having a linker as a library or linking as an API.
Most modern linkers (including mold and currently libwild) are designed around a command line interface. This is fine: this is the common way linkers are invoked, of course.
But I also believe that constraining linkers to a CLI-first interface limits their utility. I believe there's a whole world of alternative development workflows, and of ways of patching and assembling executable objects, that could be unlocked if low-level linker primitives were usable as an API and could be embedded in a larger component.
As an example, I started the PyOxidizer project (implemented in Rust) to make it simpler to assemble self-contained and distributable Python projects. The Python distributions (https://github.com/astral-sh/python-build-standalone) I built shipped the raw object files constituting libpython, and PyOxidizer supported cherry-picking the subset of object files your application needed so the final executable wouldn't waste space on unused functionality. To produce the final executable we had to invoke a linker. This was somewhat brittle because we relied on whatever was installed on the system. I would have much preferred to embed a Rust crate providing a linker so we could perform the link in-process with no dependencies on system software.
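Picking only the object files an application needs is essentially the rule linkers already apply to archive members: start from the symbols you need and keep pulling in whichever object defines each still-undefined symbol. A minimal sketch of that worklist, with made-up object and symbol names, ignoring weak symbols, common symbols and archive search order:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A hypothetical object file: the symbols it defines and the ones it still needs.
struct Object {
    name: &'static str,
    defines: Vec<&'static str>,
    undefined: Vec<&'static str>,
}

/// Return the objects reachable from `roots` (in the order they were pulled in)
/// plus any symbols that nothing defines.
fn select_objects(objects: &[Object], roots: &[&'static str]) -> (Vec<&'static str>, Vec<&'static str>) {
    // Index: symbol -> first object that defines it.
    let mut definer: HashMap<&'static str, usize> = HashMap::new();
    for (i, obj) in objects.iter().enumerate() {
        for &sym in &obj.defines {
            definer.entry(sym).or_insert(i);
        }
    }
    let mut selected: HashSet<usize> = HashSet::new();
    let mut seen: HashSet<&'static str> = HashSet::new();
    let mut order = Vec::new();
    let mut unresolved = Vec::new();
    let mut worklist: VecDeque<&'static str> = roots.iter().copied().collect();
    while let Some(sym) = worklist.pop_front() {
        if !seen.insert(sym) {
            continue; // already handled this symbol
        }
        match definer.get(sym) {
            // Pulling in an object adds its own undefined symbols to the worklist.
            Some(&i) => {
                if selected.insert(i) {
                    order.push(objects[i].name);
                    worklist.extend(objects[i].undefined.iter().copied());
                }
            }
            None => unresolved.push(sym),
        }
    }
    (order, unresolved)
}
```

Objects that nothing reaches (an `unused.o`, say) never make it into the result, which is exactly the space saving described above.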
As another example, I've implemented a pure Rust implementation of Apple's code signing mechanism. Signing Mach-O binaries entails manipulating them to embed the code signature. I've implemented a very crude specialized linker to do this. But, again, I would have much preferred to embed a Rust linker crate and perform the Mach-O mutations with it instead of having to reinvent this wheel.
I have a handful of projects that make use of the https://github.com/NixOS/patchelf tool for rewriting ELF binaries. I've stumbled across several bugs in patchelf over the years where it corrupts ELF binaries. This type of corruption is easily avoidable when you solve ELF rewriting as "relinking" instead of "patching." Again, having a linker as a library where I can reach for the low-level APIs to reconstitute an object file would facilitate implementing this functionality robustly.
I've long harbored ideas of stealing Zig's excellent ideas around glibc symbol version targeting to make it easier to produce ELF binaries that are compatible with more systems. Having a generic linker library can provide building blocks to make solving this problem easier.
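At its core, glibc version targeting is a selection rule: for each versioned symbol, use the newest version node that is no newer than the target glibc. A minimal sketch of just that rule (not Zig's implementation), assuming version nodes are named like `GLIBC_2.14`; a real tool would also need to read the available versions from the library's version definitions and emit the matching references.

```rust
/// Parse a glibc version node name such as "GLIBC_2.14" into comparable numbers.
fn parse_glibc_version(node: &str) -> Option<Vec<u32>> {
    node.strip_prefix("GLIBC_")?
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// Pick the newest available version of a symbol that is no newer than `target`.
/// Returns None if every available version is too new (the binary can't target that glibc).
fn pick_version<'a>(available: &[&'a str], target: &str) -> Option<&'a str> {
    let target = parse_glibc_version(target)?;
    available
        .iter()
        .filter_map(|&node| parse_glibc_version(node).map(|v| (v, node)))
        // Vec<u32> compares lexicographically, so [2, 2, 5] < [2, 12] < [2, 14].
        .filter(|(v, _)| *v <= target)
        .max_by(|a, b| a.0.cmp(&b.0))
        .map(|(_, node)| node)
}
```

For example, on x86-64 `memcpy` is exported as both `GLIBC_2.2.5` and `GLIBC_2.14`, so targeting glibc 2.12 would select `GLIBC_2.2.5` while targeting 2.17 would select `GLIBC_2.14`.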
The Rust core development loop is often maligned as being bottlenecked on compiling and linking. There are no doubt optimizations that can be performed if the coupling between the compiler and linker is tighter and more cohesive. For example, the linker can reserve extra padding to facilitate swapping in new code without rewriting the entire file. In the same vein is hot-swapping code in a running executable. These optimizations are vastly easier to implement when you can embed a linker and use it as a deeply customizable library.
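The padding idea can be sketched in a few lines. This assumes a hypothetical fixed percentage of slack after each function and ignores alignment, symbol updates and relocations: if a rebuilt function still fits in its reserved slot it can be overwritten in place; otherwise a fuller relink is needed.

```rust
/// A function's slot in the output text section: where it starts, how many bytes it
/// currently uses, and how many bytes were reserved for it (code plus padding).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slot {
    start: u64,
    used: u64,
    reserved: u64,
}

#[derive(Debug, PartialEq)]
enum Update {
    InPlace { start: u64 },
    NeedsRelayout,
}

/// Lay out functions back to back, reserving `padding_percent` extra space after each.
fn layout(sizes: &[u64], base: u64, padding_percent: u64) -> Vec<Slot> {
    let mut next = base;
    sizes
        .iter()
        .map(|&size| {
            let reserved = size + size * padding_percent / 100;
            let slot = Slot { start: next, used: size, reserved };
            next += reserved;
            slot
        })
        .collect()
}

/// Try to swap in new code for one function without moving anything else.
fn update(slot: &mut Slot, new_size: u64) -> Update {
    if new_size <= slot.reserved {
        slot.used = new_size;
        Update::InPlace { start: slot.start }
    } else {
        Update::NeedsRelayout
    }
}
```

The trade-off is file size versus how often an edit can be applied in place; the right amount of slack would presumably depend on how the compiler and linker cooperate.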
I could go on.
The libwild crate today has a very limited API. Most functions are marked `pub(crate)`. The core linker interface is based on parsing arguments to a struct. I'd like to encourage opening up that API. Make as much as possible `pub`. Break API compatibility as much as you want. I don't care about the churn: having the ability to embed a powerful and performant linker in Rust opens up a world of possibilities. It encourages people to rethink the paradigm of linking. The industry is still using linkers as a discrete post-compile step because that's how linkers have been designed for decades. I'm willing to bet that if someone is able to build a linker as a library, a ton of people will find new and creative ways to employ linking to solve various problems. I think wild and the Rust ecosystem are in a position to unlock this potential.