-
Notifications
You must be signed in to change notification settings - Fork 47
add in memory linking #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
another question is what to do with |
Hi @alessandrod , have you had a chance to take a look at this draft? I'd appreciate your thoughts on whether the changes work for you and if you like the direction it’s heading in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you could split out the wrapper parts as a separate change? If so, we could merge it fast, and then review just the in-memory part of the code (which I still need to play with and review properly).
src/llvm/context.rs
Outdated
|
||
use crate::llvm::LLVMModuleWrapped; | ||
|
||
pub struct LLVMContextWrapped { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome job with adding new wrappers! However, I think we should keep them in the src/llvm/types
folder. And let's just name it LLVMContext
.
src/llvm/memory_buffer.rs
Outdated
prelude::LLVMMemoryBufferRef, | ||
}; | ||
|
||
pub struct MemoryBufferWrapped { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - let's keep it in the types
folder and drop the Wrapped
suffix.
src/llvm/mod.rs
Outdated
unsafe { str::from_utf8(slice::from_raw_parts(ptr as *const c_uchar, len as usize)).unwrap() } | ||
} | ||
|
||
pub unsafe fn set_diagnostic_handler<T: LLVMDiagnosticHandler>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably a good candidate for becoming a method of the LLVMContext
wrapper
gonna take a look today, sorry for the delay! |
src/linker.rs
Outdated
/// Link and generate the output code to file. | ||
pub fn link_to_file( | ||
&self, | ||
inputs: Vec<LinkerInput>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should probably be &[LinkerInput]
src/linker.rs
Outdated
self.create_target_machine()?; | ||
fn link<'ctx>( | ||
&'ctx self, | ||
inputs: Vec<LinkerInput>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, slice?
src/linker.rs
Outdated
|
||
fn link_modules<'ctx>( | ||
context: &'ctx LLVMContextWrapped, | ||
inputs: Vec<LinkerInput>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slice?
src/llvm/module.rs
Outdated
} | ||
|
||
impl<'ctx> LLVMModuleWrapped<'ctx> { | ||
pub unsafe fn get_target(&self) -> *const i8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be as *const c_char
f49caa9
to
22bab7e
Compare
I rebased on the main and squashed to remove useless commits. The I addressed most of the review comments. @vadorovsky: I tried to create a new PR without the in memory linking but it ended just like this, the in memory linking is marginal. I think it doesn't make the review process faster, but if really want as separate PR, I will do it. @alessandrod: taking a slice instead of the |
bab471f
to
e74da34
Compare
src/llvm/types/target_machine.rs
Outdated
} | ||
} | ||
|
||
pub unsafe fn codegen_to_mem( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary unsafe?
Thank you for the review comments. To accept the iterator and have a cleaner api I had to introduce another layer for the input. All the other solutions ended with a strange API. @vadorovsky I used |
to check if a file can be opened or not I allocate myself a reader in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! I just finished my crazy month of traveling, so I'll be more responsive from now.
It's looking pretty good, just a couple of API comments
src/linker.rs
Outdated
/// | ||
/// // Link into an in-memory buffer. | ||
/// let out_buf = linker.link_to_buffer( | ||
/// vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: vec not necessary anymore
&self, | ||
inputs: impl IntoIterator<Item = LinkerInput<'i>>, | ||
output_type: OutputType, | ||
export_symbols: &HashSet<Cow<'static, str>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason export_symbols and dump_module are passed here and not as part of
LinkerOptions?
Making inputs and output_type arguments here make sense if we add support for
emitting multiple formats (eg you might want to emit both .ir and .asm for debugging)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is because now the Linker
object is reusable, so the symbols to export may change depending on the needs. instead regarding dump_module
, you don't want to overwrite the pre and post optimization files if you call link_
functions multiple times.
for example my use case is something like this:
let linker = Linker::new(LinkerOptions {
// ...
})?;
for item in stuff {
// ...
let export_symbols = todo!(); // calculate export symbols
let dump_module_path = todo!(); // optional dump path
// ...
let result = linker.link_to_buffer(
inputs,
output_type,
&export_symbols,
dump_module_path.as_deref(),
)?;
// ...
}
the symbols I have to export are different for every iteration and I want a different path for each link_*
call in case I have to look at the ir.
I agree that maybe it should be improved, but right now this PR is too big, I think it's better to split in other tasks.
I have few other stuff I want to add but after this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is because now the
Linker
object is reusable, so the symbols to export may change depending on the needs. instead regardingdump_module
, you don't want to overwrite the pre and post optimization files if you calllink_
functions multiple times.for example my use case is something like this:
let linker = Linker::new(LinkerOptions { // ... })?; for item in stuff { // ... let export_symbols = todo!(); // calculate export symbols let dump_module_path = todo!(); // optional dump path // ... let result = linker.link_to_buffer( inputs, output_type, &export_symbols, dump_module_path.as_deref(), )?; // ... }the symbols I have to export are different for every iteration and I want a different path for each
link_*
call in case I have to look at the ir.
Yes, but you can also do this by doing
linker.options.dump_module_path = todo!();
let res = linker.link_to_buffer(inputs, output_type, exported_symbols);
When designing an API, you have to think what's the most common case, and what's the cognitive overhead of leaking more advanced fringe cases to the common API. You have convinced me that passing export_symbols
to link is correct, but dump_module_path
is definitely a fringe case, and shouldn't pollute the API.
Also in your example you've made it &HashSet::new()
which is incorrect - if you don't export any symbols DCE will remove all the code, please fix that.
I agree that maybe it should be improved, but right now this PR is too big, I think it's better to split in other tasks.
I have few other stuff I want to add but after this
This is the central point of this PR, and it's a public API, so we have to get it as right as possible and we should aim to break API as little as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this no longer takes the dump path, but if it's per-operation then it makes sense to me that it would go here rather than having to mutate the linker.
No comments besides what @alessandrod said. The wrappers look good now, thanks for addressing my previous comments. 🙂 There is a huge chance that the CI failures will disappear after you rebase (beta switched to newer LLVM, 1.86.0 looks like a flake). |
6293aaf
to
04b7b63
Compare
applied the requested changes and rebased. let's hope the CI failures will disappear |
The LLVM 19 job seems actually broken. 😢 I can take a look closer to the evening, but my wild guess is that there is some breaking change between LLVM 19 and 20 in one of the wrapped types, which we'll need to handle with feature flags. |
@banditopazzo I pushed a commit with a fix. The problem was that the C strings were not null terminated and apparently LLVM 19 expects the null character. It's a good practice to add it for all FFI calls anyways. Sorry for suggesting a faulty path conversion in my previous comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #295 (comment)
That's the only thing left then it's ready to go
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces in-memory linking capabilities to the BPF linker, enabling linking of both file-based and in-memory object/bitcode data. The implementation includes significant API changes to support reusing linker instances and improved LLVM object management through safer wrappers.
Key changes:
- Added in-memory linking support alongside existing file-based linking
- Redesigned API with
link_to_file
andlink_to_buffer
methods for better instance reuse - Improved LLVM object safety with custom
Drop
implementations and proper lifetime management
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
src/llvm/types/target_machine.rs |
New wrapper for LLVM target machine with proper resource management and emit methods |
src/llvm/types/module.rs |
New LLVM module wrapper with lifetime safety and bitcode/IR output capabilities |
src/llvm/types/mod.rs |
Module declarations for new LLVM wrapper types |
src/llvm/types/memory_buffer.rs |
New memory buffer wrapper with automatic disposal and slice access |
src/llvm/types/context.rs |
New LLVM context wrapper with module creation and diagnostic handler support |
src/llvm/mod.rs |
Refactored LLVM utilities to work with new wrapper types and removed standalone functions |
src/llvm/di.rs |
Updated debug info sanitizer to use new LLVM wrapper types |
src/linker.rs |
Major API restructure with new input types, linking methods, and improved resource management |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/llvm/types/target_machine.rs
Outdated
let triple = CString::new(triple).unwrap(); | ||
let cpu = CString::new(cpu).unwrap(); | ||
let features = CString::new(features).unwrap(); |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unwrap()
calls will panic if the input strings contain null bytes. Consider using proper error handling or validation to provide clearer error messages.
Copilot uses AI. Check for mistakes.
path: impl AsRef<Path>, | ||
output_type: LLVMCodeGenFileType, | ||
) -> Result<(), String> { | ||
let path = CString::new(path.as_ref().as_os_str().as_encoded_bytes()).unwrap(); |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unwrap()
call will panic if the path contains null bytes. Consider using proper error handling to provide clearer error messages.
let path = CString::new(path.as_ref().as_os_str().as_encoded_bytes()).unwrap(); | |
let path = match CString::new(path.as_ref().as_os_str().as_encoded_bytes()) { | |
Ok(cstr) => cstr, | |
Err(_) => { | |
return Err("Path contains null bytes, which are not allowed.".to_string()); | |
} | |
}; |
Copilot uses AI. Check for mistakes.
} | ||
|
||
pub fn write_bitcode_to_path(&self, path: impl AsRef<Path>) -> Result<(), String> { | ||
let path = CString::new(path.as_ref().as_os_str().as_encoded_bytes()).unwrap(); |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unwrap()
call will panic if the path contains null bytes. Consider using proper error handling to provide clearer error messages.
let path = CString::new(path.as_ref().as_os_str().as_encoded_bytes()).unwrap(); | |
let path = CString::new(path.as_ref().as_os_str().as_encoded_bytes()) | |
.map_err(|e| format!("path contains null byte: {}", e))?; |
Copilot uses AI. Check for mistakes.
src/llvm/types/context.rs
Outdated
} | ||
|
||
pub fn create_module<'ctx>(&'ctx self, name: &str) -> Option<LLVMModule<'ctx>> { | ||
let c_name = CString::new(name).unwrap(); |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unwrap()
call will panic if the name contains null bytes. Consider using proper error handling to provide clearer error messages.
let c_name = CString::new(name).unwrap(); | |
let c_name = match CString::new(name) { | |
Ok(cstr) => cstr, | |
Err(_) => { | |
// Optionally, log or handle the error here | |
return None; | |
} | |
}; |
Copilot uses AI. Check for mistakes.
let path = match input { | ||
InputReader::File { path, .. } => path.into(), | ||
InputReader::Buffer { name, .. } => PathBuf::from(format!("in_memory::{}", name)), |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The format!
macro allocates a new String every time. Consider using a more efficient approach or caching the formatted path since it's used for error reporting.
let path = match input { | |
InputReader::File { path, .. } => path.into(), | |
InputReader::Buffer { name, .. } => PathBuf::from(format!("in_memory::{}", name)), | |
let path: PathBuf = match input { | |
InputReader::File { path, .. } => path.into(), | |
InputReader::Buffer { name, .. } => { | |
// Cache the formatted path string for in-memory buffers | |
PathBuf::from(format!("in_memory::{}", name)) | |
} |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
src/linker.rs
Outdated
llvm::init(&args, "BPF linker"); | ||
} | ||
|
||
let context = LLVMContext::new(); | ||
context.set_diagnostic_handler(&mut diagnostic_handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P0] Register diagnostic handler after it reaches a stable address
The context registers &mut diagnostic_handler
inside llvm_init
, but the handler is a stack local that is returned by value and moved into Linker
afterwards. LLVM keeps the raw pointer it was given, so once this function returns that pointer is dangling and any diagnostic callback will read freed stack memory. The handler should be installed after it is moved into the struct (or boxed) so the address remains valid for the lifetime of the context.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use Pin
and Box
for the diagnostic handler, like @tamird did here:
Since you're adding a safe wrapper, I think it makes sense to make it fully safe already.
Given that your PR is older, required a lot of work and is pretty close to being accepted by everyone, I think it's better to get in yours first. Pinning is probably the only critical issue that linters found in @tamird's PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, ok, that PR actually got merged, sorry :/
for mut input in inputs { | ||
let path = match input { | ||
InputReader::File { path, .. } => path.into(), | ||
InputReader::Buffer { name, .. } => PathBuf::from(format!("in_memory::{}", name)), | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P0] Avoid moving input
before it is read
The match input { … }
used to derive path
consumes the InputReader
value, but the code immediately below still calls input.read_exact
and later passes input
to Archive::new
. Because InputReader
contains a File
and is not Copy
, this results in a “use of moved value” compilation error. Borrow the InputReader
when extracting path
(e.g. match &input
) or destructure it by reference so it remains usable for the read.
Useful? React with 👍 / 👎.
@tamird thanks, I will try to solve also some of the codex issues @alessandrod a few more points:
|
I'd say, feel free to ignore Copilot's comments about unwraps, it's out of scope of your change and you're just moving that code around. Passing null bytes to clap is theoretically possible with |
I landed quite a large change in #307 which might be a pain to rebase, sorry about that. On the bright side it removes many unwraps. |
The merge conflict here is unfortunately very large due to all the code movement included in this PR. IMO it would be better to separate functional changes from code movement both to ameliorate this problem and to ease code review. |
Use `CString::new` on all paths passed to libLLVM, to make sure that they are NULL terminated C strings.
cf77679
to
f618ce8
Compare
unsafe { LLVMParseCommandLineOptions(c_ptrs.len() as i32, c_ptrs.as_ptr(), overview.as_ptr()) }; | ||
} | ||
|
||
pub(crate) fn create_module(name: &CStr, context: LLVMContextRef) -> Option<LLVMModuleRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is an example of an API I had changed (to accept CStr) to avoid an unwrap, which is lost in this rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it was very difficult to rebase, I had to constantly look at your changes. I have to double check everything, this was my first round.
And I would like to also find a safer solution for the diagnostic handler. Pin locks the memory address but doesn't ensure it remains alive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be easier to separate the changes as I suggested earlier: either avoid all the code movement in this PR and do it in a later one or do the code movement first in a separate PR and then rebase this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for this time, I went too far with pr. In the beginning I tried to split that in 2 but it was even more because of temporary compatibility, next PRs I promise will be more concise
After thinking a lot on the the safe wrapper for the diagnostic handler I ended with this solution. A bit of context:
so I put the diagnostic handler in an RC shared between the context wrapper and a view object, both of them can be dropped in any order and everything remains consistent. it should be safe, but I would like to hear some of your opinions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine, but I am having a hard time reviewing this because of all the movement. I would really like to see less of the git history obliterated by this change.
@tamird reviewed 1 of 2 files at r18, 1 of 8 files at r26, all commit messages.
@tamird dismissed @alessandrod and @vadorovsky from 8 discussions.
Reviewable status: 2 of 12 files reviewed, 48 unresolved discussions (waiting on @alessandrod, @banditopazzo, and @vadorovsky)
src/linker.rs
line 185 at r28 (raw file):
fn try_from(value: LinkerInput<'a>) -> Result<Self, Self::Error> { match value { LinkerInput::File(file_input) => {
can you destructure here please?
src/linker.rs
line 193 at r28 (raw file):
}) } LinkerInput::Buffer(buffer_input) => Ok(InputReader::Buffer {
destructure
src/linker.rs
line 297 at r28 (raw file):
context: LLVMContext, diagnostic_handler: llvm::InstalledDiagnosticHandler<DiagnosticHandler>, dump_module: Option<PathBuf>,
how come this moved from LinkerOptions?
src/linker.rs
line 302 at r28 (raw file):
impl Linker { /// Create a new linker instance with the given options. pub fn new(options: LinkerOptions) -> Result<Self, LinkerError> {
This function appears to be infallible.
src/linker.rs
line 320 at r28 (raw file):
/// - A "pre-opt.ll" file is written with the IR before optimization. /// - A "post-opt.ll" file is written with the IR after optimization. pub fn set_dump_module_path(&mut self, path: impl AsRef<Path>) {
why is this better than having this in the options?
src/linker.rs
line 354 at r28 (raw file):
/// .collect(); /// /// // Link to a file
comment that repeats function name is just noise
src/linker.rs
line 363 at r28 (raw file):
/// OutputType::Object, /// &export_symbols, /// None,
what's this?
are we not running doctests?
src/linker.rs
line 368 at r28 (raw file):
/// # } /// ``` #[allow(single_use_lifetimes)]
why? if you must, use expect(..., reason = "some compelling reason")
src/linker.rs
line 417 at r28 (raw file):
/// .collect(); /// /// // Link into an in-memory buffer.
ditto
src/linker.rs
line 435 at r28 (raw file):
/// # } /// ``` #[allow(single_use_lifetimes)]
ditto
src/linker.rs
line 453 at r28 (raw file):
/// Link and generate the output code. #[allow(single_use_lifetimes)]
please get all of these
src/linker.rs
line 459 at r28 (raw file):
export_symbols: &HashSet<Cow<'static, str>>, ) -> Result<(LLVMModule<'ctx>, LLVMTargetMachine), LinkerError> { let mut module = link_modules(&self.context, inputs)?;
could you please destructure self? it's a bit difficult to see at a glance which bits of self are used.
&self, | ||
inputs: impl IntoIterator<Item = LinkerInput<'i>>, | ||
output_type: OutputType, | ||
export_symbols: &HashSet<Cow<'static, str>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this no longer takes the dump path, but if it's per-operation then it makes sense to me that it would go here rather than having to mutate the linker.
I ended my first iteration for memory linking and api changes.
To recap my goals:
Major changes:
Drop
in every wrapper, lifetimes, mutability)Open points or missing:
dump_module
should be moved to link methodsI am not surprised if I'm forgetting something.
In any case this should be enough at least to have an idea of changes I am proposing
This change is