@BretasArthur1 BretasArthur1 commented Nov 20, 2025

Context

Recently I was testing IR generation from different languages, using LLVM to produce the bitcode and then sbpf-linker to produce a .so file.

Problem

This process needs an extra step to generate the .bc with LLVM. I started a thread on X about accepting .ll files as inputs to sbpf-linker, then alessandro mentioned the possibility of adding this to bpf-linker, and here we are!



@alessandrod alessandrod self-requested a review November 20, 2025 10:41
Member

@tamird tamird left a comment

Neat!

@tamird reviewed 4 of 5 files at r1, all commit messages.
Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod)


-- commits line 2 at r1:
it would be great to include your motivation here


src/llvm/mod.rs line 144 at r1 (raw file):

    linked
}
#[must_use]

newline between functions please


src/llvm/mod.rs line 148 at r1 (raw file):

    context: &'ctx LLVMContext,
    module: &mut LLVMModule<'ctx>,
    buffer: &CStr,

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]


src/llvm/mod.rs line 176 at r1 (raw file):

        linked = unsafe { LLVMLinkModules2(module.as_mut_ptr(), temp_module) } == 0;
    } else {
        if !error_msg.is_null() {

we should return the error message.
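
A minimal sketch of what returning the message could look like, written without LLVM so it stands alone. The helper names `message_to_string` and `parse_result` are hypothetical and not part of the PR; with LLVM, the original message would still need to be released with `LLVMDisposeMessage` after it has been copied.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: copies a C error message into an owned `String`.
/// Returns `None` when the pointer is null. The caller still owns the
/// original allocation and remains responsible for freeing it.
fn message_to_string(msg: *const c_char) -> Option<String> {
    if msg.is_null() {
        return None;
    }
    // SAFETY: the caller guarantees `msg` points to a valid nul-terminated string.
    let text = unsafe { CStr::from_ptr(msg) };
    Some(text.to_string_lossy().into_owned())
}

/// Hypothetical stand-in for the result handling: `status == 0` means success,
/// anything else is reported together with the message (if one was provided).
fn parse_result(status: i32, msg: *const c_char) -> Result<(), String> {
    if status == 0 {
        return Ok(());
    }
    Err(message_to_string(msg).unwrap_or_else(|| "unknown error".to_owned()))
}
```

Copying the text into a `String` before disposing of the C allocation keeps the error usable after the unsafe block ends.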


src/llvm/mod.rs line 179 at r1 (raw file):

            unsafe { LLVMDisposeMessage(error_msg) };
        }
        if !temp_module.is_null() {

this should be impossible, no?


Cargo.toml line 43 at r1 (raw file):

rustc-build-sysroot = { workspace = true }
which = { version = "8.0.0", default-features = false, features = ["real-sys", "regex"] }
tempfile = "3.13"

could you keep this alphabetical please and use { version = ... } for consistency?


src/linker.rs line 514 at r1 (raw file):

    // buffer used to perform file type detection
    let mut buf = [0u8; 1024];

could you add a comment here, or maybe make this more general so that this arbitrary size isn't needed? e.g. i wonder if BufRead would help?


src/linker.rs line 600 at r1 (raw file):

        }
        InputType::Ir => {
            data.push(0); // force push null terminator

I think you can just use CString::new(data) which will internally append the nul byte.
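
A small sketch of the `CString::new` suggestion. The helper name `ir_to_cstring` is made up for illustration. One difference from `data.push(0)` worth noting: `CString::new` returns an error if the data already contains a nul byte anywhere, including at the end.

```rust
use std::ffi::CString;

/// Hypothetical helper: turns raw IR bytes into a nul-terminated C string.
/// `CString::new` appends the terminator itself and rejects any nul byte
/// already present in the input.
fn ir_to_cstring(data: Vec<u8>) -> Result<CString, String> {
    CString::new(data).map_err(|err| format!("IR contains a nul byte: {err}"))
}
```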


src/linker.rs line 914 at r1 (raw file):

        Some(position) => &data[position..],
        None => return false,
    };

data.trim_ascii_start?

Code quote:

    // Trim whitespace from the start of the data
    let trimmed = match data.iter().position(|b| !b.is_ascii_whitespace()) {
        Some(position) => &data[position..],
        None => return false,
    };
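
For comparison, a sketch of the manual trimming next to `trim_ascii_start` (available on byte slices since Rust 1.80). The function names are illustrative only. The one behavioral difference is the all-whitespace case: the manual version has no position to return, while `trim_ascii_start` yields an empty slice.

```rust
/// The manual approach from the quoted code: find the first non-whitespace byte.
fn trim_manual(data: &[u8]) -> Option<&[u8]> {
    data.iter()
        .position(|b| !b.is_ascii_whitespace())
        .map(|position| &data[position..])
}

/// The suggested replacement, using the standard library.
fn trim_std(data: &[u8]) -> &[u8] {
    data.trim_ascii_start()
}
```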

src/linker.rs line 923 at r1 (raw file):

        || trimmed.starts_with(b"target ")
        || trimmed.starts_with(b"define")
        || trimmed.starts_with(b"!llvm")

might be easier to read as [.....].iter().any(|prefix| trimmed.starts_with(prefix))

Code quote:

    trimmed.starts_with(b"; ModuleID")
        || trimmed.starts_with(b"target triple")
        || trimmed.starts_with(b"target datalayout")
        || trimmed.starts_with(b"source_filename")
        || trimmed.starts_with(b"target ")
        || trimmed.starts_with(b"define")
        || trimmed.starts_with(b"!llvm")
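
A sketch of the suggested shape, using the same prefixes as the quoted code. The constant name `IR_PREFIXES` is an assumption made for this example.

```rust
/// Prefixes taken from the quoted code; the constant name is hypothetical.
const IR_PREFIXES: [&[u8]; 7] = [
    b"; ModuleID",
    b"target triple",
    b"target datalayout",
    b"source_filename",
    b"target ",
    b"define",
    b"!llvm",
];

/// Returns true when the data, after leading ASCII whitespace, starts with
/// any of the known textual IR prefixes.
fn is_llvm_ir(data: &[u8]) -> bool {
    let trimmed = data.trim_ascii_start();
    IR_PREFIXES.iter().any(|prefix| trimmed.starts_with(prefix))
}
```

An empty or all-whitespace input needs no special case here, since an empty slice never starts with a non-empty prefix.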

@alessandrod
Collaborator

alessandrod commented Nov 20, 2025

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]

because deep inside parse IR, LLVM hardcodes a RequiresTerminator=true, so you get a debug assertion if you set this to false and don't null terminate

[image: screenshot of the LLVM debug assertion]

@tamird
Member

tamird commented Nov 20, 2025

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]

because deep inside parse IR, LLVM hardcodes a RequiresTerminator=true, so you get a debug assertion if you set this to false and don't null terminate

[image: screenshot of the LLVM debug assertion]

Sounds like a great comment to leave.

Author

@BretasArthur1 BretasArthur1 left a comment

Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 514 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

could you add a comment here, or maybe make this more general so that this arbitrary size isn't needed? e.g. i wonder if BufRead would help?

What would you suggest as a comment for this one?

Member

@tamird tamird left a comment

Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 514 at r1 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

What would you suggest as a comment for this one?

I would avoid the arbitrary-size buffer if possible - in the case of IR you could have unbounded whitespace before the thing you look for in is_llvm_ir.
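
To illustrate the concern, a sketch with a simplified stand-in detector (only the `define` prefix, and all names hypothetical): once the leading whitespace is longer than the preview buffer, detection on the preview fails even though the full input is valid IR.

```rust
/// Hypothetical, simplified stand-in for the detector discussed in the review.
fn looks_like_ir(data: &[u8]) -> bool {
    data.trim_ascii_start().starts_with(b"define")
}

/// Builds an input with `padding` leading spaces followed by a function definition.
fn padded_ir(padding: usize) -> Vec<u8> {
    let mut data = vec![b' '; padding];
    data.extend_from_slice(b"define void @f() { ret void }\n");
    data
}

/// Detection on a fixed-size preview, mirroring the `[0u8; 1024]` buffer.
fn looks_like_ir_preview(data: &[u8], preview_len: usize) -> bool {
    looks_like_ir(&data[..data.len().min(preview_len)])
}
```

With 2000 leading spaces, `looks_like_ir` accepts the input while `looks_like_ir_preview(.., 1024)` sees only whitespace and rejects it.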

@BretasArthur1 BretasArthur1 requested a review from tamird November 21, 2025 17:37
Member

@tamird tamird left a comment

@tamird reviewed 1 of 4 files at r2, all commit messages.
Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 915 at r2 (raw file):

fn is_llvm_ir(data: &[u8]) -> bool {
    let trimmed = data.trim_ascii_start();
    if trimmed.is_empty() {

this check is not needed, right?


src/llvm/mod.rs line 152 at r2 (raw file):

    let buffer_name = c"ir_buffer";
    let buffer = buffer.to_bytes();
    let mem_buffer = unsafe {

don't you need to unsafe { LLVMDisposeMemoryBuffer(buffer) }; like the function above?


src/llvm/mod.rs line 157 at r2 (raw file):

            buffer.len(),
            buffer_name.as_ptr(),
            1, // LLVM internally sets RequiresTerminator=true

I am now confused by this. Can we just set it to 0 and then we wouldn't need to create a C-string?

Author

@BretasArthur1 BretasArthur1 left a comment

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 915 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this check is not needed, right?

Now that we've switched to the approach without a hardcoded buffer, I need to remove this, sorry.


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

don't you need to unsafe { LLVMDisposeMemoryBuffer(buffer) }; like the function above?

I was doing this in some tests, but if we dispose of the buffer here, the internal reference to it can't be kept and the checks can't run. That was with the previous approach, though, so I need to check again now!


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I am now confused by this. Can we just set it to 0 and then we wouldn't need to create a C-string?

Because of the LLVM IR parser, we need to set this to 1: the parser hardcodes null termination as required even if the buffer doesn't have a terminator... This was an issue alessandro found with LLVM in debug mode, but maybe he can answer this better.
cc @alessandrod

Member

@tamird tamird left a comment

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Because of the LLVM IR parser, we need to set this to 1: the parser hardcodes null termination as required even if the buffer doesn't have a terminator... This was an issue alessandro found with LLVM in debug mode, but maybe he can answer this better.
cc @alessandrod

Now that I am looking at the screenshot he posted, I think that was just because of this 1. If you change it to 0 I think you do not need a null terminator.

Author

@BretasArthur1 BretasArthur1 left a comment

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @tamird)


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Now that I am looking at the screenshot he posted, I think that was just because of this 1. If you change it to 0 I think you do not need a null terminator.

That screenshot was from testing with 0.

@BretasArthur1
Author

Done, removed the unnecessary check. I also tested with unsafe { LLVMDisposeMemoryBuffer(buffer) }; and the tests panic. cc @tamird

@BretasArthur1 BretasArthur1 requested a review from tamird November 21, 2025 22:32
@BretasArthur1
Author

hey @tamird gm! any blockers here ?

@alessandrod
Collaborator

@tamird wrt the memory buffer thing, I think we actually need to change all the other instances to be null terminated

The RequiresNullTerminator field isn't stored anywhere, it's only used to do a check when the buffer is created https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/Support/MemoryBuffer.cpp#L48

Then this is why we get the assertion in the screenshot:

LLVMParseIRInContext => parseIR => parseAssembly => parseAssemblyInto

https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/AsmParser/Parser.cpp#L30

std::unique_ptr<MemoryBuffer> Buf = MemoryBuffer::getMemBuffer(F);

https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/include/llvm/Support/MemoryBuffer.h#L138

so RequiresNullTerminator=true and we hit the assertion if we don't null terminate our buffer

I think in the other instances we're lucky and we're not hitting the assertion by accident
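
The invariant described above can be sketched in plain Rust. This is an analogue for illustration, not LLVM code, and both function names are hypothetical: when a terminator is required, the byte one past the end of the data must be 0, which a `CStr` guarantees by construction.

```rust
use std::ffi::CStr;

/// Rust analogue of the creation-time check: `storage` is the full
/// allocation and `len` is the data length handed over (excluding any nul).
fn terminator_ok(storage: &[u8], len: usize, requires_null_terminator: bool) -> bool {
    !requires_null_terminator || storage.get(len) == Some(&0)
}

/// A `CStr` always satisfies the invariant: `to_bytes()` excludes the nul,
/// while the underlying storage still holds it right after the data.
fn cstr_is_terminated(text: &CStr) -> bool {
    terminator_ok(text.to_bytes_with_nul(), text.to_bytes().len(), true)
}
```

This is why passing `buffer.to_bytes()` from a `&CStr` works: the length excludes the terminator, but the byte after the data is still a valid 0.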

src/linker.rs Outdated
    let mut buf = BufReader::new(input);

    // Peek at the buffer to determine file type
    let preview = buf
Collaborator

this is still an arbitrary-size buffer (the size of the internal BufReader
buffer, which is 8 KiB by default)

You don't need BufReader here. What you want to do is pass input to
detect_input_type instead of passing &[u8]
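
One possible reading of this suggestion, as a sketch: the detector takes the reader itself and consumes as much leading whitespace as it finds, so no preview size has to be chosen. The signature, the shortened prefix list, and the name `PREFIXES` are assumptions for illustration, not the PR's actual `detect_input_type`.

```rust
use std::io::{self, Read};

/// Shortened, hypothetical prefix list for the sketch.
const PREFIXES: [&[u8]; 3] = [b"; ModuleID", b"target ", b"define"];

/// Hypothetical reader-based detector: skips any amount of leading ASCII
/// whitespace, then compares the following bytes against the known prefixes.
fn is_llvm_ir<R: Read>(input: R) -> io::Result<bool> {
    let max_len = PREFIXES.iter().map(|p| p.len()).max().unwrap_or(0);
    let mut head = Vec::with_capacity(max_len);
    for byte in input.bytes() {
        let byte = byte?;
        // Ignore whitespace until the first significant byte is seen.
        if head.is_empty() && byte.is_ascii_whitespace() {
            continue;
        }
        head.push(byte);
        // Stop once enough bytes are collected to match the longest prefix.
        if head.len() == max_len {
            break;
        }
    }
    Ok(PREFIXES.iter().any(|prefix| head.starts_with(prefix)))
}
```

The sketch consumes bytes from the reader, so a real caller would need to rewind the input or keep the consumed bytes before parsing. Byte-at-a-time reads on an unbuffered file are also slow, so this shows the interface rather than a tuned implementation.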

Author

got it, just saw this in the documentation:

"BufReader<R> can improve the speed of programs that make small and repeated read calls to the same file or network socket. It does not help when reading very large amounts at once, or reading just one or a few times. It also provides no advantage when reading from a source that is already in memory, like a Vec<u8>."

    ) -> Result<bool, String> {
        let buffer_name = c"ir_buffer";
        let buffer = buffer.to_bytes();
        let mem_buffer = unsafe {
Collaborator

you're leaking this; you need to call LLVMDisposeMemoryBuffer before returning


    // Corrupting IR content
    let invalid_content = valid_content
        .replace("define", "defXne")
Collaborator

no need to corrupt 3 things, pick one :D

…ter and fix memory leak. remove unnecessary replacements on test
3 participants