Skip to content

PE: Add support for ARM64 unwind info#517

Open
kkent030315 wants to merge 3 commits intom4b:masterfrom
kkent030315:arm64unwind
Open

PE: Add support for ARM64 unwind info#517
kkent030315 wants to merge 3 commits intom4b:masterfrom
kkent030315:arm64unwind

Conversation

@kkent030315
Copy link
Contributor

addresses #512

This is draft for now until plenty of more unwind code covered by tests (and of course for malformed ones as well).

@m4b
Copy link
Owner

m4b commented Jan 31, 2026

Yesss! I’ll try to give this a review this weekend, thanks @kkent030315 :)

@kkent030315
Copy link
Contributor Author

Thanks, this patch would need some generalization over ARM and ARM64 (e.g., the scale of function length is 2 and xdata header is 1-2 bytes in ARM) so we can add support for ARM as well?

I don't think we should support 32-bit ARM (i.e, Windows CE binaries) though...

Support for 32-bit Arm versions of applications is removed in a future release of Windows 11.1

Footnotes

  1. https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, is there any blocking issue you can think of/why we shouldn't merge?

}

/// Returns an iterator over ARM64 runtime function entries.
pub fn functions_arm64(&self) -> Arm64RuntimeFunctionIterator<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be slightly more idiomatic to write iter_functions_arm64 ?

I would be tempted to rename this to arm64_functions ? though the downside is it starts with less interesting information (arm64() as opposed to (functions) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, overall, we would do something like made up of wrapper enum, but it's obviously breaking change. arm64_functions is acceptable, however as you mentioned, it's less informative for developers, specifically when type func... in their IDE.

pub enum RuntimeFunction {
  X86(X86RuntimeFunction),
  Arm(ArmRuntimeFunction),
}

&self,
function: Arm64RuntimeFunction,
sections: &[section_table::SectionTable],
) -> Option<error::Result<Arm64UnwindInfo<'a>>> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth checking what other parts of goblin return in cases like this, Option<Result> or Result<Option> is probably better? Also, it might even be worth considering if the error is actionable or useful at all (e.g., effectively whether it's an OOB), to just ok() it into an Option and drop any error message info (again if it's not useful or actionable for the user)

I say this believing that more information the better, but if it's just satisfying the type system, might be better to coalesce the return type into a single one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I agree it's a bit unfortunate to introduce Option<Result<T>. The reason for that is, for binary parsing, "absent of the data" v. "data present, but parsing fail" can be informative. For example, if I want to make a binary visualizer, I don't really want to display the data as "absent" when it might be an error parsing the data.

This might only cost a bit for developers while being informative, so I'd rather keep it, since it's public API we can't make changes later (meaning I'd want to keep the "absent" v. "error" meta regardless of what errors are in).

Especially when it comes to unwind info, it's often more of more important for the developer to know whether the info is absent or parsing fails!

foo.get_unwind_info_arm64(...) // Option<Result<T>>
	.map(Result::ok) // Option<Option<T>>
	.flatten() // Option<T>

foo.get_unwind_info_arm64(...)
	.and_then(|r| r.ok())

(1u32 << bits) - 1
}

const fn bits(v: u32, shr: u32, bits: u32) -> u32 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the const part guarantees this is inlined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it has no such guarantee. We can introduce #[inline(always)].

Comment on lines +1331 to +1334
self.bytes
.gread_with::<u32>(&mut self.offset, scroll::LE)
.map(Arm64EpilogScope)
.map_err(Into::into),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice; this pattern comes up so much it makes me want to write a ScrollIterator<T> that does all this for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It... might, might be made better, by just doing bytes.chunks_exact(size_of::<Foo>()).map(...) so we can fold redundant Iterator impls. Here is the siginifcant example from my own mutable-aware PE/ELF parser that every parser is dereffered and has only single slice reference (so-called View auto datastructures).

impl<'buf, T> PortableExecutable<'buf, T>
where
    T: BufRef,
{
    /// Returns an iterator over sections.
    pub fn sections(
        &self,
    ) -> error::Result<impl Iterator<Item = ImageSectionHeaderView<'_, &[u8]>> + '_> {
        // ...

        Ok(buf
            .chunks_exact(size_of::<ImageSectionHeader>())
            .map(ImageSectionHeaderView::new))
    }
}
impl<'buf, T> PortableExecutable<'buf, T>
where
    T: BufMut,
{
    /// Returns a mutable iterator over sections.
    pub fn sections_mut(
        &mut self,
    ) -> error::Result<impl Iterator<Item = ImageSectionHeaderView<'_, &mut [u8]>> + '_> {
        // ...

        Ok(buf
            .chunks_exact_mut(size_of::<ImageSectionHeader>())
            .map(ImageSectionHeaderView::new))
    }
}

Comment on lines +1364 to +1366
epilog_scope_bytes: &'a [u8],
/// Bytes of unwind codes.
unwind_code_bytes: &'a [u8],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I like to put non-pub fields after pub ones but ymmv

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Arm64UnwindExtension(pub u32);

const _: () = assert!(size_of::<Arm64UnwindExtension>() == 4);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat! I wanted this so badly back in early days of rust, in this library and in dryad specifically :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants