-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[lldb] Call FixUpPointer in WritePointerToMemory (try 2) #153585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[lldb] Call FixUpPointer in WritePointerToMemory (try 2) #153585
Conversation
In architectures where pointers may contain metadata, such as arm64e, the metadata may need to be cleaned prior to sending this pointer to be used in expression evaluation generated code. This patch is a step towards allowing consumers of pointers to decide whether they want to keep or remove metadata, as opposed to discarding metadata at the moment pointers are created. See llvm#150537. This was tested running the LLDB test suite on arm64e. (The first attempt at this patch caused a failure in TestScriptedProcessEmptyMemoryRegion.py. This test exercises a case where IRMemoryMap uses host memory in its allocations; pointers to such allocations should not be fixed, which is what the original patch failed to account for).
@llvm/pr-subscribers-lldb Author: Felipe de Azevedo Piovezan (felipepiovezan) ChangesIn architectures where pointers may contain metadata, such as arm64e, the metadata may need to be cleaned prior to sending this pointer to be used in expression evaluation generated code. This patch is a step towards allowing consumers of pointers to decide whether they want to keep or remove metadata, as opposed to discarding metadata at the moment pointers are created. See #150537. This was tested running the LLDB test suite on arm64e. (The first attempt at this patch caused a failure in TestScriptedProcessEmptyMemoryRegion.py. This test exercises a case where IRMemoryMap uses host memory in its allocations; pointers to such allocations should not be fixed, which is what the original patch failed to account for). Full diff: https://github.com/llvm/llvm-project/pull/153585.diff 1 Files Affected:
diff --git a/lldb/source/Expression/IRMemoryMap.cpp b/lldb/source/Expression/IRMemoryMap.cpp
index 150699352a2e3..3ac42649d0834 100644
--- a/lldb/source/Expression/IRMemoryMap.cpp
+++ b/lldb/source/Expression/IRMemoryMap.cpp
@@ -640,6 +640,13 @@ void IRMemoryMap::WritePointerToMemory(lldb::addr_t process_address,
lldb::addr_t address, Status &error) {
error.Clear();
+ /// Only ask the Process to fix the address if this address belongs to the
+ /// process (host allocations are stored in m_data).
+ if (auto it = FindAllocation(process_address, 1);
+ it != m_allocations.end() && it->second.m_data.GetByteSize() == 0)
+ if (auto process_sp = GetProcessWP().lock())
+ address = process_sp->FixAnyAddress(address);
+
Scalar scalar(address);
WriteScalarToMemory(process_address, scalar, GetAddressByteSize(), error);
|
Thanks @jasonmolenda for suggesting a solution to the test issues |
if (auto it = FindAllocation(process_address, 1); | ||
it != m_allocations.end() && it->second.m_data.GetByteSize() == 0) | ||
if (auto process_sp = GetProcessWP().lock()) | ||
address = process_sp->FixAnyAddress(address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very "modern C++" but very much confusing to read. I think splitting the first if condition up is better. Or add braces to the first if, whichever you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also could we have an overlap between host and debugee memory addresses?
I suspect the answer is "yes but not in the scenario I'm fixing". As if we were going to write anything to the debugee, there would be no host allocatons in m_data.
Add a comment to state that if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also could we have an overlap between host and debugee memory addresses?
Probably, in the sense that these are all just numbers. But I'm not sure I follow the concern here: IRMemoryMap is supposed to track which address comes from where, so it's not clear why overlapping is a concern? It is the job of IRMemoryMap to think about overlaps when it is creating Allocation
s, not during WritePointerToMemory
.
Address review comments |
Does this change anything with regard to #134247 ? I'm not sure. I don't think so because in that case the function pointer is probably just read from memory instead of being written to memory again. Which reminds me, this should have a testcase, even if it's very artificial. It'll help explain how to hit this code and remind us later if we change how this is handled.
Along those lines. If you use the top byte you don't have to care about virtual address size or amount of pointer authentication bits used. |
Can you elaborate on what that program is showing and how this PR would affect that?
The test |
/// process. An address belongs to the process if the Allocation contains a | ||
/// non-empty m_data member. | ||
if (auto it = FindAllocation(process_address, 1); | ||
it != m_allocations.end() && it->second.m_data.GetByteSize() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at IRMemoryMap::WriteMemory()
(I haven't read these methods in years, needed to refresh my memory), it has different behavior based on the Allocation
m_policy
which can be one of eAllocationPolicyHostOnly
, eAllocationPolicyMirror
, or eAllocationPolicyProcessOnly
. The goal is to run any pointer that is written into actual process memory through Fix, because it will be used by a jitted expression running in native code, and cannot refer to the host-side-only fake addresses that IRMemoryMap may hand out. Maybe the m_data.GetByteSize()
test is sufficient, but I think it's maybe clearer to test for m_policy != eAllocationPolicyHostOnly
? I'm not sure what Mirror
is used for, possibly expression results where we might want to take the address of them in inferior-memory, but for efficiency of display/use in lldb, also stored in lldb.
That function, when called with a pointer to a pointer written to memory with WritePointerToMemory, would check whether that pointer had retained it's non-address bits. Without this change, it would see the non-address bits. With this change it would not. (which is unrelated to the host/debugee problem the reland is also fixing) I'm not sure how to craft an expression that will hit WritePointerToMemory though. Maybe you could use API calls to trigger WritePointerToMemory, then run an expression using the address it was written to. In other words:
By doing the check using code in the debugee, you make sure that no other fix address call can get in the way. And if you change the strategy later, the test will break and remind us to change this particular call to fix address. |
I'm out next week, but I'll have a look at that idea when I'm back! |
@DavidSpickett in a different comment, you mentioned that
Do you know how to write a test targeting that architecture specifically? One of the main challenges I have right now is that this seems almost impossible to test. I can't write an expression whose variables will have metadata in their pointers. |
You can manually set the top byte to some value, then in the expression call a function that checks for those bits. This does not require special code just a bunch of uintptr_t casts. You can do this on any AArch64 Linux, Top Byte Ignore has been enabled in userspace since the beginning.
(including MacOS I think) The tricky bit will be getting the expression to trigger writing the pointer to memory. I'd start with something like:
Where We're hoping that the In which case, you could look for a way to trigger the write via. the API and then read it back? In fact if you can do that, it's easier overall. Have the application tag the pointer, read it into lldb, write it back to some new location and read that back. If it comes back untagged, this patch didn't work. Whether you can hit this from the API, idk. I'll have a quick look now. |
So that's the challenge, what the function itself does is irrelevant. In this PR, the code we're writing only affects this:
In this case, |
Ok nothing in SBAPI directly. Only IRMemoryMap calls this. If it only calls it when writing the address of a symbol, problem is that won't be tagged unless the ABI decided it was, certainly the debug info won't include tag bits. How did you hit this problem in the first place? Some failure on arm64e I presume? |
Ok so this is true then:
So yeah, how did you hit this problem in the first place? Test failure on arm64e? I'd be ok with a test that only runs there if it is the only way to trigger this. Could we do something like loading new symbol information using https://lldb.llvm.org/use/symbolfilejson.html that includes a tagged address? Idk if we can overwrite information that way though. We could somehow read back the address we want, load the binary without debug info then make a json file with just that address, tagged 🤣 ...which is a bit much. |
Yeah, once I remove the calls in #150537, some pointers from expression evaluation were no longer being stripped of their arm64e signatures. Let me re-run the test suite without this patch on an arm64e target and see if I can draw inspiration from those failing tests. Will report back in a bit. I don't think there are any upstream targets where this test would be able to run, though I don't mind merging a test guarded by the mac&arm64e triples. |
....did not mean to click the update button, sorry! I will try the JSON symbol file route. I think we can make a new tagged address symbol overlapping an existing one. |
I'm sure you remember seeing it, but I created one of these for a test case in #153922 lldb/test/API/functionalities/unwind/cortex-m-exception/binary.json |
There is also I got this to work:
(note that the section address must also be tagged otherwise it won't match the symbol to it) We can get all this info from the running program to build the file during the test.
I'm passing the address here otherwise we have no way to know the bits were or were not stripped. I think that's not enough though right? We need some Anyway you see the idea here. Tell me if that could be made to work. |
This seems promising!
How did you get this to work? As soon as I do this, I lose all the other symbols in my original program, including the function I want to call ( |
Could it be because I run to main and then load the other symbols? Also I forgot to mention why I added the cast:
Something clearly happens to the previous symbols. Maybe they are removed and we know enough to say that |
It still knows where the global variable is:
But again this could be because that's known from something other than the DWARF. |
Sorry for the delay, long weekend here in the US. Ok, I think I finally got it working, mostly? I was facing two issues: 1) So I have this:
And this main.c:
The contents of main are irrelevant here, more of a sanity check (the second assert should fail on an arm64e target). So we load the json file, add the new symbol (I opted for a new symbol instead of overwriting the existing one), and run it through the expression evaluator.
WITH this patch:
This looks like what we expect, right? If so, I will go ahead and try to make this into an API test! |
Yes that's right. I'm still a bit unclear how it makes its way to WritePointerToMemory, but as long as it'll break if we change things, that's the point. I think it is that:
Yes, because the wrapper itself has no arguments. So this isn't like a normal compilation where you put the pointer in a register, everything has to be on the stack first. Also stylistic point, use A new variable name is a good idea too, if you check both in the test, if it does fail, the difference between the 2 will be a big clue to figuring it out. |
Also is there a store reference to memory and if so does it do its own thing or call store pointer? |
Your guess is right. Enabling
|
could you elaborate? Not sure what you meant by "store reference to memory". Maybe the question is answered by the IR above? |
fd053ba
to
d47d70c
Compare
d47d70c
to
e96470a
Compare
Added a test! |
"sections": [ | ||
{ | ||
"name": "__DATA", | ||
"type": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to make this test run only on apple platforms?
✅ With the latest revision this PR passed the C/C++ code formatter. |
In architectures where pointers may contain metadata, such as arm64e, the metadata may need to be cleaned prior to sending this pointer to be used in expression evaluation generated code.
This patch is a step towards allowing consumers of pointers to decide whether they want to keep or remove metadata, as opposed to discarding metadata at the moment pointers are created. See #150537.
This was tested running the LLDB test suite on arm64e.
(The first attempt at this patch caused a failure in TestScriptedProcessEmptyMemoryRegion.py. This test exercises a case where IRMemoryMap uses host memory in its allocations; pointers to such allocations should not be fixed, which is what the original patch failed to account for).