[lldb] Introduce Process::FixAnyAddressPreservingAuthentication #159785
|
@llvm/pr-subscribers-lldb
Author: Felipe de Azevedo Piovezan (felipepiovezan)
Changes
This is yet another variant of the Fix{Code,Data}Address methods, but tailored for pointers that both:
1. Are going to be used in-process,
2. Require authentication metadata.
Currently, the callsite inside IRMemoryMap::WritePointerToMemory is an example of (1); the pointer written to memory will be used by JITed code during expression evaluation.
An example of (2) can be found in the MTE extension on arm processors. An MTE-tagged pointer must preserve its normal bits in order for load instructions to complete without faulting. However, PAC bits must be stripped, as codegen for some expressions may generate regular load instructions for accesses to those (instead of the special PAC instructions).
Full diff: https://github.com/llvm/llvm-project/pull/159785.diff
7 Files Affected:
diff --git a/lldb/include/lldb/Target/ABI.h b/lldb/include/lldb/Target/ABI.h
index 1a1f1724222e3..0839df5ac7cf3 100644
--- a/lldb/include/lldb/Target/ABI.h
+++ b/lldb/include/lldb/Target/ABI.h
@@ -141,6 +141,10 @@ class ABI : public PluginInterface {
return FixDataAddress(pc);
}
+ virtual lldb::addr_t FixAnyAddressPreservingAuthentication(lldb::addr_t pc) {
+ return FixAnyAddress(pc);
+ }
+
llvm::MCRegisterInfo &GetMCRegisterInfo() { return *m_mc_register_info_up; }
virtual void
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index dc75d98acea70..0ad891955cfd1 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -1464,6 +1464,11 @@ class Process : public std::enable_shared_from_this<Process>,
/// platforms where there is a difference (only Arm Thumb at this time).
lldb::addr_t FixAnyAddress(lldb::addr_t pc);
+ /// Strip pointer metadata except for the bits necessary to authenticate a
+ /// memory access. This is useful, for example, if `address` requires
+ /// authentication and it is going to be consumed in JITed code.
+ lldb::addr_t FixAnyAddressPreservingAuthentication(lldb::addr_t address);
+
/// Get the Modification ID of the process.
///
/// \return
diff --git a/lldb/source/Expression/IRMemoryMap.cpp b/lldb/source/Expression/IRMemoryMap.cpp
index f978217fa8f2b..3df2fc072f227 100644
--- a/lldb/source/Expression/IRMemoryMap.cpp
+++ b/lldb/source/Expression/IRMemoryMap.cpp
@@ -647,7 +647,7 @@ void IRMemoryMap::WritePointerToMemory(lldb::addr_t process_address,
if (it == m_allocations.end() ||
it->second.m_policy != AllocationPolicy::eAllocationPolicyHostOnly)
if (auto process_sp = GetProcessWP().lock())
- pointer = process_sp->FixAnyAddress(pointer);
+ pointer = process_sp->FixAnyAddressPreservingAuthentication(pointer);
Scalar scalar(pointer);
diff --git a/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.cpp b/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.cpp
index c595564f6fb8e..700413ed6d26a 100644
--- a/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.cpp
+++ b/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.cpp
@@ -792,6 +792,15 @@ addr_t ABIMacOSX_arm64::FixDataAddress(addr_t addr) {
return DoFixAddr(addr, false /*is_code*/, GetProcessSP());
}
+addr_t ABIMacOSX_arm64::FixAnyAddressPreservingAuthentication(addr_t addr) {
+ // Save the old MTE tag and restore it later.
+ constexpr addr_t mte_mask = 0x0f00000000000000ULL;
+ addr_t old_mte_tag = addr & mte_mask;
+
+ addr_t fixed_addr = FixDataAddress(addr);
+ return old_mte_tag | (fixed_addr & (~mte_mask));
+}
+
void ABIMacOSX_arm64::Initialize() {
PluginManager::RegisterPlugin(GetPluginNameStatic(), pluginDesc,
CreateInstance);
diff --git a/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h b/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h
index c8851709f50ad..b7eb695bdc9c9 100644
--- a/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h
+++ b/lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h
@@ -59,6 +59,7 @@ class ABIMacOSX_arm64 : public ABIAArch64 {
lldb::addr_t FixCodeAddress(lldb::addr_t pc) override;
lldb::addr_t FixDataAddress(lldb::addr_t pc) override;
+ lldb::addr_t FixAnyAddressPreservingAuthentication(lldb::addr_t pc) override;
// Static Functions
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 3176852f0b724..562c8544af72b 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -5971,6 +5971,12 @@ addr_t Process::FixAnyAddress(addr_t addr) {
return addr;
}
+addr_t Process::FixAnyAddressPreservingAuthentication(addr_t addr) {
+ if (ABISP abi_sp = GetABI())
+ addr = abi_sp->FixAnyAddressPreservingAuthentication(addr);
+ return addr;
+}
+
void Process::DidExec() {
Log *log = GetLog(LLDBLog::Process);
LLDB_LOGF(log, "Process::%s()", __FUNCTION__);
diff --git a/lldb/test/API/macosx/arm-pointer-metadata-stripping/TestArmPointerMetadataStripping.py b/lldb/test/API/macosx/arm-pointer-metadata-stripping/TestArmPointerMetadataStripping.py
index f61945b3eb4c9..4e63c3173bdd4 100644
--- a/lldb/test/API/macosx/arm-pointer-metadata-stripping/TestArmPointerMetadataStripping.py
+++ b/lldb/test/API/macosx/arm-pointer-metadata-stripping/TestArmPointerMetadataStripping.py
@@ -38,8 +38,15 @@ def test(self):
symbols_file = self.create_symbols_file()
self.runCmd(f"target module add {symbols_file}")
+ # The address of myglobal_json is: 0x1200AAAAAAAB1014
# The high order bits should be stripped.
- self.expect_expr("get_high_bits(&myglobal_json)", result_value="0")
+ # On Darwin platforms, the lower nibble of the most significant byte is preserved.
+ if platform.system() == "Darwin":
+ expected_value = str(0x200000000000000)
+ else:
+ expected_value = "0"
+
+ self.expect_expr("get_high_bits(&myglobal_json)", result_value=expected_value)
# Mark all bits as used for addresses and ensure bits are no longer stripped.
self.runCmd("settings set target.process.virtual-addressable-bits 64")
|
This is yet another variant of the Fix{Code,Data}Address methods, but
tailored for pointers that both:
1. Are going to be used in-process,
2. Require authentication metadata.
Currently, the callsite inside IRMemoryMap::WritePointerToMemory is an
example of 1; the pointer written to memory will be used by JITed code
during expression evaluation.
An example of (2) can be found in the MTE extension on arm processors.
An MTE-tagged pointer must preserve its normal bits in order for load
instructions to complete without faulting. However, PAC bits must be
stripped, as codegen for some expressions may generate regular load
instructions for accesses to those (instead of the special PAC
instructions).
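The save-and-restore step in the Darwin plugin can be sketched in isolation as follows. This is a minimal illustration rather than the LLDB implementation: `StripAllMetadata` is a hypothetical stand-in for `FixDataAddress`, and it assumes a low-memory (user-space) address whose non-address bits should all become zero. Only the MTE mask value is taken from the diff.

```cpp
#include <cstdint>

using addr_t = std::uint64_t;

// Low nibble of the top byte, where the MTE tag lives (value from the diff).
constexpr addr_t kMteMask = 0x0f00000000000000ULL;

// Hypothetical stand-in for FixDataAddress: clears every bit above the
// addressable range. Assumes a low-memory (user-space) address.
constexpr addr_t StripAllMetadata(addr_t addr, unsigned addressable_bits) {
  const addr_t address_mask = (addressable_bits >= 64)
                                  ? ~addr_t{0}
                                  : ((addr_t{1} << addressable_bits) - 1);
  return addr & address_mask;
}

// Same shape as the new ABI method: remember the tag, strip, restore the tag.
constexpr addr_t FixPreservingMteTag(addr_t addr, unsigned addressable_bits) {
  const addr_t old_mte_tag = addr & kMteMask;
  const addr_t fixed_addr = StripAllMetadata(addr, addressable_bits);
  return old_mte_tag | (fixed_addr & ~kMteMask);
}

// The address used by the test, assuming 48 addressable bits: the 0x1 in the
// upper nibble of the top byte is dropped, the 0x2 MTE tag survives.
static_assert(FixPreservingMteTag(0x1200AAAAAAAB1014ULL, 48) ==
                  0x0200AAAAAAAB1014ULL,
              "MTE tag is preserved, other metadata is stripped");
```

With all 64 bits marked addressable, nothing is stripped and the pointer comes back unchanged, which matches the final step of the test.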
|
I forgot all about #157435 so I just merged it and you'll have conflicts from that.
You don't have to make the test work on Linux.
In an ideal world, we would know that the target has memory tagging enabled and only then leave the tag in. Though given that MTE implies TBI and we have TBI everywhere we care about, we won't fault leaving in the tag bits on a system without MTE. So what you've got is fine, assuming you have at least TBI on all systems you're gonna use this with.
I do wonder if this is gonna work for any AArch64 target, because there can be targets where you need to remove PAC codes but you don't have top byte ignore. Anyway I'll think about that.
Authentication is an awkward word choice considering that it removes Pointer Authentication Codes. Don't have a better suggestion off the top of my head.
|
A more generic term might be "significant" bits. People will wonder "significant to what?", but at least it covers the bits significant to address resolution, plus anything required to use the pointer in code. That could include the PAC code, if you had really particular code generation that expected to find it, for example if it used some (idk if this actually exists) "auth then load" single instruction and had somehow blocked a code of all 0s.
|
"PreservingAccess" - though not every bit will be about access. "PreservingPermissions" - these aren't really permissions as such; this is not CHERI after all.
All good, I was just waiting for the PR bots to give me basic signal before tagging reviewers, but you beat me to it! :)
So I'm hoping this PR is a no-op for all other targets, as the only ABI plugin that implements this new method is the Darwin one. All other plugins just forward to the existing FixAnyAddress method, and preserve existing behavior.
Right, I think this is where each plugin will need to do some querying of process properties in order to know what the right thing to do is. A very reasonable thing to do here is to query the process on whether it was launched with memory tagging. I've chosen not to do that in the apple plugin, mostly because I think we're missing some debugserver support atm for this kind of query. And because we use TBI everywhere AFAICT. In a very ideal world, I would love if we could not strip anything, ever. But this places some burden on codegen of the expression evaluator, for all languages and all ABIs. I don't think it's doable right now.
Yeah, I'm not happy with the method name either. Let's see if others come up with something, but maybe "Significant" is ok. We could also just say "FixAddressForExpressionEvaluation", given how that's the only use case I know of today.
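The idea of querying process properties before deciding what to keep could look roughly like the sketch below. Everything here is hypothetical: `ProcessProps` and its fields are invented for illustration and do not correspond to an existing LLDB or debugserver query, and the function assumes a low-memory address with fewer than 56 addressable bits.

```cpp
#include <cstdint>

using addr_t = std::uint64_t;

// Hypothetical process properties; not an existing LLDB structure.
struct ProcessProps {
  bool memory_tagging_enabled; // assumed to come from some process query
  unsigned addressable_bits;   // assumed to be less than 56
};

// Low nibble of the top byte, where the MTE tag lives (value from the diff).
constexpr addr_t kMteMask = 0x0f00000000000000ULL;

// Strip all metadata, then put the MTE tag back only if the process is
// known to use memory tagging.
constexpr addr_t FixForExpressionEvaluation(addr_t addr,
                                            const ProcessProps &props) {
  const addr_t address_mask = (addr_t{1} << props.addressable_bits) - 1;
  const addr_t stripped = addr & address_mask;
  if (!props.memory_tagging_enabled)
    return stripped;
  return stripped | (addr & kMteMask);
}
```

The PR deliberately skips this check on Darwin and always keeps the tag, relying on TBI being enabled everywhere; the conditional form is only one way a plugin for a different platform might behave.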
|
A little bit of background: I think everyone understands the basic idea of what Felipe is doing here & why, but I think outlining where we are could be helpful.
AArch64 processes can be run with ptrauth enabled (the "arm64e" slice, the ABI only officially announced as stable in this year's OS releases, and now usable by third parties). Things like function pointers are signed against a private key in the CPU; things like the link register are signed against a private key and a discriminator of the stack pointer value on function entry. In the function epilogue, $sp is restored to its original value and the link register is validated against the secret key and $sp.
The Darwin AArch64 processors run in Top Byte Ignore mode, which means metadata can be stored in the top byte without breaking load/store instructions. Function addresses put PAC signing in the top byte; data access signing leaves the top byte user-controlled.
As an aside, we have a pass (not upstreamed) for expressions that will sign things like function pointers in jitted expressions. If you do
Apple also introduced Memory Integrity Enforcement in this year's releases, and on the new iPhones one part of MIE is the AArch64 Memory Tagging Extension. When a process opts in to MIE and MTE, 4 bits of the top byte for some heap allocations are now used to store an MTE tag, which validates that a pointer cannot access beyond its allocated range.
With swift async funclets, called asynchronously, they are passed a pointer to the swift async context block. This is a heap-allocated object, and so in a process with MIE enabled on an MTE-capable device, it will have an MTE tag. lldb can read and write memory from the swift async context without preserving the MTE tag because its own target-memory accesses aren't authenticated against the tags.
But when we pass the swift async context address into a jitted expression -- so the address is being used in code running in the process -- now we must include the MTE tag in the value sent into the expression. (Also, the CFA for the stack frame is the address of the swift async context. We need to maintain the tag in the CFA if we're going to use it to calculate local variables that might be used in a jitted expression.)
An additional wrinkle is that it seems that in the swift funclet, when built with PAC (arm64e), the local register value pointing to a variable in the swift async context will be PAC signed against a local discriminator value. The user does an expression which uses this variable, but the jitted code isn't aware that the swift async funclet is PAC-enforcing the address of this variable; it just thinks it's using an address of the variable. So we must strip the PAC bits, but it's an address in the heap-allocated MTE swift async context, so we must also retain the MTE tag in the top byte.
We considered "well, what if we just leave the PAC bits in there" -- but normal load/store instructions (the ones that do not check PAC signing) require that all non-address bits outside of the Top Byte (with TBI mode enabled) are either all-0's or all-1's (depending on b55, high memory or low memory). Anything else in those bit ranges results in a fault. We can have arbitrary data in the top byte, but the rest of the non-address bits must be correct.
If we tried to JIT code that worked with the existing PAC signing, we'd have the problem that this is signed against a discriminator and we'd need to pass that register value & information into the jitted code as well.
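The constraint on non-address bits described above can be written as a small predicate. This is a sketch under stated assumptions, not code from LLDB: the function name is hypothetical, TBI is assumed to be enabled, and `addressable_bits` is assumed to be between 1 and 55.

```cpp
#include <cstdint>

using addr_t = std::uint64_t;

// Returns true if a regular (non-authenticating) load/store could use `addr`
// without faulting on its non-address bits, assuming Top Byte Ignore.
// Bits [63:56] may hold anything; bits [55:addressable_bits] must all
// match bit 55 (all ones for high memory, all zeros for low memory).
// Precondition (assumed): 0 < addressable_bits <= 55.
constexpr bool IsUsableByPlainLoad(addr_t addr, unsigned addressable_bits) {
  const addr_t below_top_byte = (addr_t{1} << 56) - 1;
  const addr_t address_mask = (addr_t{1} << addressable_bits) - 1;
  const addr_t non_address_mask = below_top_byte & ~address_mask;
  const bool high_memory = ((addr >> 55) & 1) != 0;
  const addr_t expected = high_memory ? non_address_mask : addr_t{0};
  return (addr & non_address_mask) == expected;
}
```

Under this predicate a pointer carrying only an MTE tag in the top byte passes, while a pointer with leftover PAC bits below the top byte does not, which is why the PR strips one kind of metadata and keeps the other.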
Yes we're getting into a pointers as capabilities sort of model, except they don't always have the same contents like CHERI does and then we're tracking what's essentially provenance to say what's in the unused bits. Which we might have to do one day but I'd prefer to put it off as long as possible.
This is fine with me. It's for code generation and expression evaluation is our only code generation (ignoring the IR interpreter).
I haven't seen anyone deploying MTE or PAC this extensively on Linux, but this all fits with my theoretical understanding of it. So there are times when you know you have to strip the PAC bits, because the loads do not have a preceding auth. Like in your example where you JIT extra code to sign the pointers. You do that because the code that uses them expects to be able to authenticate them. So how do we discriminate between those 2 scenarios? Or do we not? Do the pointers get stripped by the method this PR adds, then the extra code re-signs them in-process?
|
Sorry, I got completely sidetracked with some other work, but I intend to come back to this next week.