What is the need for having a tag bit to have capabilities? #549
-
Hello
I went through the cheri-iot doc, and it mentions that tag bits prevent tampering with capabilities. Can we achieve this even without the tag bits? Thanks
-
The tag bit is how you tell the difference between a pointer and some other data. The point of the tag bit is to ensure that the rest of the metadata (bounds, permissions, and so on) is trustworthy. It is an attestation from the hardware: if you see a value with the tag bit set, it is a valid pointer that has been derived from some pointer that is at least as powerful. Without it, you could just do two data writes and end up with a pointer with arbitrary bounds or permissions.

In other systems, there are (broadly) two alternatives.

You can sign pointers using some cryptographic primitive. For this to be secure, you need to make them a lot bigger. For example, Arm's PAC can use 24 bits for the signature. That sounds like a lot: each guess has a 1 in 2^24 chance of being correct, so a brute-force search needs about 2^23 guesses on average. But you can do your guesses in speculation and use timing side channels, so you can trivially forge PAC pointers if you already have arbitrary code execution. Most of the places where PAC is used are trying to prevent you from getting arbitrary code execution, but that isn't sufficient as a building block for compartmentalisation.

The other alternative is to restrict where capabilities can be stored, keeping them in special tables. This doesn't work well for C, because your C pointer now has to be an index into a table. This means temporal safety is hard, because you don't have a way of differentiating integers that are indexes into a capability table from integers that are just integers. You can remove access to a specific object, but you can't prevent use-after-free from within the same compartment (and sharing becomes a lot harder).

(Aside: There's only one I in CHERIoT)
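To make the attestation idea concrete, here is a minimal software sketch in C. It is a toy model, not CHERIoT's actual encoding or instruction set: the `cap_t` layout and the `cap_derive`, `cap_overwrite_as_data`, and `cap_can_access` names are all hypothetical. It only illustrates two rules described above: a derived pointer stays valid only if it is no more powerful than its parent, and a plain data write over the metadata clears the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a capability. All names are hypothetical; real hardware
 * keeps the tag outside addressable memory, not in a struct field. */
typedef struct {
    uint32_t base;    /* lowest address the capability may access */
    uint32_t length;  /* size of the accessible region in bytes */
    uint32_t perms;   /* permission bitmask */
    bool     tag;     /* validity bit */
} cap_t;

enum { PERM_LOAD = 1u << 0, PERM_STORE = 1u << 1 };

/* Derive a capability from a parent. The result is tagged only if the
 * parent is tagged and the request does not exceed the parent's bounds
 * or permissions. */
cap_t cap_derive(cap_t parent, uint32_t base, uint32_t length, uint32_t perms)
{
    cap_t child = { base, length, perms, false };
    bool in_bounds = base >= parent.base &&
                     length <= parent.length &&
                     base - parent.base <= parent.length - length;
    bool perms_ok = (perms & ~parent.perms) == 0;
    child.tag = parent.tag && in_bounds && perms_ok;
    return child;
}

/* Model of plain data writes over a capability: the bits change, but
 * the tag is cleared, so the value is no longer a usable pointer. */
void cap_overwrite_as_data(cap_t *c, uint32_t base, uint32_t length,
                           uint32_t perms)
{
    assert(c != NULL);
    c->base = base;
    c->length = length;
    c->perms = perms;
    c->tag = false;
}

/* An access succeeds only through a tagged capability, within bounds,
 * and with the required permission. */
bool cap_can_access(cap_t c, uint32_t addr, uint32_t size, uint32_t perm)
{
    return c.tag && (c.perms & perm) == perm &&
           addr >= c.base && size <= c.length &&
           addr - c.base <= c.length - size;
}
```

In this model, narrowing a read-write capability to a read-only sub-range yields a tagged result, while trying to widen it, or overwriting its fields as data, yields an untagged value that `cap_can_access` rejects.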
-
Thanks David, I understood the gist. I have a follow-up question:
How realistic is this attack? Generally, direct manipulation of pointers, such as casting from int to pointer, occurs in driver code, low-level systems code, or assembly. The compiler can emit warnings for these, so we could inspect just those functions and ensure that the pointer metadata is correct and not corrupted. Everywhere else, it is just pointer arithmetic, where overflows can't corrupt a capability, since the metadata is in the upper 32 bits.
-
Thanks a lot David. I understand the model better now. Also, on legacy library support, I forgot that CHERIoT is targeted at RISC-V. And yes, it makes sense: since RISC-V is a recent architecture, supporting purecap only is a good tradeoff. Thanks David for taking the time to explain it in detail. -Sai
-
On how realistic this attack is: in embedded systems, you often see mixes of pointers and integers, the use of unions, struct overlays and data passed around as

As for whether a compiler can warn about this, I guess it can! But only if you control all the code, can afford to inspect every warning, and are willing to include the compiler in your TCB.

Hybrid capability mode makes sense when there’s a legacy base to support. But if your libraries are open source, hybrid support adds complexity, and recompiling is probably the better solution from the point of view of security and performance.

One extra note on tag bits. They’re stored in a hidden metadata plane that software cannot address directly, and they are cleared by ordinary (non-capability) stores. So a memcpy built from plain data loads and stores doesn’t just copy the bits: it invalidates the pointer unless it uses CHERI-aware instructions. That’s one of the key ways CHERI enforces non-forgeability even during low-level memory handling.
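The hidden tag plane can be sketched in software as a shadow array next to ordinary memory. The following C model is only an illustration under stated assumptions: `tagged_mem_t`, `store_byte`, `store_cap`, and the two copy helpers are hypothetical names, the 8-byte capability width is assumed, and real hardware enforces these rules in the memory system rather than in code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAP_SIZE 8u   /* assumed capability width in bytes */
#define MEM_SIZE 64u  /* toy memory size, a multiple of CAP_SIZE */

typedef struct {
    uint8_t bytes[MEM_SIZE];            /* addressable data */
    bool    tags[MEM_SIZE / CAP_SIZE];  /* one hidden tag per aligned slot */
} tagged_mem_t;

/* Ordinary data store: writes one byte and clears the tag of the
 * capability-sized slot that contains it. */
void store_byte(tagged_mem_t *m, size_t addr, uint8_t value)
{
    assert(addr < MEM_SIZE);
    m->bytes[addr] = value;
    m->tags[addr / CAP_SIZE] = false;
}

/* Capability store: writes a whole aligned slot together with its tag. */
void store_cap(tagged_mem_t *m, size_t slot, const uint8_t *bits, bool tag)
{
    assert(slot < MEM_SIZE / CAP_SIZE);
    memmove(&m->bytes[slot * CAP_SIZE], bits, CAP_SIZE);
    m->tags[slot] = tag;
}

/* Copy one slot using only data stores: the bits arrive, the tag does not. */
void copy_slot_bytewise(tagged_mem_t *m, size_t dst_slot, size_t src_slot)
{
    assert(src_slot < MEM_SIZE / CAP_SIZE);
    for (size_t i = 0; i < CAP_SIZE; i++) {
        store_byte(m, dst_slot * CAP_SIZE + i,
                   m->bytes[src_slot * CAP_SIZE + i]);
    }
}

/* Copy one slot with a capability load and store: bits and tag travel together. */
void copy_slot_as_cap(tagged_mem_t *m, size_t dst_slot, size_t src_slot)
{
    assert(src_slot < MEM_SIZE / CAP_SIZE);
    store_cap(m, dst_slot, &m->bytes[src_slot * CAP_SIZE], m->tags[src_slot]);
}
```

After a byte-wise copy the destination holds identical bits but no tag, while the capability-aware copy keeps the tag. A single `store_byte` into a slot that holds a valid capability is enough to invalidate it.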
That depends a lot on the code and the threat model. We want CHERI (in general, and CHERIoT in particular) to be usable for software compartmentalisation. Part of our threat model is that you have arbitrary code provided by a third party and need to enforce memory safety when you run it in a compartment.
Even if you don't want memory safety as a building block for compartmentalisation, it's still quite a common attack vector. You have things like unions of pointers and integers, buffers with imprecise bounds with pointers after them, type-erased things with pointers, and so on. Lots of ways of tricking something into overwriting an integer with a pointer.
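As an illustration of the union case on a conventional (non-CHERI) machine, here is a small C sketch. The `message_t` type and both functions are hypothetical, and the static assertion documents the assumption that an address and a pointer have the same size, which holds on mainstream flat-memory targets but not on a CHERI target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message type: the same storage holds either an integer
 * payload or a pointer. */
typedef struct {
    int is_pointer;  /* set by the sender, so not trustworthy */
    union {
        size_t     number;
        const int *pointer;
    } u;
} message_t;

/* Assumption: integers and pointers share a size and representation. */
_Static_assert(sizeof(size_t) == sizeof(const int *),
               "sketch assumes a conventional flat-address target");

/* Buggy setter: stores an integer without checking which union member
 * the message is supposed to hold. */
void set_number(message_t *msg, size_t value)
{
    assert(msg != NULL);
    msg->u.number = value;
}

/* Consumer that trusts is_pointer and dereferences the pointer member. */
int read_through_pointer(const message_t *msg)
{
    assert(msg != NULL && msg->is_pointer);
    return *msg->u.pointer;
}
```

Without tags, a caller that passes an address of its choosing to `set_number` on a message marked as a pointer controls what `read_through_pointer` dereferences. On a tagged architecture the integer store would be a data write, so the overwritten slot would lose its tag and the later dereference would fault instead.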