Contains a couple of backup files. They are not important; they exist only so I don't lose progress to a careless vim edit.
From here, the project is separated into 3 main folders. Each folder represents a different LLVM pass. They are all similar in structure, so it is easiest to explain it once.
Each has a new_pass.cpp, which contains the main meat of the project: the code for the pass itself.
Each has a CMakeLists.txt. These are the same in every folder and tell CMake how to build the pass.
Each has a build folder.
The build folder contains construct.sh. When run from inside the build folder, it turns a .c file into two executables: one with the pass applied and one without the pass (a .u file). It also creates instrumented and uninstrumented .ll files, which are plaintext LLVM IR if you want to read the generated code.
To build example.c, run the following from the build folder of whichever pass you want:
./construct.sh example
There is also a clean.sh, which removes all the intermediate files from building plus the executables themselves. (This shouldn't be a problem, but if a pass is not updating, remove everything CMake-related in the build folder and rebuild. The passes are finished, though, so you shouldn't need to worry about this.)
Everything not mentioned in this document is just artifacts of building and can be ignored.
This contains a minimal LLVM pass, mostly generated by ChatGPT plus some finicking on my part to get it working. The pass calls printf whenever a function exits. It serves as a template, and as an example to get myself acquainted with C++ and LLVM.
construct.sh and clean.sh are my own work, but the original instructions for building and applying the pass came from ChatGPT.
Executables are instrumented .e files and uninstrumented .u files.
test.c is a very basic C program for testing the LLVM pass. I honestly don't recall how much of it was mine, but it doesn't really matter.
This is the main folder of the project. It contains the CountBounds pass.
Executables are instrumented .e files and uninstrumented .u files.
new_pass.cpp is the main code for this project. It is my own code, except for template code that comes from simple_pass or is marked as such. The core logic is in the run function of the pass class (the one deriving from PassInfoMixin), inside the loop over all instructions. That loop contains 4 if statements, one for each type of instruction that gets instrumented. The private helper functions defined above run in the same class are also important. The only part not really needed for understanding is the plugin registration code at the bottom. Everything is lightly commented so it is clear what each part does, but to understand the design, refer to the writeup.
performance_test.c is the main performance testing code of this project. It runs a minimal function 10000 times and counts the CPU cycles taken. The cycle-counting code is entirely from ChatGPT and I don't understand it.
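For illustration only, here is a sketch of the same measurement pattern (call a tiny function many times and time the whole loop). It is not the code in performance_test.c: it swaps the cycle counter for POSIX clock_gettime with CLOCK_MONOTONIC, so it measures elapsed nanoseconds rather than cycles. minimal_function and time_calls are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the "minimal function" being benchmarked. */
static int minimal_function(int x) { return x + 1; }

/* Calls minimal_function `iterations` times and returns the elapsed
 * wall-clock time in nanoseconds (not CPU cycles). */
int64_t time_calls(int iterations)
{
    assert(iterations > 0);
    volatile int sink = 0;            /* volatile keeps the loop from being optimized away */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        sink = minimal_function(sink);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}
```

Comparing time_calls(10000) between the .u, .e and .mpx builds gives the same kind of relative comparison, just in different units.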
performance_suite.sh runs performance_test for uninstrumented code, CountBounds-instrumented code, and faux-MPX-instrumented code. The commands for running it and for disabling context switching came from ChatGPT.
To run:
./performance_suite.sh
performance_results.txt contains my results from running performance_suite.sh.
overflow.c is the main testing module of this project. It contains several selectable minimal examples of cases that are covered (or specifically not covered) by CountBounds.
min.c is a minimal piece of code that uses all the instructions which are instrumented by CountBounds.
This folder contains the faux MPX pass for performance comparison.
Again, new_pass.cpp contains the main code. It is once again based on the simple_pass template but is otherwise mine.
Executables are instrumented .mpx files and uninstrumented .u files.
performance_test.c, overflow.c, and min.c are copied from the pointer_bounds folder.
Used llvm.org/doxygen for documentation
ChatGPT to build a basic LLVM pass to work off of
ChatGPT for help with errors and for figuring out good functions to use from the LLVM library
ChatGPT for rudimentary examples of how to use the LLVM pass library, like "How to make an if statement in an LLVM pass"
Also asked ChatGPT for help when I forgot how to do very basic things in C, like "How to make a struct"
Aside from the original template pass, code taken from ChatGPT is marked
Everything below is mostly personal notes and is reflected in the writeup. No need to read it, but it is here.
No Memory Overhead
Only a few extra instructions per load or store, and no extra memory fetches (which take the longest time).
Most of our added instructions are casts, which are essentially free (they usually compile to no machine instructions at all).
No underflow protection.
Limited size for bounds.
Uses the otherwise-unused high bits of pointers, so those bits can't be used for ARM Pointer Authentication (PAC) or Memory Tagging Extension (MTE).
We check on pointer arithmetic, not on loads and stores. A pointer that goes out of bounds can become valid again, and you might want a pointer for reasons other than where it points.
Completely unsafe casts break everything, e.g. pointer to int and then int back to pointer.
A count of 0x0000 is basically an out-of-bounds memory access waiting to happen, since every object is at least 1 byte. Therefore we can regard it as an invalid count and use this value as a marker for allocations whose size is over our max count, which are therefore not bounds checked at all. This works as long as we are careful that our pointer arithmetic never lets a count be decremented to 0x0000.
Max bounds size under this system is 0xFFFF, which is 65535 bytes in decimal, or just under 16 4KB pages. Go over this and bounds checking is skipped entirely for that allocation.
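A minimal sketch of the count encoding, assuming the count sits in the top 16 bits of a 64-bit pointer value and the address in the low 48 bits. The bit layout and every name here (tag_address, count_of, address_of) are hypothetical and not taken from new_pass.cpp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: count in bits 48-63, address in bits 0-47. */
#define COUNT_SHIFT 48
#define ADDR_MASK   ((UINT64_C(1) << COUNT_SHIFT) - 1)
#define MAX_COUNT   UINT64_C(0xFFFF)
#define UNCHECKED   UINT64_C(0)   /* marker: size too large (or zero) to track */

/* Attach a count to the address of an allocation of `size` bytes. */
uint64_t tag_address(uint64_t addr, uint64_t size)
{
    assert((addr & ~ADDR_MASK) == 0);            /* high bits must be free */
    uint64_t count = (size <= MAX_COUNT) ? size : UNCHECKED;
    return addr | (count << COUNT_SHIFT);
}

uint64_t count_of(uint64_t tagged)   { return tagged >> COUNT_SHIFT; }
uint64_t address_of(uint64_t tagged) { return tagged & ADDR_MASK; }
```

With this layout, a 100-byte allocation gets count 100 and a 0xFFFF-byte allocation gets count 0xFFFF, while a 0x10000-byte allocation gets the 0x0000 marker and is never checked.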
We could support larger bounds at a lower resolution (counting in units larger than one byte), but a lot can be done with an overflow of just a couple of bytes.
LLVM already has GEP with the inbounds flag, which sounds like it protects bounds, but it actually only tells the optimizer it may assume the result stays in bounds (the result is poison otherwise). It provides no protection.
Bounds are only applied when the size is within 0xFFFF, so for a bounds-checked pointer any pointer addition of more than 0xFFFF must be invalid.
Arithmetic done after casting a pointer to a non-pointer type is not checked. Only GEP is.
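A hedged C model of the check on GEP arithmetic, assuming the count sits in the top 16 bits, the address in the low 48 bits, and that the count means the bytes remaining to the end of the allocation (so advancing the pointer decrements it). It handles non-negative offsets only, since there is no underflow protection. checked_advance and int_roundtrip_add are hypothetical names, not functions in new_pass.cpp; the second one shows why integer round trips are a blind spot, since that addition never passes through the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT_SHIFT 48
#define ADDR_MASK   ((UINT64_C(1) << COUNT_SHIFT) - 1)

/* Model of the check on a GEP that advances a tagged pointer by `offset`
 * bytes. Writes the new tagged pointer to *out; returns false on a violation. */
bool checked_advance(uint64_t tagged, uint64_t offset, uint64_t *out)
{
    assert(out != NULL);
    uint64_t count = tagged >> COUNT_SHIFT;
    uint64_t addr  = tagged & ADDR_MASK;

    if (count == 0) {                        /* unchecked pointer: no bounds check */
        *out = (addr + offset) & ADDR_MASK;  /* count field stays 0 */
        return true;
    }
    if (offset >= count)                     /* count would reach 0x0000 or wrap, */
        return false;                        /* so any offset > 0xFFFF fails too  */
    *out = ((addr + offset) & ADDR_MASK) | ((count - offset) << COUNT_SHIFT);
    return true;
}

/* ptr -> int -> ptr arithmetic bypasses the check: the count is carried
 * along unchanged and no longer matches the address. */
uint64_t int_roundtrip_add(uint64_t tagged, uint64_t offset)
{
    return tagged + offset;
}
```

For a pointer with count 16, advancing by 4 leaves count 12, while advancing by 16 or by 0x10000 is rejected. The same +4 done through int_roundtrip_add still reports count 16.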
Does not work with anything that uses high bits of a pointer.
To call non-CountBounds functions, you would simply remove the count before passing the pointer. For CountBounds functions, you pass the pointer with the count. I didn't implement automatic separation for this because I don't know how to instrument the standard dynamic libraries, and it wouldn't be worth figuring out for this POC.
I only set up malloc to work with pointer bounds, to show that it can be done, but it is easy to see how this could be expanded to the entire set of heap functions.
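A hedged sketch of a malloc wrapper plus the count-stripping step, assuming the count lives in the top 16 bits of a 64-bit pointer value and that heap addresses fit in the low 48 bits (true on typical x86-64 Linux, but not when the allocator itself uses hardware pointer tagging). cb_malloc and cb_free are hypothetical names; cb_free shows the "remove the count before passing the pointer" step for a non-CountBounds function, here libc's free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define COUNT_SHIFT 48
#define ADDR_MASK   ((UINT64_C(1) << COUNT_SHIFT) - 1)
#define MAX_COUNT   UINT64_C(0xFFFF)

/* Allocate, then store the size as the count in the top 16 bits.
 * Sizes above MAX_COUNT get count 0 (unchecked). Returns 0 on failure. */
uint64_t cb_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return 0;
    uint64_t addr = (uint64_t)(uintptr_t)p;
    assert((addr & ~ADDR_MASK) == 0);    /* assumption: high bits are free */
    uint64_t count = (size <= MAX_COUNT) ? (uint64_t)size : 0;
    return addr | (count << COUNT_SHIFT);
}

/* libc's free is not CountBounds code, so strip the count first. */
void cb_free(uint64_t tagged)
{
    free((void *)(uintptr_t)(tagged & ADDR_MASK));
}
```

The same pattern would extend to calloc and realloc: compute the size, tag on the way out, and strip on the way in.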
MPX doesn't always go to a memory lookup; some things are saved on the stack. I'm pretending it always goes to a memory lookup. The paper said something like 50% of the time it goes to a memory lookup, so I think this is fair.
MPX would be a lot to implement. I just want a rough approximation of its performance, so I only do an equivalent number of loads and stores for each action.
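A hypothetical C-level picture of the faux-MPX idea: each checked action performs a load and a store against a side table standing in for MPX's in-memory bounds tables, so the memory traffic is paid without computing any real bounds. The table size, the slot hashing, and the one-load-one-store ratio are assumptions for illustration, not what new_pass.cpp emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side table standing in for MPX's bounds tables. */
#define TABLE_SLOTS 1024
static volatile uint64_t bounds_table[TABLE_SLOTS];

/* Emulate the memory cost of a bounds-table lookup for pointer p:
 * one load and one store to a slot chosen from the address. */
void faux_mpx_action(const void *p)
{
    assert(p != NULL);
    size_t slot = ((uintptr_t)p >> 3) % TABLE_SLOTS;
    uint64_t entry = bounds_table[slot];   /* load  */
    bounds_table[slot] = entry + 1;        /* store */
}
```

Because the table is volatile, the compiler has to keep both memory accesses, which is the cost being approximated.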
MPX has hardware support, and this could have hardware support too. I think it is fair to compare MPX and this at a level without hardware support.
MPX has narrowing of bounds and this does not. Narrowing of bounds could be implemented for this but I didn't want to go through the process of figuring out in what scenarios bounds should be narrowed or not. As such, no narrowing of bounds.
If I were to do narrowing of bounds, I'd want a new instruction for it so the programmer selects when to narrow.
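A minimal sketch of what such a programmer-invoked narrowing operation might look like, assuming the count is stored in the top 16 bits of a 64-bit pointer value, with 0 meaning unchecked as described above. cb_narrow is a made-up name. It only ever shrinks the count and leaves unchecked pointers alone, which is one possible design choice, not a worked-out rule for when narrowing is correct.

```c
#include <assert.h>
#include <stdint.h>

#define COUNT_SHIFT 48
#define ADDR_MASK   ((UINT64_C(1) << COUNT_SHIFT) - 1)

/* Narrow a tagged pointer's bounds to at most new_size bytes. */
uint64_t cb_narrow(uint64_t tagged, uint64_t new_size)
{
    assert(new_size > 0);                   /* 0 is reserved as the unchecked marker */
    uint64_t count = tagged >> COUNT_SHIFT;
    if (count == 0 || new_size >= count)
        return tagged;                      /* unchecked, or nothing to narrow */
    return (tagged & ADDR_MASK) | (new_size << COUNT_SHIFT);
}
```

For example, narrowing a pointer into a 64-byte struct down to an 8-byte field gives count 8, while asking for 100 bytes leaves the count at 64.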