Octave Larose edited this page May 3, 2023 · 18 revisions

Questions will appear here!

28/04: I spent much of this week struggling to get it to run on Ubuntu 22.04, both on my local machine and in a Dockerfile. I'm very close though, so expect a PR with a working Dockerfile soon. I should report that none of the three existing ones is fully functional, though. Well, the Ubuntu one builds nicely, but the allocscc command in the README complains about missing headers unless run as allocscc -I/usr/local/src/liballocs/include -I/usr/local/src/liballocs/contrib/libsystrap/contrib/librunt/include -I/usr/local/src/liballocs/contrib/liballocstool/include -o test test.c. Also, ELFTIN must be set in the environment or allocscc complains; it could instead fall back to a relative path of ../../contrib/elftin/ when the variable isn't set.

  • all sounds good -- good work!
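
A minimal sketch of the ELFTIN fallback suggested above, written in C only to keep the examples on this page in one language (allocscc itself need not be C). The function name demo_resolve_elftin and the tool_dir parameter are hypothetical, and resolving ../../contrib/elftin relative to the tool's own directory is an assumption; the note above does not say what the path is relative to.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: choose the elftin directory.
 * Uses $ELFTIN when it is set and non-empty; otherwise falls back to
 * <tool_dir>/../../contrib/elftin (the base directory is an assumption).
 * Returns 0 on success, -1 if the result does not fit in out. */
int demo_resolve_elftin(const char *tool_dir, char *out, size_t out_len)
{
    const char *from_env = getenv("ELFTIN");
    int n;

    if (from_env != NULL && *from_env != '\0')
        n = snprintf(out, out_len, "%s", from_env);
    else
        n = snprintf(out, out_len, "%s/../../contrib/elftin", tool_dir);

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With this shape, an explicitly set ELFTIN always wins, so existing setups keep working and only the "variable missing" case changes behaviour.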

I've also got a few potentially useful (but dreadfully boring) notes about building, such as it breaking with more recent versions of OCaml and gcc/g++. More annoyingly, I have to remove stream_err and __addrmap_max_stack_size from librunt.c to avoid a conflict with liballocs.c that otherwise prevents building, on Ubuntu 22.04 at least. __attribute__((weak)) would probably work and be cleaner; I'm not sure why it's needed in the first place, though. Anyway.

  • Hmm, yes. I think __addrmap_max_stack_size should be just in liballocs, but possibly they should each have their own stream_err.
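
For reference, the __attribute__((weak)) idea mentioned above could look roughly like the sketch below. The names demo_max_stack_size and demo_effective_max_stack_size and the 8 MiB value are hypothetical stand-ins, not the real librunt or liballocs definitions. The point is only that a weak definition yields to a strong definition of the same symbol at static link time instead of causing a duplicate-definition error.

```c
/* Hypothetical stand-in for the copy kept in the lower-level library.
 * Marked weak: if another object file in the same link defines
 *     unsigned long demo_max_stack_size = ...;
 * as an ordinary (strong) symbol, the linker silently prefers that one. */
__attribute__((weak)) unsigned long demo_max_stack_size = 8UL * 1024 * 1024;

/* Reads whichever definition the linker selected. */
unsigned long demo_effective_max_stack_size(void)
{
    return demo_max_stack_size;
}
```

Built on its own, this uses the weak default. Note that this describes the static linker; symbol resolution between shared libraries at load time follows search order rather than weak/strong, so it may not help in every configuration.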

pycallocs code looks nice and hackable since it doesn't seem to be mystical or lacking in comments, so that's great! I can build it but not test it yet, since I can't build libcrunch at all, even though pycallocs uses very few features from it. What's this heap_index.h header that libcrunch includes but that's nowhere to be found?

  • Whoops. That header is now removed from liballocs but I haven't forward-ported libcrunch yet -- my bad. I was forgetting that pycallocs uses it. Most of the stuff is in allocmeta.h or generic_malloc_index.h. The rationale is that each 'heap allocator' is really a distinct allocator that in principle needs its own header.

As for actual (basic so far) liballocs questions:

  • why is there a need for allocscc? It's quite complex, there's a lot going on with it. AFAIK it exists to leverage CIL/cilly heavily, but I'm not sure you ever explain why that's the case. Instrumentation as described in Documentation/overview-toolchain.txt, I assume?
    • Yes, I'm currently removing it in favour of something simpler. Something of this kind is needed because we do instrument some C code at source level (e.g. alloca) and alter how it is compiled (e.g. -fno-function-sections) and how it is linked (e.g. to include malloc wrappers) and also the dynamic linking set-up (allocsld.so is the dynamic linker). But I have a new approach (see 'toolsub' repository) that is overall simpler, and some of the work can be done in linker plugins. As well as overview-toolchain.txt, 'malloc-indexing.txt' is worth a look (be prepared for horror).
  • how is this instrumentation done in practice, in detail? I assume that's how libsystrap comes into play.
    • Yes, although that is the simplest kind of instrumentation, just for system calls. I think this means I should work on the documentation so that it itemises what kinds of instrumentation are done. Basically all syscalls that 'might' be interesting are clobbered into ud2 instructions and run in a SIGILL handler. This is indeed what libsystrap handles. One of my big to-do items is to be more selective about this syscall instrumentation and also cache the instrumentation points in the filesystem, since there is a noticeable startup delay from scanning all instructions as they are loaded. I used to avoid this with mega-hacks (knowing roughly where in libc it was important to trap) but that has been invalidated by time, so currently it does the expensive thing.
  • curious about fake-libunwind and why it's needed
    • Using 'real' libunwind (which uses DWARF frame information) used to be slower than -fno-omit-frame-pointer and the fake libunwind (which was just a really simple/fast stack walker using frame pointers). There probably still is some time penalty, but I am worrying about it less for the moment. I want to integrate the 'compiled frame information' work from OOPSLA 2019, which will definitely eliminate the problem, but that is not done yet. Basically it would mean generating a further kind of metadata, 'compiled' from the DWARF frame information. Though a modified libunwind is still needed in that case. Probably this should be vendored locally and linked privately into liballocs.
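
To make the ud2/SIGILL mechanism from the syscall answer above concrete, here is a minimal sketch and not libsystrap's actual code. All demo_ names are hypothetical. It assumes x86-64 Linux for the ud2 path (elsewhere it falls back to raise(SIGILL)), and the handler only counts traps, where a real one would perform the displaced system call before resuming.

```c
#define _GNU_SOURCE              /* asks glibc to expose REG_RIP */
#include <signal.h>
#include <string.h>

#if defined(__linux__) && defined(__x86_64__)
#include <ucontext.h>
#ifndef REG_RIP
#define REG_RIP 16               /* x86-64 gregs index, if _GNU_SOURCE came too late */
#endif
#define DEMO_HAVE_UD2 1
#endif

static volatile sig_atomic_t demo_traps_seen;

/* SIGILL handler: a real one would perform the displaced system call here. */
static void demo_on_sigill(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info;
    demo_traps_seen++;
#ifdef DEMO_HAVE_UD2
    /* The saved instruction pointer still points at ud2 (0f 0b); skip its 2 bytes. */
    ((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP] += 2;
#else
    (void)ctx;
#endif
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int demo_install_trap_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = demo_on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}

/* Stands in for a clobbered syscall site; returns how many traps it caused. */
int demo_run_trapped_site(void)
{
    int before = demo_traps_seen;
#ifdef DEMO_HAVE_UD2
    __asm__ volatile ("ud2" ::: "memory");
#else
    raise(SIGILL);               /* portable fallback: no ud2 on this target */
#endif
    return demo_traps_seen - before;
}
```

Calling demo_install_trap_handler() once and then demo_run_trapped_site() should report one trap per call. Skipping exactly two bytes works because ud2 and the syscall instruction it replaces (0f 05) have the same length, which is what makes the in-place clobbering possible.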

03/05: More build-related questions, since I'm running out of time and I don't quite have everything running the way I want it to, even now :(

For running with the latest liballocs version (within the latest Docker container): I can build it, as described previously, but no dice for libcrunch. Its make fails at the end: it spends ages and a lot of CPU on allocsld: jumping to system ld.so entry point 0x555555557090 with rsp 0x7fff73e964b0, then segfaults. See the trace:

LD_PRELOAD="/home/user/libcrunch/src/libcrunch_preload.so" ./hello
Hello from allocsld!
We think we are not the program
AT_PHDR is 0x5615b2a00040
AT_PHENT is 0x38
AT_PHNUM is 0x7
AT_BASE is 0x555555556000
AT_ENTRY is 0x5615b2a00db0
AT_EXECFN is 0x7fff73e97ff0 (./hello)
allocsld: jumping to system ld.so entry point 0x555555557090 with rsp 0x7fff73e964b0
Segmentation fault (core dumped)

The only diff against master is:

+//#include "heap_index.h"
+#include "generic_malloc_index.h" 

as discussed, plus this change in the frontend/c Makefile, if you care:

-OCAMLFLAGS += -I $(dir $(THIS_MAKEFILE))/lib -I $(LIBALLOCS_BASE)/tools/lang/c/cilallocs #-I $(dir $(wildcard $(shell which $(CILLY))/../lib/ocaml/*/cil))
+OCAMLFLAGS += -I $(dir $(THIS_MAKEFILE))/lib -I $(LIBALLOCS_BASE)/tools/lang/c/cilallocs -I /usr/local/src/liballocs/contrib/cil/lib/cil/

I can't run pycallocs without a working libcrunch_preload.so, I'm afraid.

For running with an older liballocs version (90b819a6b04be18977772294c502fbcbc3a1675a):

I tried building a liballocs version from 2019, which I assumed might work better. Here's a Dockerfile patch for it (the sed shenanigans are there because git config --global url."https://".insteadOf git:// never worked for some reason):

diff --git a/buildtest/ubuntu-18.04/Dockerfile b/buildtest/ubuntu-18.04/Dockerfile
index b1d5499..2bcb394 100644
--- a/buildtest/ubuntu-18.04/Dockerfile
+++ b/buildtest/ubuntu-18.04/Dockerfile
@@ -1,7 +1,7 @@
 FROM ubuntu:18.04
 
 ARG user
-RUN apt-get update && apt-get install -y sudo
+RUN apt-get update && apt update && apt-get install -y sudo
 RUN adduser ${user:-user} && \
     echo "${user:-user} ALL=(root) NOPASSWD:ALL" > /etc/sudoers && \
     chmod 0440 /etc/sudoers
@@ -26,6 +26,11 @@ RUN sudo apt-get install -y libelf-dev libdw-dev \
        libunwind-dev libc6-dev-i386 zlib1g-dev libc6-dbg \
        libboost-iostreams-dev libboost-regex-dev libboost-serialization-dev libboost-filesystem-dev
 RUN cd /usr/local/src && git clone https://github.com/stephenrkell/liballocs.git
+RUN cd /usr/local/src/liballocs && git checkout 90b819a6b04be18977772294c502fbcbc3a1675a
+RUN cd /usr/local/src/liballocs &&  \
+    sed -i 's/git:\/\//https:\/\//g' .gitmodules && \
+    git submodule update --init && \
+    find . -name ".gitmodules" -type f -exec sed -i 's/git:\/\//https:\/\//g' {} \;
 RUN cd /usr/local/src/liballocs && \
    git submodule update --init --recursive && \
    make -C contrib -j4

...however, this command (also in the Dockerfile) fails: make -f tools/Makefile.meta /usr/lib/meta/lib/x86_64-linux-gnu/libc-2.27.so-meta.c. There's quite a bit going on in that Makefile and related scripts, so I'm not confident I can debug it efficiently. As far as I can tell, it could all be caused by the file /lib/x86_64-linux-gnu/.debug/libc-2.27.so not being created properly. Anyway, if you have any idea why it isn't working, that'd be great. Alternatively, if you have a Docker image lying around with a working libcrunch, that could work too.

As for actual liballocs questions:

  • pageindex.c is for the allocator tree, I assume? It's hard to parse since it's quite long, but the term "bigalloc/bigallocation" stands out to me. What is that referring to in this context?