Pin all versions (and build Docker images)

To make the benchmarks reproducible, every Python library used in the benchmark should be pinned to an exact version, possibly via a Pipfile.lock. To make re-use even *easier*, a Docker image should be provided as well. (This may require a separate Makefile.)
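
As a rough illustration of what "pinned" could mean in practice, here is a minimal sketch of a check that flags dependencies without an exact `==` version. It assumes a requirements.txt-style list rather than a Pipfile.lock, and `find_unpinned` is a hypothetical helper name, not something that exists in this repository.

```python
import re

# An exact pin looks like "numpy==1.26.4", optionally with extras ("pkg[extra]==1.0")
# and an environment marker after ";". Wildcards such as "==1.11.*" do not count.
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;=*]+\s*(;.*)?$"
)


def find_unpinned(lines):
    """Return the requirement lines that are not pinned to an exact version.

    Hypothetical helper: assumes requirements.txt-style input lines.
    """
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        line = line.rstrip("\\").strip()     # drop a trailing line continuation
        if not line or line.startswith("-"):
            # skip blank lines and pip options such as -r, -e, --hash
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    example = ["numpy==1.26.4", "pandas>=2.0", "requests", "# a comment"]
    for requirement in find_unpinned(example):
        print("not pinned:", requirement)
```

A check like this could run in CI or from the Makefile before the Docker image is built, so that an unpinned dependency fails early instead of silently changing benchmark results.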