After #3, we depend on cuda-nvcc, which in turn pulls in the host compiler (i.e. gcc/gxx on Linux). This increases the installation footprint and is overkill, since tileiras only needs libnvvm and ptxas. We should find a way to reduce the footprint.