Splitting this package into manageable chunks #108

@hmaarrfk

This package currently requires more than 16 builds to be built manually to ensure that each one completes in time on the CIs.

Step 1: No more git clone

rgommers identified cloning the repository as one time-consuming portion of the build process. In my experience, cloning the 1.5 GB repo can take up to 10 min on my powerful local machine, and I suspect it takes much longer on the CIs.

To avoid cloning, we will have to list out all the submodules manually, or make them conda-forge installable dependencies.

I mostly got this working using a recursive script which should help us keep it maintained: #109
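
As a rough illustration of what such a recursive listing could look like, here is a minimal sketch. It is not the script from #109. It assumes the submodules are declared in standard `.gitmodules` files, and `read_gitmodules` is a hypothetical caller-supplied callable that returns the `.gitmodules` text for a repository URL (or an empty string if there is none), so no clone is needed.

```python
def parse_gitmodules(text):
    """Return a list of (path, url) pairs declared in .gitmodules content."""
    submodules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[submodule"):
            # A new [submodule "..."] section starts here.
            current = {}
            submodules.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return [(m["path"], m["url"]) for m in submodules
            if "path" in m and "url" in m]


def walk_submodules(read_gitmodules, repo_url, prefix=""):
    """Recursively yield (full_path, url) for every nested submodule.

    read_gitmodules is a hypothetical helper supplied by the caller; it
    should return the .gitmodules text of repo_url, or "" if none exists.
    """
    for path, url in parse_gitmodules(read_gitmodules(repo_url)):
        full_path = prefix + path
        yield full_path, url
        # Descend into the submodule to pick up its own submodules.
        yield from walk_submodules(read_gitmodules, url, full_path + "/")
```

The resulting list of paths and URLs could then be turned into the `source` entries of the recipe. A real version would also need to record the pinned commit of each submodule, which this sketch leaves out.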

Option 1: Split off Dependencies:

Dependency linux mac win GPU Aware PR system deps
pybind11 no https://github.com/conda-forge/pybind11-feedstock USE_SYSTEM_PYBIND11
cub no https://github.com/conda-forge/cub-feedstock
eigen no https://github.com/conda-forge/eigen-feedstock USE_SYSTEM_EIGEN_INSTALL
googletest no will not package
benchmark no https://github.com/conda-forge/benchmark-feedstock
protobuf no https://github.com/conda-forge/libprotobuf-feedstock
ios-cmake not needed since we don't target ios
NNPACK yes yes no conda-forge/staged-recipes#19103
gloo yes yes yes conda-forge/staged-recipes#19103 USE_SYSTEM_GLOO
pthreadpool yes yes no conda-forge/staged-recipes#19103 USE_SYSTEM_PTHREADPOOL
FXdiv yes yes header conda-forge/staged-recipes#19103 USE_SYSTEM_FXDIV
FP16 yes yes header conda-forge/staged-recipes#19103 USE_SYSTEM_FP16
psimd yes yes header conda-forge/staged-recipes#19103 USE_SYSTEM_PSIMD
zstd yes yes yes no https://github.com/conda-forge/zstd-feedstock
cpuinfo yes yes no no conda-forge/staged-recipes#19103 USE_SYSTEM_CPUINFO
python-enum no https://github.com/conda-forge/enum34-feedstock
python-peachpy yes yes yes no conda-forge/staged-recipes#19103
python-six yes yes yes no https://github.com/conda-forge/six-feedstock
onnx no https://github.com/conda-forge/onnx-feedstock USE_SYSTEM_ONNX
onnx-tensorrt only
sleef no https://github.com/conda-forge/sleef-feedstock USE_SYSTEM_SLEEF
ideep
oneapisrc
nccl https://github.com/conda-forge/nccl-feedstock
gemmlowp
QNNPACK yes yes conda-forge/staged-recipes#19103
neon2sse
fbgemm yes
foxi
tbb https://github.com/conda-forge/tbb-feedstock USE_SYSTEM_TBB (deprecated)
fbjni
XNNPACK yes yes conda-forge/staged-recipes#19103 USE_SYSTEM_XNNPACK
fmt https://github.com/conda-forge/fmt-feedstock
tensorpipe yes
cudnn_frontend
kineto
pocketfft
breakpad
flatbuffers yes yes yes no https://github.com/conda-forge/flatbuffers-feedstock
clog static static conda-forge/staged-recipes#19103
  • clog seems to be a pretty low-level library whose behavior is controlled by compile-time flags. I think it is best if we don't package it as a library; doing so would require serious consideration of the performance impact. The packages that use it typically include its full source in their repository. The only problematic thing is that each package attempts to install the static library into the library path.
  • QNNPACK has a build option to allow a special provision for CAFFE2's implementation of pthreadpool.
    • It seems to be problematic with pthreadpool on OSX.
  • QNNPACK likely has two different implementations: the one vendored in ATen, and the one vendored in third_party.
  • NNPACK has two different backends: one that seems to be generated by Python (but for some reason fp16.py cannot be found), and one that uses psimd.
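
To make the "system deps" column concrete, here is a small sketch that turns a list of already-packaged dependencies into build arguments. The flag names are copied from the table above. Passing them as CMake `-D...=ON` definitions is an assumption about how the recipe would forward them, and `cmake_args` is a hypothetical helper name.

```python
# Flags taken from the "system deps" column of the table above.
# tbb is left out because its flag is marked as deprecated.
SYSTEM_DEP_FLAGS = {
    "pybind11": "USE_SYSTEM_PYBIND11",
    "eigen": "USE_SYSTEM_EIGEN_INSTALL",
    "gloo": "USE_SYSTEM_GLOO",
    "pthreadpool": "USE_SYSTEM_PTHREADPOOL",
    "FXdiv": "USE_SYSTEM_FXDIV",
    "FP16": "USE_SYSTEM_FP16",
    "psimd": "USE_SYSTEM_PSIMD",
    "cpuinfo": "USE_SYSTEM_CPUINFO",
    "onnx": "USE_SYSTEM_ONNX",
    "sleef": "USE_SYSTEM_SLEEF",
    "XNNPACK": "USE_SYSTEM_XNNPACK",
}


def cmake_args(available_deps):
    """Return -D definitions for every dependency that has a known flag.

    Dependencies without a flag in the table (for example zstd) are skipped.
    """
    return [f"-D{SYSTEM_DEP_FLAGS[dep]}=ON"
            for dep in available_deps if dep in SYSTEM_DEP_FLAGS]
```

For example, `cmake_args(["eigen", "zstd", "sleef"])` yields the two definitions for eigen and sleef and ignores zstd, which has no flag listed.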

Option 2 - step 1: Build a libpytorch package or something

By setting BUILD_PYTHON=OFF in #112 we then end up with the following libraries in lib and include:

Dependency linux mac win GPU Aware PR
libasmjit yes yes conda-forge/staged-recipes#19103
libc10 yes yes conda-forge/staged-recipes#19103
libfbgemm yes yes yes conda-forge/staged-recipes#19103
libgloo yes yes yes
libkineto yes yes conda-forge/staged-recipes#19103
libnnpack yes ??? conda-forge/staged-recipes#19103
libpytorch_qnnpack yes yes conda-forge/staged-recipes#19103
libqnnpack yes yes conda-forge/staged-recipes#19103
libtensorpipe yes
libtorch
libtorch_cpu
libtorch_global_deps
Header only
ATen
c10d
caffe2
libnop yes yes conda-forge/staged-recipes#19103
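
One way to keep track of this list while iterating on the build is a quick check of the install prefix. The sketch below only assumes that the libraries land in `<prefix>/lib` under the names from the table with some file extension; `missing_libraries` is a hypothetical helper, not part of the recipe.

```python
from pathlib import Path

# Library names taken from the table above.
EXPECTED_LIBRARIES = [
    "libasmjit", "libc10", "libfbgemm", "libgloo", "libkineto",
    "libnnpack", "libpytorch_qnnpack", "libqnnpack", "libtensorpipe",
    "libtorch", "libtorch_cpu", "libtorch_global_deps",
]


def missing_libraries(prefix, names=EXPECTED_LIBRARIES):
    """Return the names with no matching file in <prefix>/lib.

    A name matches any file called "<name>.<extension>", so static (.a)
    and shared (.so, .dylib) libraries are both accepted.
    """
    lib_dir = Path(prefix) / "lib"
    return [name for name in names if not any(lib_dir.glob(name + ".*"))]
```

Running it against the build prefix after a `BUILD_PYTHON=OFF` build would show which of the libraries still need to be accounted for on a given platform.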

Option 2 - step 2: Depend on new ATen/libpytorch package

Compilation time progress

| platform | python | cuda | main | tar gh-109 | system deps |
| --- | --- | --- | --- | --- | --- |
| linux 64 | 3.7 | no | 1h57m | 1h54m | |
| linux 64 | 3.8 | no | 2h0m | 1h51m | |
| linux 64 | 3.9 | no | 2h31m | 2h2m | |
| linux 64 | 3.10 | no | 2h26m | 2h7m | |
| linux 64 | 3.7 | 11.2 | 6h+ (3933/4242, 309 remaining) | 6h+ | |
| linux 64 | 3.8 | 11.2 | 6h+ (3897/4242, 345 remaining) | 6h+ | |
| linux 64 | 3.9 | 11.2 | 6h+ (3924/4242, 318 remaining) | 6h+ | 6h+ (1656/1969, 313 remaining) |
| linux 64 | 3.10 | 11.2 | 6h+ (3962/4242, 280 remaining) | 6h+ | |
| osx-64 | 3.7 | | 2h42m | 2h39m | |
| osx-64 | 3.8 | | 3h28m | 2h52m | |
| osx-64 | 3.9 | | 2h40m | 2h42m | |
| osx-64 | 3.10 | | 3h2m | 2h42m | |
| osx-arm-64 | 3.8 | | 1h51m | 1h37m | |
| osx-arm-64 | 3.9 | | 2h20m | 2h10m | |
| osx-arm-64 | 3.10 | | 4h25m | 2h1m | |

There are approximately:

  • 3600 files to compile for cmake for the CPU builds with the standard build process
  • 1600-1800 files to compile when using system dependencies: WIP: Use more system libs #111
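
The progress counters in the timed-out CUDA rows can be read as follows. This is only back-of-the-envelope arithmetic: it assumes every file takes the same time to compile, which is certainly not true for CUDA kernels, and it treats the 6h timeout as the elapsed time. Both function names are made up for illustration.

```python
def remaining_files(done, total):
    """Number of compile steps left, given a done/total progress counter."""
    return total - done


def projected_minutes(done, total, elapsed_minutes):
    """Naive projection of the total build time, assuming a uniform rate."""
    return elapsed_minutes * total / done


# linux 64, python 3.7, cuda 11.2 on main: stopped at 3933/4242 after 6h.
left = remaining_files(3933, 4242)
estimate = projected_minutes(3933, 4242, 6 * 60)
print(f"{left} files left, about {estimate:.0f} min in total")
```

Under these assumptions the main build would need roughly half an hour beyond the 6h limit, while the system-deps build (1656/1969) would need about an hour more, so fewer files alone does not guarantee a faster CUDA build.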
