Open
Description
Issue Type
Build/Install
Modules Involved
SPU runtime
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
https://github.com/AntCPLab/OpenBumbleBee
OS Platform and Distribution
Ubuntu 24.04.1 LTS
Python Version
3.10
Compiler Version
GCC 11.4.0
Current Behavior?
Hi, dear developers and authors of SPU and BumbleBee,
I am trying to benchmark the MPC performance of a custom ViT model (not from Hugging Face) with BumbleBee, built locally from source. The SPU backend runtime launches successfully using LOOPBACK (see the node 0 log below), but running flax_vit_inference on node 1 fails with Exception: ('remote exception', Exception('Traceback (most recent call last) ...')). The remote traceback ends in RuntimeError: [libspu/mpc/factory.cc:55] Invalid protocol kind SEMI2K (see the node 1 log below for the full output).
Best
Standalone code to reproduce the issue
node 0 run:
bazel run -c opt //examples/python/utils:nodectl -- --config `pwd`/examples/python/ml/flax_myvit/2pc.json up
node 1 run:
bazel run -c opt //examples/python/ml/flax_myvit:flax_vit_inference -- --config `pwd`/examples/python/ml/flax_myvit/2pc.json
Relevant log output
================= node 0 log: =================
INFO: Analyzed target //examples/python/utils:nodectl (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/python/utils:nodectl up-to-date:
bazel-bin/examples/python/utils/nodectl
INFO: Elapsed time: 0.272s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/python/utils/nodectl --config /home/server2/Desktop/OpenBumbleBee-ndss/examples/python/ml/flax_myvit/2pc.json up
[2025-07-18 16:15:56,763] [ForkServerProcess-1] Starting grpc server at 127.0.0.1:61525
[2025-07-18 16:15:56,849] [ForkServerProcess-2] Starting grpc server at 127.0.0.1:61526
================= node 1 log: =================
INFO: Analyzed target //examples/python/ml/flax_myvit:flax_vit_inference (1 packages loaded, 3 targets configured).
INFO: Found 1 target...
Target //examples/python/ml/flax_myvit:flax_vit_inference up-to-date:
bazel-bin/examples/python/ml/flax_myvit/flax_vit_inference
INFO: Elapsed time: 0.555s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/python/ml/flax_myvit/flax_vit_inference --config /home/server2/Desktop/OpenBumbleBee-ndss/examples/python/ml/flax_myvit/2pc.json
Traceback (most recent call last):
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/examples/python/ml/flax_myvit/flax_vit_inference.py", line 38, in <module>
ppd.init(conf["nodes"], conf["devices"])
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 1175, in init
_CONTEXT = HostContext(nodes_def, devices_def)
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 1095, in __init__
self.devices[name] = SPU(
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 1010, in __init__
results = [future.result() for future in futures]
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 1010, in <listcomp>
results = [future.result() for future in futures]
File "/home/server2/anaconda3/envs/spu/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/server2/anaconda3/envs/spu/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/server2/anaconda3/envs/spu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 247, in run
return self._call(self._stub.Run, fn, *args, **kwargs)
File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/ml/flax_myvit/flax_vit_inference.runfiles/spulib/spu/utils/distributed_impl.py", line 240, in _call
raise Exception("remote exception", result)
Exception: ('remote exception', Exception('Traceback (most recent call last):\n File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/utils/nodectl.runfiles/spulib/spu/utils/distributed_impl.py", line 326, in Run\n ret_objs = fn(self, *args, **kwargs)\n File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/utils/nodectl.runfiles/spulib/spu/utils/distributed_impl.py", line 559, in builtin_spu_init\n server._locals[f"{name}-rt"] = spu_api.Runtime(link, spu_config)\n File "/home/server2/.cache/bazel/_bazel_server2/8fc9c5947c29740d3fcd2a7bd75108aa/execroot/spulib/bazel-out/k8-opt/bin/examples/python/utils/nodectl.runfiles/spulib/spu/api.py", line 35, in __init__\n self._vm = libspu.RuntimeWrapper(link, config.SerializeToString())\nRuntimeError: what: \n\t[libspu/mpc/factory.cc:55] Invalid protocol kind SEMI2K\nstacktrace: \n#0 spu::RuntimeWrapper::RuntimeWrapper()+0x70da33243517\n#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x70da332439b6\n#2 pybind11::cpp_function::dispatcher()+0x70da33209c76\n#3 cfunction_call+0x4fd527\n\n\n'))
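The remote error says the runtime factory rejected the protocol kind SEMI2K, so one thing that may be worth checking is which protocol 2pc.json actually requests. Below is a minimal sketch for that check. It assumes the config follows the layout used by the stock SPU example configs (a "devices" map whose SPU entries carry "config" -> "runtime_config" -> "protocol"); the helper name spu_protocols and the sample dictionary are hypothetical and are not taken from my actual 2pc.json.

```python
import json


def spu_protocols(conf: dict) -> dict:
    """Map each SPU device name to the protocol named in its runtime_config.

    Assumes the layout devices -> <name> -> config -> runtime_config -> protocol,
    as in the stock SPU example configs. Missing keys yield None.
    """
    protocols = {}
    for name, dev in conf.get("devices", {}).items():
        if dev.get("kind") != "SPU":
            continue  # skip PYU and any other non-SPU devices
        runtime = dev.get("config", {}).get("runtime_config", {})
        protocols[name] = runtime.get("protocol")
    return protocols


# Hypothetical config for illustration only; addresses and node names are made up.
example_conf = {
    "nodes": {"node:0": "127.0.0.1:61525", "node:1": "127.0.0.1:61526"},
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {
                "node_ids": ["node:0", "node:1"],
                "runtime_config": {"protocol": "SEMI2K", "field": "FM64"},
            },
        },
        "P1": {"kind": "PYU", "config": {"node_id": "node:0"}},
    },
}

print(spu_protocols(example_conf))

# To inspect a real file instead:
#   with open("examples/python/ml/flax_myvit/2pc.json") as f:
#       print(spu_protocols(json.load(f)))
```

If the protocol printed for the SPU device is SEMI2K, the config and the set of protocols compiled into this build may simply disagree; I have not confirmed which protocol kinds this build's factory.cc accepts.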