This repository was archived by the owner on Dec 8, 2025. It is now read-only.

cannot import name 'c_lib' from partially initialized module 'byteps.torch' #448

@njqconquer


Describe the bug
```
Traceback (most recent call last):
  File "/media/sdb1/niejinquan/compression-code/deep-gradient-compression/test.py", line 5, in <module>
    import byteps.torch as bps
  File "/home/niejinquan/.conda/envs/GC/lib/python3.9/site-packages/byteps/torch/__init__.py", line 24, in <module>
    from byteps.torch.ops import push_pull_async_inplace as byteps_push_pull
  File "/home/niejinquan/.conda/envs/GC/lib/python3.9/site-packages/byteps/torch/ops.py", line 29, in <module>
    from byteps.torch import c_lib
ImportError: cannot import name 'c_lib' from partially initialized module 'byteps.torch' (most likely due to a circular import) (/home/niejinquan/.conda/envs/GC/lib/python3.9/site-packages/byteps/torch/__init__.py)
```
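The "(most likely due to a circular import)" hint may be misleading here. `byteps/torch/ops.py` runs `from byteps.torch import c_lib`, and `c_lib` is presumably the compiled C++ extension rather than a Python file (an assumption based on its name). CPython reports this same message whenever `from pkg import name` runs while `pkg` is still initializing and `name` is neither an attribute of `pkg` nor an importable submodule, so a missing extension looks exactly like an import cycle. The sketch below reproduces that with a throwaway package; `demo_pkg`, `some_func`, and the helper name are hypothetical, and nothing here touches BytePS.

```python
import importlib
import pathlib
import sys
import tempfile


def reproduce_missing_submodule_error() -> str:
    """Build a toy package shaped like byteps.torch but with no c_lib module,
    import it, and return the resulting ImportError message."""
    with tempfile.TemporaryDirectory() as root:
        pkg = pathlib.Path(root) / "demo_pkg"
        pkg.mkdir()
        # Mirrors byteps/torch/__init__.py: the package imports from its ops module.
        (pkg / "__init__.py").write_text("from demo_pkg.ops import some_func\n")
        # Mirrors byteps/torch/ops.py: ops pulls c_lib back out of the package,
        # but no demo_pkg/c_lib module (.py or compiled extension) exists.
        (pkg / "ops.py").write_text(
            "from demo_pkg import c_lib\n"
            "\n"
            "def some_func():\n"
            "    return c_lib\n"
        )
        # Append rather than prepend so the toy package is not found via
        # sys.path[0]; recent Python versions may word the error differently
        # for modules located there.
        sys.path.append(root)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg")
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(root)
            # Drop any partially imported modules so a rerun starts clean.
            for name in ("demo_pkg", "demo_pkg.ops"):
                sys.modules.pop(name, None)
    return "no error"


if __name__ == "__main__":
    print(reproduce_missing_submodule_error())
```

No import cycle needs to be fixed in the toy package; the error disappears once a `c_lib` module exists. If the real `byteps/torch` directory has no compiled `c_lib` file, that would point to a build problem rather than a genuine circular import.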

To Reproduce
Steps to reproduce the behavior:

  1. pip install byteps
  2. Run the following test.py:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import byteps.torch as bps

# Initialization
bps.init()
torch.manual_seed(42)

# Define the model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data preparation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST('data', train=True, download=True,
                               transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, shuffle=True)

# Model and optimizer
model = Net()
# Scale the learning rate by the number of workers
optimizer = optim.SGD(model.parameters(), lr=0.01 * bps.size())
optimizer = bps.DistributedOptimizer(optimizer)

# Broadcast parameters from rank 0 so all workers start from the same weights
bps.broadcast_parameters(model.state_dict(), root_rank=0)

# Training loop
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data.view(-1, 784))
        loss = nn.functional.cross_entropy(output, target)
        loss.backward()
        optimizer.step()

for epoch in range(1, 11):
    train(epoch)
```

  3. See the error shown in the traceback above.
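Step 1 installs BytePS with pip. If, as seems likely, that step compiles the C++ extensions from source, a build that skipped or failed on the PyTorch plugin would leave `byteps/torch/` with its Python files but no `c_lib` extension. The sketch below checks for that file without importing `byteps.torch` (which would only re-raise the error). It assumes the extension is named `c_lib` plus a standard platform suffix such as `.cpython-39-x86_64-linux-gnu.so`, inferred from `from byteps.torch import c_lib`; `c_lib_candidates` and `installed_byteps_torch_dir` are hypothetical helper names.

```python
import importlib.machinery
import importlib.util
import pathlib
from typing import Iterable, List, Optional


def c_lib_candidates(file_names: Iterable[str]) -> List[str]:
    """Return the names that match a compiled 'c_lib' extension module,
    e.g. c_lib.cpython-39-x86_64-linux-gnu.so for CPython 3.9 on Linux."""
    wanted = {"c_lib" + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES}
    return sorted(name for name in file_names if name in wanted)


def installed_byteps_torch_dir() -> Optional[pathlib.Path]:
    """Locate the installed byteps/torch directory. find_spec() on the
    top-level 'byteps' name does not import it, so byteps/torch/__init__.py
    (where the ImportError is raised) never runs."""
    spec = importlib.util.find_spec("byteps")
    if spec is None or not spec.submodule_search_locations:
        return None  # byteps is not installed in this interpreter
    for location in spec.submodule_search_locations:
        candidate = pathlib.Path(location) / "torch"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    torch_dir = installed_byteps_torch_dir()
    if torch_dir is None:
        print("byteps/torch was not found in this environment")
    else:
        found = c_lib_candidates(p.name for p in torch_dir.iterdir())
        print(torch_dir, "->", found if found else "no compiled c_lib extension")
```

Run it with the same interpreter as `test.py` (here the `GC` conda environment), since each environment has its own `site-packages`.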

Expected behavior
`import byteps.torch as bps` should succeed so that the training script above can run.


Environment:

  • OS: Ubuntu 18.04
  • GCC version: 9.4
  • CUDA and NCCL version: 11.4
  • Framework (TF, PyTorch, MXNet): PyTorch
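The list above gives a single value, 11.4, for both CUDA and NCCL. Because the BytePS PyTorch extension presumably has to be compiled against the installed PyTorch, it may help to report the PyTorch, CUDA, and NCCL versions separately. A minimal sketch using public PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.nccl.version()`); `collect_versions` is a hypothetical helper name, and it still returns the Python details when PyTorch is missing.

```python
import platform
import sys
from typing import Dict


def collect_versions() -> Dict[str, str]:
    """Gather the versions relevant to building a PyTorch C++ extension."""
    info = {"python": platform.python_version(), "executable": sys.executable}
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built with ("None" for CPU-only builds).
    info["torch_cuda"] = str(torch.version.cuda)
    try:
        import torch.cuda.nccl as nccl
        # NCCL bundled with PyTorch: an int in older releases, a tuple in newer ones.
        info["nccl"] = str(nccl.version())
    except Exception as exc:  # e.g. a CPU-only build without NCCL support
        info["nccl"] = f"unavailable ({exc.__class__.__name__})"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```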
