
Segmentation fault (core dumped) error for multiple GPUs #47

@theonegis

Description


Environment:

  • Python: 3.6
  • PyTorch: 0.4.0
  • OS: Ubuntu 18.04.1 LTS
  • CUDA: V9.1.85
  • GPU: Tesla K80

Problem:

I was running a model that does not need BatchNorm, so I modified the original DenseNet slightly. Here is the code snippet:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as cp


def _cat_function_factory(conv, relu):
    def cat_function(*inputs):
        concated_features = torch.cat(inputs, 1)
        bottleneck_output = relu(conv(concated_features))
        return bottleneck_output
    return cat_function


class _DenseLayer(nn.Module):
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.add_module('conv1', nn.Conv2d(num_input_features, bn_size * growth_rate, 1))
        self.add_module('relu1', nn.ReLU(inplace=True))
        self.add_module('conv2', nn.Conv2d(bn_size * growth_rate, growth_rate, 3, padding=1))
        self.add_module('relu2', nn.ReLU(inplace=True))
        self.drop_rate = drop_rate

    def forward(self, *inputs):
        cat_function = _cat_function_factory(self.conv1, self.relu1)
        if any(feature.requires_grad for feature in inputs):
            # Recompute the concat + bottleneck during backward to save memory.
            output = cp.checkpoint(cat_function, *inputs)
        else:
            output = cat_function(*inputs)
        new_features = self.relu2(self.conv2(output))
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
        return new_features


class _DenseBlock(nn.Module):
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            layer = _DenseLayer(num_input_features + i * growth_rate,
                                growth_rate, bn_size, drop_rate)
            self.add_module(f'denselayer{i + 1}', layer)

    def forward(self, init_features):
        features = [init_features]
        for name, layer in self.named_children():
            new_features = layer(*features)
            features.append(new_features)
        return torch.cat(features, 1)
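For background on the `cp.checkpoint` call above: `torch.utils.checkpoint` discards the intermediates of the wrapped function during the forward pass and recomputes them during backward. A minimal standalone sketch of that mechanism (the layer sizes here are illustrative, not taken from the model above):

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as cp

# Illustrative bottleneck pieces; sizes are made up for the sketch.
conv = nn.Conv2d(4, 8, 1)
relu = nn.ReLU(inplace=True)

def bottleneck(*inputs):
    # Concatenate along the channel dim, then 1x1 conv + ReLU.
    # When wrapped in cp.checkpoint, this is re-run during backward.
    return relu(conv(torch.cat(inputs, 1)))

x = torch.randn(2, 4, 8, 8, requires_grad=True)
out = cp.checkpoint(bottleneck, x)
out.sum().backward()
# x.grad has the same shape as x: gradients flow through the
# recomputed segment exactly as if it had not been checkpointed.
```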

It runs fine on a single GPU, but it throws a Segmentation fault (core dumped) error when running on multiple GPUs. What could be causing this issue?
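For context, the multi-GPU run presumably wraps the model in `nn.DataParallel` along these lines (a sketch under that assumption, since the wrapping code is not shown above; the stand-in model and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the full DenseNet; sizes are made up.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(inplace=True))

# Presumed multi-GPU setup (an assumption; not shown in the issue).
# On a CPU-only machine, DataParallel simply calls the wrapped module.
model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

x = torch.randn(2, 3, 16, 16)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)  # shape: (2, 8, 16, 16)
```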
