Commit afa2e3b

Release the source code of TorchSparse v2.1 (#254)
* [Major] Add v2.1 source code.
* [Minor] Update README.md
* [Minor] Add PCEngine kernel citation information.
1 parent b55506a commit afa2e3b

104 files changed (+16218, -1973 lines)


README.md

Lines changed: 34 additions & 3 deletions
@@ -1,8 +1,15 @@
 # TorchSparse
 
+<p align="center">
+<img
+src="./docs/figs/torchsparse.png"
+height="300"
+>
+
+
 TorchSparse is a high-performance neural network library for point cloud processing.
 
-### [website](http://torchsparse.mit.edu/) | [paper](https://arxiv.org/abs/2204.10319) | [presentation](https://www.youtube.com/watch?v=IIh4EwmcLUs) | [documents](http://torchsparse-docs.github.io/) | [pypi server](http://pypi.hanlab.ai/simple/torchsparse)
+### [website](http://torchsparse.mit.edu/) | [paper (MICRO 2023)](https://www.dropbox.com/scl/fi/obdku0kqxjlkvuom2opk4/paper.pdf?rlkey=0zmy8eq9fzllgkx54zsvwsecf&dl=0) | [paper (MLSys 2022)](https://arxiv.org/abs/2204.10319) | [presentation](https://www.youtube.com/watch?v=IIh4EwmcLUs) | [documents](http://torchsparse-docs.github.io/) | [pypi server](http://pypi.hanlab.ai/simple/torchsparse)
 
 
 ## Introduction
@@ -11,6 +18,8 @@ Point cloud computation has become an increasingly more important workload for a
 
 ## News
 
+**\[2023/10/30\]** We present TorchSparse++ at 56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023). We also fully release the source code of TorchSparse++.
+
 **\[2023/6/18\]** TorchSparse++ has been released and presented at CVPR 2023 workshops on autonomous driving. It achieves 1.7-2.9x inference speedup over previous state-of-the-art systems.
 
 **\[2022/8/29\]** TorchSparse is presented at MLSys 2022. Talk video is available [here](https://www.youtube.com/watch?v=IIh4EwmcLUs).
@@ -46,6 +55,13 @@ We provide pre-built torchsparse v2.1.0 packages (recommended) with different Py
 
 If Pypi server does not work as expected, no worries, you can still manually download the wheels. The wheels are listed in [this website](http://pypi.hanlab.ai/simple/torchsparse). One can utilize our installation script to automatically determine the version number used to index the wheels. For example, if you use PyTorch 1.11.0, CUDA 11.5, the version number will end up to be 2.1.0+torch111cu115. You can then select the proper wheel according to your Python version.
 
+
+You may also alternatively install our library from source via:
+
+```bash
+python setup.py install
+```
+
 ## Benchmarks
 
 ### Inference benchmarks
@@ -73,6 +89,7 @@ TorchSparse is developed by the following wonderful team:
 - [Ke Hong](https://ieeexplore.ieee.org/author/37089419138): Graduate student (2021-) at Tsinghua University EE, v2.1 core developer, authored PCEngine kernels;
 - [Zhongming Yu](https://fishmingyu.github.io/): Ph.D. student (2022-) at UCSD CS, v2.1 core developer, authored PCEngine kernels;
 - [Yujun Lin](https://yujunlin.com/): Ph.D. student (2018-) at MIT EECS, v2.0 core developer;
+- [Yingqi Cao](https://github.com/ioeddk): Undergrad student at UC San Diego, currently working on the TorchSparse++ integration into algorithm frameworks;
 - [Guohao Dai](https://scholar.google.com/citations?user=gz3Tkl0AAAAJ&hl=en): Associate Professor at Shanghai Jiao Tong University, mentor of the project;
 - [Yu Wang](http://nicsefc.ee.tsinghua.edu.cn/): Professor at Tsinghua University, mentor of the project;
 - [Song Han](https://songhan.mit.edu): Associate Professor at MIT EECS, mentor of the project.
@@ -82,6 +99,17 @@ TorchSparse is developed by the following wonderful team:
 
 If you use TorchSparse, please use the following BibTeX entries to cite:
 
+TorchSparse++ (TorchSparse v2.1) is presented at MICRO 2023:
+
+```bibtex
+@inproceedings{tangandyang2023torchsparse,
+  title={TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs},
+  author={Tang, Haotian and Yang, Shang and Liu, Zhijian and Hong, Ke and Yu, Zhongming and Li, Xiuyu and Dai, Guohao and Wang, Yu and Han, Song},
+  booktitle={IEEE/ACM International Symposium on Microarchitecture (MICRO)},
+  year={2023}
+}
+```
+
 Preliminary version of TorchSparse++ (TorchSparse v2.1) is presented at CVPR Workshops 2023:
 
 ```bibtex
@@ -128,6 +156,9 @@ PCEngine paper is accepted by MLSys 2023:
 
 ## Acknowledgement
 
-We thank Yan Yan from TuSimple for helpful discussions.
+We thank Yan Yan from TuSimple for helpful discussions. Please also have a look at the [dgSparse](https://dgsparse.github.io/) library, which is designed for fast and efficient sparse computation on graphs and point clouds. The work from PCEngine (MLSys 2023) team is also highly related to us.
+
+TorchSparse is inspired by many existing open-source libraries, including (but not limited to) [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine), [SECOND](https://github.com/traveller59/second.pytorch) and [SparseConvNet](https://github.com/facebookresearch/SparseConvNet).
+
+We also thank [AttributeDict](https://github.com/grimen/python-attributedict/tree/master) for providing an elegant way to manage the kernel/model configurations.
 
-Please also have a look at the [dgSparse](https://dgsparse.github.io/) library, which is designed for fast and efficient sparse computation on graphs and point clouds. The work from PCEngine (MLSys 2023) team is also highly related to us.
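The README additions above describe both wheel-based installs (e.g. version 2.1.0+torch111cu115 for PyTorch 1.11.0 with CUDA 11.5) and installing from source. A quick post-install sanity check, shown as a sketch that is not part of this commit and assumes torchsparse exposes `__version__` (cython_setup.py below reads the version string from torchsparse/version.py):

```python
# Hedged sketch: verify the environment after installing a pre-built wheel
# or after running `python setup.py install`.
import torch
import torchsparse

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchsparse:", torchsparse.__version__)
```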

cython_setup.py

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+# cython: language_level=3
+import glob
+import os
+import sys
+
+import torch
+import torch.cuda
+from setuptools import find_packages, setup
+from torch.utils.cpp_extension import (
+    CUDA_HOME,
+    BuildExtension,
+    CppExtension,
+    CUDAExtension,
+)
+
+# from torchsparse import __version__
+
+from Cython.Build import cythonize
+
+cython_clean_flag = False
+
+version_file = open("./torchsparse/version.py")
+version = version_file.read().split("'")[1]
+print("torchsparse version:", version)
+
+if (torch.cuda.is_available() and CUDA_HOME is not None) or (
+    os.getenv("FORCE_CUDA", "0") == "1"
+):
+    device = "cuda"
+    pybind_fn = f"pybind_{device}.cu"
+else:
+    device = "cpu"
+    pybind_fn = f"pybind_{device}.cpp"
+
+sources = [os.path.join("torchsparse", "backend", pybind_fn)]
+for fpath in glob.glob(os.path.join("torchsparse", "backend", "**", "*")):
+    if (fpath.endswith("_cpu.cpp") and device in ["cpu", "cuda"]) or (
+        fpath.endswith("_cuda.cu") and device == "cuda"
+    ):
+        sources.append(fpath)
+
+pyx_files = []
+for root, dirnames, filenames in os.walk("torchsparse"):
+    for filename in filenames:
+        file_path = os.path.join(root, filename)
+        if file_path.endswith(".py"):
+            file_path2 = file_path + "x"
+            os.system("mv " + file_path + " " + file_path2)
+            os.system("sed -i '1s/^/# cython: language_level=3\\n/' " + file_path2)
+            pyx_files.append(file_path2)
+
+if pyx_files == []:
+    for root, dirnames, filenames in os.walk("torchsparse"):
+        for filename in filenames:
+            file_path = os.path.join(root, filename)
+            if file_path.endswith(".pyx"):
+                pyx_files.append(file_path)
+
+extension_type = CUDAExtension if device == "cuda" else CppExtension
+extra_compile_args = {
+    "cxx": ["-g", "-O3", "-fopenmp", "-lgomp"],
+    "nvcc": ["-O3", "-std=c++17"],
+}
+
+setup(
+    name="torchsparse",
+    version=version,
+    packages=find_packages(),
+    ext_modules=cythonize(
+        [
+            extension_type(
+                "torchsparse.backend", sources, extra_compile_args=extra_compile_args
+            ),
+        ]
+        + pyx_files
+    ),
+    install_requires=[
+        "numpy",
+        "backports.cached_property",
+        "tqdm",
+        "typing-extensions",
+        "wheel",
+        "rootpath",
+        "attributedict",
+    ],
+    cmdclass={"build_ext": BuildExtension},
+    zip_safe=False,
+)
+
+# Clean up
+if cython_clean_flag:
+    for root, dirnames, filenames in os.walk("torchsparse"):
+        for filename in filenames:
+            file_path = os.path.join(root, filename)
+            if file_path.endswith(".c"):
+                os.system("rm " + file_path)
+            if file_path.endswith(".pyx"):
+                os.system("rm " + file_path)

docs/figs/torchsparse.png

92.3 KB

examples/backbones.py

Lines changed: 6 additions & 5 deletions
@@ -9,12 +9,13 @@
 
 @torch.no_grad()
 def main() -> None:
-    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
+    device = "cuda:0" if torch.cuda.is_available() else "cpu"
     from torchsparse.nn import functional as F
-    F.set_kmap_mode('hashmap')
+
+    F.set_kmap_mode("hashmap")
 
     for backbone in [SparseResNet21D, SparseResUNet42]:
-        print(f'{backbone.__name__}:')
+        print(f"{backbone.__name__}:")
         model: nn.Module = backbone(in_channels=4, width_multiplier=1.0)
         model = model.to(device).eval()
 
@@ -36,8 +37,8 @@ def main() -> None:
 
         # print feature shapes
         for k, output in enumerate(outputs):
-            print(f'output[{k}].F.shape = {output.feats.shape}')
+            print(f"output[{k}].F.shape = {output.feats.shape}")
 
 
-if __name__ == '__main__':
+if __name__ == "__main__":
     main()

examples/example.py

Lines changed: 17 additions & 15 deletions
@@ -11,12 +11,12 @@
 import torchsparse
 from torchsparse import SparseTensor
 from torchsparse import nn as spnn
+from torchsparse.nn import functional as F
 from torchsparse.utils.collate import sparse_collate_fn
 from torchsparse.utils.quantize import sparse_quantize
 
 
 class RandomDataset:
-
     def __init__(self, input_size: int, voxel_size: float) -> None:
         self.input_size = input_size
         self.voxel_size = voxel_size
@@ -27,26 +27,28 @@ def __getitem__(self, _: int) -> Dict[str, Any]:
 
         coords, feats = inputs[:, :3], inputs
         coords -= np.min(coords, axis=0, keepdims=True)
-        coords, indices = sparse_quantize(coords,
-                                          self.voxel_size,
-                                          return_index=True)
+        coords, indices = sparse_quantize(coords, self.voxel_size, return_index=True)
 
         coords = torch.tensor(coords, dtype=torch.int)
         feats = torch.tensor(feats[indices], dtype=torch.float)
         labels = torch.tensor(labels[indices], dtype=torch.long)
 
         input = SparseTensor(coords=coords, feats=feats)
         label = SparseTensor(coords=coords, feats=labels)
-        return {'input': input, 'label': label}
+        return {"input": input, "label": label}
 
     def __len__(self):
         return 100
 
 
-if __name__ == '__main__':
+if __name__ == "__main__":
+    conv_config = F.get_default_conv_config()
+    # conv_config.dataflow = F.Dataflow.GatherScatter
+    F.set_global_conv_config(conv_config)
+
     parser = argparse.ArgumentParser()
-    parser.add_argument('--device', type=str, default='cuda')
-    parser.add_argument('--amp_enabled', action='store_true')
+    parser.add_argument("--device", type=str, default="cuda")
+    parser.add_argument("--amp_enabled", action="store_true")
     args = parser.parse_args()
 
     random.seed(0)
@@ -81,14 +83,14 @@ def __len__(self):
     scaler = amp.GradScaler(enabled=args.amp_enabled)
 
     for k, feed_dict in enumerate(dataflow):
-        inputs = feed_dict['input'].to(device=args.device)
-        labels = feed_dict['label'].to(device=args.device)
+        inputs = feed_dict["input"].to(device=args.device)
+        labels = feed_dict["label"].to(device=args.device)
 
         with amp.autocast(enabled=args.amp_enabled):
             outputs = model(inputs)
             loss = criterion(outputs.feats, labels.feats)
 
-        print(f'[step {k + 1}] loss = {loss.item()}')
+        print(f"[step {k + 1}] loss = {loss.item()}")
 
         optimizer.zero_grad()
         scaler.scale(loss).backward()
@@ -99,14 +101,14 @@ def __len__(self):
     model.eval()
     # enable fused and locality-aware memory access optimization
     torchsparse.backends.benchmark = True # type: ignore
-
+
     with torch.no_grad():
         for k, feed_dict in enumerate(dataflow):
-            inputs = feed_dict['input'].to(device=args.device).half()
-            labels = feed_dict['label'].to(device=args.device)
+            inputs = feed_dict["input"].to(device=args.device).half()
+            labels = feed_dict["label"].to(device=args.device)
 
             with amp.autocast(enabled=True):
                 outputs = model(inputs)
                 loss = criterion(outputs.feats, labels.feats)
 
-            print(f'[inference step {k + 1}] loss = {loss.item()}')
+            print(f"[inference step {k + 1}] loss = {loss.item()}")

examples/performance.py

Lines changed: 18 additions & 20 deletions
@@ -28,14 +28,12 @@ def generate_random_point_cloud(size=100000, voxel_size=0.2):
     input = SparseTensor(coords=coords, feats=feats)
     label = SparseTensor(coords=coords, feats=labels)
 
-    feed_dict = {'input': input, 'label': label}
+    feed_dict = {"input": input, "label": label}
 
     return feed_dict
 
 
-def generate_batched_random_point_clouds(size=100000,
-                                         voxel_size=0.2,
-                                         batch_size=2):
+def generate_batched_random_point_clouds(size=100000, voxel_size=0.2, batch_size=2):
     batch = []
     for _ in range(batch_size):
         batch.append(generate_random_point_cloud(size, voxel_size))
@@ -56,25 +54,25 @@ def dummy_train_3x3(device):
     optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
     criterion = nn.CrossEntropyLoss().to(device)
 
-    print('Starting dummy_train_3x3...')
+    print("Starting dummy_train_3x3...")
     time = datetime.now()
     with profiler.profile(profile_memory=True, use_cuda=True) as prof:
-        with profiler.record_function('model_inference'):
+        with profiler.record_function("model_inference"):
             for _ in range(10):
                 feed_dict = generate_batched_random_point_clouds()
-                inputs = feed_dict['input'].to(device)
-                targets = feed_dict['label'].F.to(device).long()
+                inputs = feed_dict["input"].to(device)
+                targets = feed_dict["label"].F.to(device).long()
                 outputs = model(inputs)
                 optimizer.zero_grad()
                 loss = criterion(outputs.F, targets)
                 loss.backward()
                 optimizer.step()
                 # print('[step %d] loss = %f.'%(i, loss.item()))
-    print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))
-    prof.export_chrome_trace('trace_dummy_3x3.json')
+    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
+    prof.export_chrome_trace("trace_dummy_3x3.json")
 
     time = datetime.now() - time
-    print('Finished dummy_train_3x3 in ', time)
+    print("Finished dummy_train_3x3 in ", time)
 
 
 def dummy_train_3x1(device):
@@ -91,29 +89,29 @@ def dummy_train_3x1(device):
     optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
     criterion = nn.CrossEntropyLoss().to(device)
 
-    print('Starting dummy_train_3x1 ...')
+    print("Starting dummy_train_3x1 ...")
     time = datetime.now()
     with profiler.profile(profile_memory=True, use_cuda=True) as prof:
-        with profiler.record_function('model_inference'):
+        with profiler.record_function("model_inference"):
             for _ in range(10):
                 feed_dict = generate_batched_random_point_clouds()
-                inputs = feed_dict['input'].to(device)
-                targets = feed_dict['label'].F.to(device).long()
+                inputs = feed_dict["input"].to(device)
+                targets = feed_dict["label"].F.to(device).long()
                 outputs = model(inputs)
                 optimizer.zero_grad()
                 loss = criterion(outputs.F, targets)
                 loss.backward()
                 optimizer.step()
                 # print('[step %d] loss = %f.'%(i, loss.item()))
-    print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))
-    prof.export_chrome_trace('trace_dummy_3x1.json')
+    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
+    prof.export_chrome_trace("trace_dummy_3x1.json")
 
     time = datetime.now() - time
-    print('Finished dummy_train_3x1 in ', time)
+    print("Finished dummy_train_3x1 in ", time)
 
 
-if __name__ == '__main__':
-    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
+if __name__ == "__main__":
+    device = "cuda:0" if torch.cuda.is_available() else "cpu"
 
     dummy_train_3x1(device)
     dummy_train_3x3(device)
0 commit comments
