Commit 386f546

Update read me and include better comments
1 parent 733167f commit 386f546

File tree

2 files changed: +37 −65 lines


README.md

Lines changed: 32 additions & 61 deletions
@@ -1,40 +1,32 @@
- # SPEED SAM C++ TENSORRT
- ![SAM C++ TENSORRT](assets/speed_sam_cpp_tenosrrt.PNG)
+ # SAM C++ ONNX implementation

- <a href="https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT" style="margin: 0 2px;">
-     <img src='https://img.shields.io/badge/GitHub-Repo-blue?style=flat&logo=GitHub' alt='GitHub'>
- </a>
- <a href="https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT?tab=GPL-3.0-1-ov-file" style="margin: 0 2px;">
-     <img src='https://img.shields.io/badge/License-CC BY--NC--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>
- </a>
+ Inspired by the SAM network from Meta and the TensorRT implementation at https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT.git

  ## 🌐 Overview
- A high-performance C++ implementation for SAM (segment anything model) using TensorRT and CUDA, optimized for real-time image segmentation tasks.
+ A high-performance C++ implementation of SAM (Segment Anything Model) using ONNX and CUDA, optimized for real-time image segmentation tasks.

- ## 📢 Updates
- Model Conversion: Build TensorRT engines from ONNX models for accelerated inference.
- Segmentation with Points and BBoxes: Easily segment images using selected points or bounding boxes.
- FP16 Precision: Choose between FP16 and FP32 for speed and precision balance.
- Dynamic Shape Support: Efficient handling of variable input sizes using optimization profiles.
- CUDA Optimization: Leverage CUDA for preprocessing and efficient memory handling.

  ## 📢 Performance
+ ### Warm-up cost :fire:
+ NVIDIA GeForce RTX 3050
+ Encoder CUDA warm-up cost: 66.875 ms
+ Decoder CUDA warm-up cost: 53.87 ms
  ### Inference Time

- | Component                    | SpeedSAM |
- |------------------------------|----------|
- | **Image Encoder**            |          |
- | Parameters                   | 5M       |
- | Speed                        | 8ms      |
- | **Mask Decoder**             |          |
- | Parameters                   | 3.876M   |
- | Speed                        | 4ms      |
- | **Whole Pipeline (Enc+Dec)** |          |
- | Parameters                   | 9.66M    |
- | Speed                        | 12ms     |
- ### Results
- ![SPEED-SAM-C-TENSORRT RESULT](assets/Speed_SAM_Results.JPG)
+ | Component                    | Pre processing | Inference | Post processing |
+ |------------------------------|----------------|-----------|-----------------|
+ | **Image Encoder**            |                |           |                 |
+ | Parameters                   | 5M             | -         | -               |
+ | Speed                        | 8ms            | 33.322ms  | 0.437ms         |
+ | **Mask Decoder**             |                |           |                 |
+ | Parameters                   | 3.876M         | -         | -               |
+ | Speed                        | 34ms           | 11.176ms  | 5.984ms        |
+ | **Whole Pipeline (Enc+Dec)** |                |           |                 |
+ | Parameters                   | 9.66M          | -         | -               |
+ | Sum of Speed                 | 92.92ms        | -         | -               |

  ## 📂 Project Structure
  SPEED-SAM-CPP-TENSORRT/
@@ -53,44 +45,23 @@ A high-performance C++ implementation for SAM (segment anything model) using Ten
  └── CMakeLists.txt          # CMake configuration

  # 🚀 Installation
- ## Prerequisites
- git clone https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT.git
- cd SPEED-SAM-CPP-TENSORRT
+ ## Compile
+ git clone <repo>
+ cd sam_onnx_ros
  # Create a build directory and compile
  mkdir build && cd build
  cmake ..
  make -j$(nproc)
- Note: Update the CMakeLists.txt with the correct paths for TensorRT and OpenCV.
+ Note: Update CMakeLists.txt with the correct paths for ONNX Runtime, OpenCV, and the ONNX models (for TechUnited these are kept in separate repositories).
+
+ You can use main.cpp to run the application.
+
+ ## ROS option
+ You can also run the code as a catkin package.

  ## 📦 Dependencies
  CUDA: NVIDIA's parallel computing platform
- TensorRT: High-performance deep learning inference
+ ONNX Runtime: High-performance deep learning inference
  OpenCV: Image processing library
  C++17: Required standard for compilation
-
- # 🔍 Code Overview
- ## Main Components
- SpeedSam Class (speedSam.h): Manages image encoding and mask decoding.
- EngineTRT Class (engineTRT.h): TensorRT engine creation and inference.
- CUDA Utilities (cuda_utils.h): Macros for CUDA error handling.
- Config (config.h): Defines model parameters and precision settings.
- ## Key Functions
- EngineTRT::build: Builds the TensorRT engine from an ONNX model.
- EngineTRT::infer: Runs inference on the provided input data.
- SpeedSam::predict: Segments an image using input points or bounding boxes.
- ## 📞 Contact
- For advanced inquiries, feel free to contact me on LinkedIn: <a href="https://www.linkedin.com/in/hamdi-boukamcha/" target="_blank"> <img src="assets/blue-linkedin-logo.png" alt="LinkedIn" width="32" height="32"></a>
- ## 📜 Citation
- If you use this code in your research, please cite the repository as follows:
- @misc{boukamcha2024SpeedSam,
-     author = {Hamdi Boukamcha},
-     title = {SPEED-SAM-C-TENSORRT},
-     year = {2024},
-     publisher = {GitHub},
-     howpublished = {\url{https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT//}},
- }

src/sam_inference.cpp

Lines changed: 5 additions & 4 deletions
@@ -188,11 +188,11 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
    double post_process_time =
        (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
    if (_cudaEnable) {
-     std::cout << "[SAM(CUDA)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_encoder(CUDA)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    } else {
-     std::cout << "[SAM(CPU)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_encoder(CPU)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    }
@@ -235,6 +235,7 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
  #ifdef ROI
    for (const auto &box : boundingBoxes)
  #else
+
    for (const auto &box : result.boxes)
  #endif // ROI
    {
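The hunk above switches the decoder's prompt source at compile time: with `ROI` defined, the loop iterates externally supplied bounding boxes; otherwise it iterates the detector's `result.boxes`. A minimal self-contained sketch of that pattern (the `Box` type and `count_prompts` function are illustrative, not from the repository):

```cpp
#include <cassert>
#include <vector>

// Illustrative box type; the real code uses its own detection types.
struct Box { int x, y, w, h; };

// #define ROI  // define to prompt the decoder with fixed ROI boxes instead

int count_prompts(const std::vector<Box> &boundingBoxes,
                  const std::vector<Box> &resultBoxes) {
  int n = 0;
#ifdef ROI
  for (const auto &box : boundingBoxes)   // externally supplied ROIs
#else
  for (const auto &box : resultBoxes)     // detector output boxes
#endif // ROI
  {
    (void)box;
    ++n;  // in the real code, each box becomes one decoder prompt
  }
  return n;
}
```

Because the choice is a preprocessor macro, only one of the two loops exists in the compiled binary; toggling it requires a rebuild.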
@@ -303,11 +304,11 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
    double post_process_time =
        (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
    if (_cudaEnable) {
-     std::cout << "[SAM(CUDA)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_decoder(CUDA)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    } else {
-     std::cout << "[SAM(CPU)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_decoder(CPU)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    }
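The renamed log lines above come from a `clock()`-based timing pattern: capture a timestamp before and after each stage and convert the tick delta to milliseconds via `CLOCKS_PER_SEC`. A minimal sketch of the same pre-process/inference/post-process breakdown; `run_stage`, `StageTimes`, and `measure` are illustrative stand-ins, not names from the repository:

```cpp
#include <ctime>
#include <iostream>

struct StageTimes { double pre, inf, post; };

// Hypothetical stand-in for one pipeline stage; the real code runs
// preprocessing, ONNX inference, and postprocessing here.
static void run_stage() {
  volatile double x = 0.0;
  for (int i = 0; i < 100000; ++i) x += i * 0.5;
}

// Same measurement pattern as SAM::TensorProcess: capture clock()
// around each stage, then convert tick deltas to milliseconds.
StageTimes measure() {
  clock_t t1 = clock();
  run_stage();  // pre-process
  clock_t t2 = clock();
  run_stage();  // inference
  clock_t t3 = clock();
  run_stage();  // post-process
  clock_t t4 = clock();
  return {(double)(t2 - t1) / CLOCKS_PER_SEC * 1000,
          (double)(t3 - t2) / CLOCKS_PER_SEC * 1000,
          (double)(t4 - t3) / CLOCKS_PER_SEC * 1000};
}

// Mirrors the log format used in the diff for the encoder's CPU path.
void print_times(const StageTimes &t) {
  std::cout << "[SAM_encoder(CPU)]: " << t.pre << "ms pre-process, "
            << t.inf << "ms inference, " << t.post << "ms post-process."
            << std::endl;
}
```

Note that `clock()` measures CPU time, not wall time, so on the CUDA path it can understate GPU-side latency; the reported numbers are best read as relative stage costs.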
