Commit 386f546

Update read me and include better comments
1 parent 733167f commit 386f546

File tree

2 files changed: +37 −65 lines


README.md

Lines changed: 32 additions & 61 deletions
@@ -1,40 +1,32 @@
- # SPEED SAM C++ TENSORRT
- ![SAM C++ TENSORRT](assets/speed_sam_cpp_tenosrrt.PNG)
+ # SAM C++ ONNX implementation

- <a href="https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT" style="margin: 0 2px;">
-     <img src='https://img.shields.io/badge/GitHub-Repo-blue?style=flat&logo=GitHub' alt='GitHub'>
- </a>
- <a href="https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT?tab=GPL-3.0-1-ov-file" style="margin: 0 2px;">
-     <img src='https://img.shields.io/badge/License-CC BY--NC--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>
- </a>
+ Inspired by the SAM network from Meta and the TensorRT implementation at https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT.git

  ## 🌐 Overview
- A high-performance C++ implementation for SAM (segment anything model) using TensorRT and CUDA, optimized for real-time image segmentation tasks.
+ A high-performance C++ implementation of SAM (Segment Anything Model) using ONNX and CUDA, optimized for real-time image segmentation tasks.

- ## 📢 Updates
- Model Conversion: Build TensorRT engines from ONNX models for accelerated inference.
- Segmentation with Points and BBoxes: Easily segment images using selected points or bounding boxes.
- FP16 Precision: Choose between FP16 and FP32 for speed and precision balance.
- Dynamic Shape Support: Efficient handling of variable input sizes using optimization profiles.
- CUDA Optimization: Leverage CUDA for preprocessing and efficient memory handling.

  ## 📢 Performance
+ ### Warm-up cost :fire:
+ NVIDIA GeForce RTX 3050
+ Encoder CUDA warm-up cost: 66.875 ms
+ Decoder CUDA warm-up cost: 53.87 ms
  ### Inference Time

- | Component                    | SpeedSAM |
- |------------------------------|----------|
- | **Image Encoder**            |          |
- | Parameters                   | 5M       |
- | Speed                        | 8ms      |
- | **Mask Decoder**             |          |
- | Parameters                   | 3.876M   |
- | Speed                        | 4ms      |
- | **Whole Pipeline (Enc+Dec)** |          |
- | Parameters                   | 9.66M    |
- | Speed                        | 12ms     |
- ### Results
- ![SPEED-SAM-C-TENSORRT RESULT](assets/Speed_SAM_Results.JPG)
+ | Component                    | Pre processing | Inference | Post processing |
+ |------------------------------|----------------|-----------|-----------------|
+ | **Image Encoder**            |                |           |                 |
+ | Parameters                   | 5M             | -         | -               |
+ | Speed                        | 8ms            | 33.322ms  | 0.437ms         |
+ | **Mask Decoder**             |                |           |                 |
+ | Parameters                   | 3.876M         | -         | -               |
+ | Speed                        | 34ms           | 11.176ms  | 5.984ms        |
+ | **Whole Pipeline (Enc+Dec)** |                |           |                 |
+ | Parameters                   | 9.66M          | -         | -               |
+ | Sum of Speed                 | 92.92ms        | -         | -               |

  ## 📂 Project Structure
  SPEED-SAM-CPP-TENSORRT/
@@ -53,44 +45,23 @@ A high-performance C++ implementation for SAM (segment anything model) using Ten
  └── CMakeLists.txt          # CMake configuration

  # 🚀 Installation
- ## Prerequisites
- git clone https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT.git
- cd SPEED-SAM-CPP-TENSORRT
+ ## Compile
+ git clone <repo>
+ cd sam_onnx_ros
  # Create a build directory and compile
  mkdir build && cd build
  cmake ..
  make -j$(nproc)
- Note: Update the CMakeLists.txt with the correct paths for TensorRT and OpenCV.
+ Note: Update CMakeLists.txt with the correct paths for ONNX Runtime, OpenCV, and the ONNX models (for TechUnited these are kept in separate repositories).
+
+ You can use main.cpp to run the application.
+
+ ## ROS option
+ You can also run the code as a catkin package.

  ## 📦 Dependencies
  CUDA: NVIDIA's parallel computing platform
- TensorRT: High-performance deep learning inference
+ ONNX Runtime: High-performance deep learning inference
  OpenCV: Image processing library
  C++17: Required standard for compilation
-
- # 🔍 Code Overview
- ## Main Components
- SpeedSam Class (speedSam.h): Manages image encoding and mask decoding.
- EngineTRT Class (engineTRT.h): TensorRT engine creation and inference.
- CUDA Utilities (cuda_utils.h): Macros for CUDA error handling.
- Config (config.h): Defines model parameters and precision settings.
- ## Key Functions
- EngineTRT::build: Builds the TensorRT engine from an ONNX model.
- EngineTRT::infer: Runs inference on the provided input data.
- SpeedSam::predict: Segments an image using input points or bounding boxes.
- ## 📞 Contact
- For advanced inquiries, feel free to contact me on LinkedIn: <a href="https://www.linkedin.com/in/hamdi-boukamcha/" target="_blank"> <img src="assets/blue-linkedin-logo.png" alt="LinkedIn" width="32" height="32"></a>
- ## 📜 Citation
- If you use this code in your research, please cite the repository as follows:
- @misc{boukamcha2024SpeedSam,
-     author = {Hamdi Boukamcha},
-     title = {SPEED-SAM-C-TENSORRT},
-     year = {2024},
-     publisher = {GitHub},
-     howpublished = {\url{https://github.com/hamdiboukamcha/SPEED-SAM-C-TENSORRT//}},
- }

src/sam_inference.cpp

Lines changed: 5 additions & 4 deletions
@@ -188,11 +188,11 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
    double post_process_time =
        (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
    if (_cudaEnable) {
-     std::cout << "[SAM(CUDA)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_encoder(CUDA)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    } else {
-     std::cout << "[SAM(CPU)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_encoder(CPU)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    }
@@ -235,6 +235,7 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
  #ifdef ROI
    for (const auto &box : boundingBoxes)
  #else
+
    for (const auto &box : result.boxes)
  #endif // ROI
    {
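The hunk above switches the decoder's prompt source at compile time: with `ROI` defined, the loop iterates externally supplied bounding boxes; otherwise it iterates the detector's `result.boxes`. A minimal self-contained sketch of that pattern (the `Box` type and `count_prompts` function are illustrative, not from the repository):

```cpp
#include <cassert>
#include <vector>

// Illustrative box type; the real code uses its own detection types.
struct Box { int x, y, w, h; };

// #define ROI  // define to prompt the decoder with fixed ROI boxes instead

int count_prompts(const std::vector<Box> &boundingBoxes,
                  const std::vector<Box> &resultBoxes) {
  int n = 0;
#ifdef ROI
  for (const auto &box : boundingBoxes)   // externally supplied ROIs
#else
  for (const auto &box : resultBoxes)     // detector output boxes
#endif // ROI
  {
    (void)box;
    ++n;  // in the real code, each box becomes one decoder prompt
  }
  return n;
}
```

Because the choice is a preprocessor macro, only one of the two loops exists in the compiled binary; toggling it requires a rebuild.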
@@ -303,11 +304,11 @@ const char *SAM::TensorProcess(clock_t &starttime_1, const cv::Mat &iImg,
    double post_process_time =
        (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
    if (_cudaEnable) {
-     std::cout << "[SAM(CUDA)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_decoder(CUDA)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    } else {
-     std::cout << "[SAM(CPU)]: " << pre_process_time << "ms pre-process, "
+     std::cout << "[SAM_decoder(CPU)]: " << pre_process_time << "ms pre-process, "
                << process_time << "ms inference, " << post_process_time
                << "ms post-process." << std::endl;
    }
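The renamed log lines above come from a `clock()`-based timing pattern: capture a timestamp before and after each stage and convert the tick delta to milliseconds via `CLOCKS_PER_SEC`. A minimal sketch of the same pre-process/inference/post-process breakdown; `run_stage`, `StageTimes`, and `measure` are illustrative stand-ins, not names from the repository:

```cpp
#include <ctime>
#include <iostream>

struct StageTimes { double pre, inf, post; };

// Hypothetical stand-in for one pipeline stage; the real code runs
// preprocessing, ONNX inference, and postprocessing here.
static void run_stage() {
  volatile double x = 0.0;
  for (int i = 0; i < 100000; ++i) x += i * 0.5;
}

// Same measurement pattern as SAM::TensorProcess: capture clock()
// around each stage, then convert tick deltas to milliseconds.
StageTimes measure() {
  clock_t t1 = clock();
  run_stage();  // pre-process
  clock_t t2 = clock();
  run_stage();  // inference
  clock_t t3 = clock();
  run_stage();  // post-process
  clock_t t4 = clock();
  return {(double)(t2 - t1) / CLOCKS_PER_SEC * 1000,
          (double)(t3 - t2) / CLOCKS_PER_SEC * 1000,
          (double)(t4 - t3) / CLOCKS_PER_SEC * 1000};
}

// Mirrors the log format used in the diff for the encoder's CPU path.
void print_times(const StageTimes &t) {
  std::cout << "[SAM_encoder(CPU)]: " << t.pre << "ms pre-process, "
            << t.inf << "ms inference, " << t.post << "ms post-process."
            << std::endl;
}
```

Note that `clock()` measures CPU time, not wall time, so on the CUDA path it can understate GPU-side latency; the reported numbers are best read as relative stage costs.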
