ros-misc-utilities · berndpfrommer · Aug 6, 2025 · Jul 15, 2025 · Jul 18, 2025 · Jul 21, 2025
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -138,13 +138,15 @@ if(BUILD_TESTING)
   ament_pep257()
   ament_xmllint()
 
+  find_package(ffmpeg_image_transport_msgs)
   find_package(ament_cmake_gtest REQUIRED)
   ament_add_gtest(${PROJECT_NAME}_encoder_test test/encoder_test.cpp
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test)
     target_include_directories(${PROJECT_NAME}_encoder_test PUBLIC
     $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
     $<INSTALL_INTERFACE:include>)
-  target_link_libraries(${PROJECT_NAME}_encoder_test ${PROJECT_NAME})
+  target_link_libraries(${PROJECT_NAME}_encoder_test
+    ${ffmpeg_image_transport_msgs_TARGETS} ${PROJECT_NAME})
 endif()
 
 ament_export_include_directories(include)

diff --git a/README.md b/README.md
@@ -34,11 +34,68 @@ and follow the [instructions here](https://github.com/ros-misc-utilities/.github
 
 Make sure to source your workspace's ``install/setup.bash`` afterwards.
 
-## Parameters
+## ROS Parameters
 
-This package has no parameters. It is the upper layer's responsibility to e.g. manage the mapping between encoder and decoder, i.e. to tell the decoder class which libav decoder should be used for the decoding, or to set the encoding parameters.
+This package does not expose ROS parameters. It is the upper layer's responsibility to e.g. manage the mapping between encoder and decoder, i.e. to tell the decoder class which libav decoder should be used for the decoding, or to set the encoding parameters.
 
-### How to use a custom version of libav (aka ffmpeg)
+
+## API usage
+
+### Encoder
+
+Using the encoder involves the following steps:
+- instantiating the ``Encoder`` object.
+- setting properties like the libav encoder to use, the encoding formats, and AV options.
+- initializing the encoder object. This requires knowledge of the image size and therefore
+  can only be done when the first image is available. Note that many properties
+  (encoding formats etc) must have been set *before* initializing the encoder.
+- feeding images to the encoder (and handling the callbacks when encoded packets become available)
+- flushing the encoder (may result in additional callbacks with encoded packets)
+- destroying the ``Encoder`` object.
+
+The ``Encoder`` class description has a short example code snippet.
+
+When using libav it is important to understand the difference between the encoder and the codec. The *codec* is the standardized format in which images are encoded, for instance ``h264`` or ``hevc``. The *encoder* is a software module that can encode images for a given codec. For instance ``libx264``, ``libx264rgb``, ``h264_nvenc``, and ``h264_vaapi`` are all *encoders* that encode for the *codec* h264.
+Some of the encoders are hardware accelerated, some can handle more image formats than others, but in the end they all encode video for a specific codec. You set the libav encoder with the ``setEncoder()`` method.
+
+For the many AV options available for various libav encoders, and for ``qmax``, ``bit_rate`` and similar settings please refer to the ffmpeg documentation.
+
+The topic of "pixel formats" (also called "encodings") can be very confusing and frustrating, so here a few lines about it.
+- Images in ROS arrive as messages that are encoded in one of the formats allowed by the [sensor\_msgs/Image encodings](https://github.com/ros2/common_interfaces/blob/a2ef867438e6d4eed074d6e3668ae45187e7de86/sensor_msgs/include/sensor_msgs/image_encodings.hpp). If you pass that message in via the ``encodeImage(const Image & msg)`` message, it will first will be converted to an opencv matrix using the [cv_bridge](https://github.com/ros-perception/vision_opencv).
+If you don't set the ``cv_bridge_target_format`` via ``setCVBridgeTargetFormat(const std::string & fmt)``, the image will be converted to the default format of ``bgr8``. Depending on how performance sensitive you are (or if you have a ``mono8`` (gray) image!), this may not be what you want. Ideally, you should pick a ``cv_bridge_target_format`` that can be directly used by the libav decoder that you have chosen. You can use ffmpeg to list the formats that the libav encoder directly supports:
+  ```
+  ffmpeg -h encoder=libx264 | grep formats
+  ```
+  Note that the ffmpeg/libav format strings often diverge from the ROS encoding strings, so some guesswork and experimentation may be necessary to pick the right ``cv_bridge_target_format``.
+  If you choose to bypass the cv\_bridge conversion by feeding the images to the encoder directly via the ``encodeImage(const cv::Mat & img ...)`` method, you must still set the ``cv_bridge_target_format`` such that encoder knows what format the ``img`` argument has.
+- Once the image is available in ``cv_bridge_target_format`` the encoder may perform another conversion to an image format that the libav encoder supports, the ``av_source_pixel_format``. Again, ``ffmpeg -h encoder=<your encoder>`` shows the supported formats. If you don't set the ``av_source_pixel_format`` with ``setAVSourcePixelFormat()``, the encoder will try to pick one that is supported by the libav encoder. That often works but may not be optimal.
+- Finally, the libav encoder produces a packet in its output codec, e.g. ``h264``, ``hevc`` etc. This encoding format is provided as the ``codec`` parameter when the encoder calls back with the encoded packet. Later, the codec needs to be provided to the decoder so it can pick a suitable libav decoder.
+
+To summarize, the conversion goes as follows:
+```
+<ros_message> -> cv_bridge_target_format -> av_source_pixel_format --(libav encoder)--> codec
+```
+By properly choosing ``cv_bridge_target_format`` and ``av_source_pixel_format`` two of those conversions
+may become no-ops, but to what extend the cv\_bridge and libav actually recognize and optimize for this has not been looked into yet.
+
+Note that only very few combinations of libav encoders, ``cv_bridge_target_format`` and ``av_source_pixel_format`` have been tested. Please provide feedback if you observe crashes or find obvious bugs. PRs are always appreciated!
+
+### Decoder
+
+Using the decoder involves the following steps:
+- instantiating the ``Decoder`` object.
+- if so desired, setting the ROS output (image encoding) format.
+- initializing the decoder object. For this you need to know the encoding (codec, e.g. "h264").
+  During initialization you also have to present an ordered list of libav decoders that
+  should be used. If an empty list is provided, the decoder will attempt to find a suitable
+  libav decoder based on the encoding, with a preference for hardware accelerated decoding.
+- feeding encoded packets to the decoder (and handling the callbacks
+  when decoded images become available)
+- flushing the decoder (may result in additional callbacks with decoded images)
+- destroying the ``Decoder`` object.
+The ``Decoder`` class description has a short example code snippet.
+
+## How to use a custom version of libav (aka ffmpeg)
 
 Compile *and install* ffmpeg. Let's say the install directory is
 ``/home/foo/ffmpeg/build``, then for it to be found while building,
@@ -79,6 +136,10 @@ nvmpi like so:
                      'ffmpeg_image_transport.preset': 'll',
                      'ffmpeg_image_transport.gop_size': 15}]
 ```
+Sometimes the ffmpeg parameters show up under different names. If the above
+settings don't work, try the command ``ros2 param dump <name_of_your_node>``
+*after* subscribing to the ffmpeg image topic with e.g. ``ros2 topic hz``.
+From the output you can see what the correct parameter names are.
 ## License
 
 This software is issued under the Apache License Version 2.0.
diff --git a/include/ffmpeg_encoder_decoder/decoder.hpp b/include/ffmpeg_encoder_decoder/decoder.hpp
@@ -16,6 +16,7 @@
 #ifndef FFMPEG_ENCODER_DECODER__DECODER_HPP_
 #define FFMPEG_ENCODER_DECODER__DECODER_HPP_
 
+#include <ffmpeg_encoder_decoder/pts_map.hpp>
 #include <ffmpeg_encoder_decoder/tdiff.hpp>
 #include <ffmpeg_encoder_decoder/types.hpp>
 #include <functional>
@@ -36,51 +37,215 @@ extern "C" {
 
 namespace ffmpeg_encoder_decoder
 {
+/**
+ * \brief Decodes ffmpeg encoded messages via libav (ffmpeg).
+ *
+ * The Decoder class facilitates decoding of messages that have been encoded with the
+ * Encoder class by leveraging libav, the collection of libraries used by ffmpeg.
+ * Sample code:
+ ```
+void imageCallback(const sensor_msgs::msg::Image::ConstSharedPtr & img, bool isKeyFrame)
+{
+  // process decoded image here...
+}
+
+ffmpeg_encoder_decoder::Decoder decoder;
+ffmpeg_image_transport_msgs::msg::FFMPEGPacket msg;
+msg.header.frame_id = "frame_id";
+msg.width = 640;
+msg.height = 480;
+msg.encoding = "hevc";
+msg.data.resize(10000, 0);  // Obviously this is not a valid packet!!!
+
+if (!decoder.isInitialized()) {
+  decoder.initialize(msg.encoding, imageCallback, "hevc_cuvid");
+}
+
+for (int64_t i = 0; i < 10; i++) {
+  msg.header.stamp = rclcpp::Time(i, RCL_SYSTEM_TIME);
+  if (!decoder.decodePacket(
+        msg.encoding, &msg.data[0], msg.data.size(), msg.pts, msg.header.frame_id,
+        msg.header.stamp)) {
+    throw(std::runtime_error("error decoding packet!"));
+  }
+}
+decoder.flush();
+```
+ */
 class Decoder
 {
 public:
+  /**
+   * \brief callback function signature
+   * \param img pointer to decoded image
+   * \param isKeyFrame true if the decoded image is a keyframe
+   */
   using Callback = std::function<void(const ImageConstPtr & img, bool isKeyFrame)>;
-  using PTSMap = std::unordered_map<int64_t, rclcpp::Time>;
 
+  /**
+   * \brief Constructor.
+   */
   Decoder();
+
+  /**
+   * \brief Destructor. Calls reset();
+   */
   ~Decoder();
+
+  /**
+   * Test if decoder is initialized.
+   * \return true if the decoder is initialized.
+   */
   bool isInitialized() const { return (codecContext_ != NULL); }
-  // Initialize decoder upon first packet received,
-  // providing callback to be called when frame is complete.
-  // You must still call decodePacket(msg) afterward!
-  bool initialize(const std::string & encoding, Callback callback, const std::string & dec);
-  // clears all state, but leaves config intact
+
+  /**
+   * \brief Initializes the decoder for a given codec and libav decoder.
+   *
+   * Initializes the decoder, with multiple decoders to pick from.
+   * If the name of the libav decoder string is empty, a suitable libav decoder
+   * will be picked, or the initialization will fail if none is available.
+   * \param codec  the codec (encoding) from the first packet. Can never change!
+   * \param callback the function to call when frame has been decoded.
+   * \param decoder the name of the libav decoder to use. If empty string,
+   *                 the decoder will try to find a suitable one based on the encoding.
+   * \return true if initialized successfully.
+   */
+  bool initialize(const std::string & codec, Callback callback, const std::string & decoder);
+
+  /**
+   * \ brief initializes the decoder, trying different libavdecoders in order.
+   *
+   * Initialize decoder with multiple libav decoders to pick from.
+   * If decoders.empty() a default decoder will be chosen (if available).
+   * \param codec the codec (encoding) from the first packet. Can never change!
+   * \param callback the function to call when frame has been decoded.
+   * \param decoders names of the libav decoders to try sequentially. If empty()
+   *                 the decoder will try to find a suitable one based on the codec.
+   * \return true if successful
+   */
+  bool initialize(
+    const std::string & codec, Callback callback, const std::vector<std::string> & decoders);
+
+  /**
+   * \brief Sets the ROS output message encoding format.
+   *
+   * Sets the ROS output message encoding format. Must be compatible with one of the
+   * libav encoding formats, or else exception will be thrown. If not set,
+   * the output encoding will default to bgr8.
+   * \param output_encoding defaults to bgr8
+   * \throw std::runtime_error() if no matching libav pixel format could be found
+   */
+  void setOutputMessageEncoding(const std::string & output_encoding);
+
+  /**
+   * \brief Clears all decoder state except for timers, loggers, and other settings.
+   */
   void reset();
-  // decode packet (may result in frame callback!)
+
+  /**
+   * \brief Decodes encoded packet.
+   *
+   * Decodes packet. Decoder must have been initialized beforehand. Calling this
+   * function may result in callback with decoded frame.
+   * \param encoding  the name of the encoding/codec (typically from msg encoding)
+   * \param data pointer to packet data
+   * \param size size of packet data
+   * \param pts presentation time stamp of data packet
+   * \param frame_id ros frame id (from message header)
+   * \param stamp ros message header time stamp
+   * \return true if decoding was successful
+   */
   bool decodePacket(
     const std::string & encoding, const uint8_t * data, size_t size, uint64_t pts,
     const std::string & frame_id, const rclcpp::Time & stamp);
 
+  /**
+   * \brief Flush decoder.
+   *
+   * This method can only be called once at the end of the decoding stream.
+   * It will force any buffered packets to be delivered as frames. No further
+   * packets can be fed to the decoder after calling flush().
+   * \return true if flush was successful (libav returns EOF)
+   */
+  bool flush();
+
+  /**
+   * \brief Overrides the default ("Decoder") logger.
+   * \param logger the logger to override the default ("Decoder") with.
+   */
+  void setLogger(rclcpp::Logger logger) { logger_ = logger; }
+
+  /**
+   * \brief Finds all hardware and software decoders for a given codec.
+   *
+   * Finds the name of all hardware and software decoders that match
+   * a certain codec (or encoder).
+   * \param codec name of the codec, i.e. h264, hevc etc
+   * \param hw_decoders non-null pointer for returning list of hardware decoders
+   * \param sw_decoders non-null pointer for returning list of software decoders
+   */
+  static void findDecoders(
+    const std::string & codec, std::vector<std::string> * hw_decoders,
+    std::vector<std::string> * sw_decoders);
+
+  /**
+   * \brief Finds all decoders that can decode a given codec.
+   *
+   * Finds the name of all hardware and software decoders (combined)
+   * that match a certain codec (or encoder).
+   * \param codec name of the codec, i.e. h264, hevc etc
+   * \return vector with names of matching libav decoders
+   */
+  static std::vector<std::string> findDecoders(const std::string & codec);
+
+  /**
+   * \brief Enables or disables performance measurements. Poorly tested, may be broken.
+   * \param p set to true to enable performance debugging.
+   */
   void setMeasurePerformance(bool p) { measurePerformance_ = p; }
+
+  /**
+   * \brief Prints performance timers. Poorly tested, may be broken.
+   * \param prefix for labeling the printout
+   */
   void printTimers(const std::string & prefix) const;
+
+  /**
+   * \brief resets performance debugging timers. Poorly tested, may be broken.
+   */
   void resetTimers();
-  void setLogger(rclcpp::Logger logger) { logger_ = logger; }
-  static const std::unordered_map<std::string, std::string> & getDefaultEncoderToDecoderMap();
+
+  // ------------------- deprecated functions ---------------
+  /**
+   * \deprecated Use findDecoders(codec) instead.
+   */
+  [[deprecated(
+    "use findDecoders(codec) now.")]] static const std::unordered_map<std::string, std::string> &
+  getDefaultEncoderToDecoderMap();
 
 private:
-  rclcpp::Logger logger_;
-  bool initDecoder(const std::string & encoding, const std::string & decoder);
+  bool initSingleDecoder(const std::string & decoder);
+  bool initDecoder(const std::vector<std::string> & decoders);
+  int receiveFrame();
   // --------------- variables
+  rclcpp::Logger logger_;
   Callback callback_;
-  PTSMap ptsToStamp_;  // mapping of header
-
+  PTSMap ptsToStamp_;
   // --- performance analysis
   bool measurePerformance_{false};
   TDiff tdiffTotal_;
-  // --- libav stuff
+  // --- libav related variables
   AVRational timeBase_{1, 100};
   std::string encoding_;
   AVCodecContext * codecContext_{NULL};
-  AVFrame * decodedFrame_{NULL};
+  AVFrame * swFrame_{NULL};
   AVFrame * cpuFrame_{NULL};
-  AVFrame * colorFrame_{NULL};
+  AVFrame * outputFrame_{NULL};
   SwsContext * swsContext_{NULL};
   enum AVPixelFormat hwPixFormat_;
+  std::string outputMsgEncoding_;
+  enum AVPixelFormat outputAVPixFormat_ { AV_PIX_FMT_NONE };
+  int bitsPerPixel_;  // output format bits/pixel including padding
   AVPacket packet_;
   AVBufferRef * hwDeviceContext_{NULL};
 };