Add a c++ implementation for `podio-dump` #620

tmadlener · 2024-06-07T12:32:12Z

BEGINRELEASENOTES

- Replace the podio-dump python implementation by a podio-dump wrapper script around the new podio-dump-tool c++ implementation. This is an (almost) drop-in replacement; only legacy files (i.e. files written before the introduction of the Frame) cannot be read by it.
  - The existing python implementation is still available as podio-dump-legacy.
- podio now depends on the fmtlib formatting library (version 9 or greater) as the podio-dump-tool uses that for formatting.
  - This might change again in the future, but compilers don't have enough support for std::format yet to easily replace this.

ENDRELEASENOTES

This has taken a bit to develop (see below for the original description). The way it is now is a combination of two tools that are wrapped in a thin bash script that acts as the entry point and is called podio-dump. This is effectively only necessary because the datamodel definitions are stored as JSON inside the file and it's just way easier to transform that back to YAML in python than it is in c++. Hence, the bash script only checks for the corresponding flag and if that is present pipes the output of the c++ podio-dump-tool to a python script (json-to-yaml). Otherwise it simply calls the podio-dump-tool passing on all the arguments.

The dependency on fmtlib is necessary because it supports formatting of ranges and easy opt-in for implementations via operator<<. Additionally, std::print (and println) are only scheduled for c++23. Once there is more / all necessary support in <format>, adjusting to that and dropping the fmtlib dependency should be possible.

Original description

This is an attempt at making podio-dump quicker after several complaints (e.g. key4hep/EDM4hep#312). After some "profiling" it turns out that the slowest part in the python implementation is the loop over all the collections, which can be significantly sped up by going to c++. In my local timings the current (python based) podio-dump is roughly eight times slower in wall-clock time than this (c++ based) podio-dump-cpp for dumping the example_frame.root file from the tests (timings obtained with time):

|      | `podio-dump` | `podio-dump-cpp` |
|------|--------------|------------------|
| real | 12.393s      | 1.513s           |
| user | 8.522s       | 1.251s           |
| sys  | 3.823s       | 0.296s           |

The main disadvantages of the c++ implementation are that we need quite a bit of boilerplate for things that are trivial in python, e.g.:

- We have to manually implement argument parsing and (parts of) the tabulate functionality
  - Since formatting with iostream and iomanip is bordering on masochism, I have decided to pull in fmt for now. In principle c++20 has similar functionality in std::format (but no equivalent of fmt::print; std::print only comes with c++23). However, that requires gcc >= 13 and clang >= 16.
- Dumping datamodel definitions in YAML is missing entirely at the moment, since that would require dumping the internal json format as YAML. In python this is literally these 2 lines:

podio/tools/podio-dump, lines 99 to 100 in d275460:

    model_def = json.loads(reader.get_datamodel_definition(model_name))
    print(yaml.dump(model_def, sort_keys=False, default_flow_style=False))

In c++ this would require at least one other library to be pulled in.

Since dumping the datamodel would require quite a bit of work in c++, I would be in favor of keeping that in python in a separate tool, while the other functionality could be covered by the c++ implementation.
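
To give a sense of the boilerplate mentioned above, here is a minimal sketch of what a hand-rolled tabulate helper could look like using only the standard library. The names `tabulateSketch` and `Row` are hypothetical and not taken from tools/src/tabulate.h; the implementation in this PR uses fmt and may differ in layout and alignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alias for one table row (one string per column)
using Row = std::vector<std::string>;

// Sketch only: left-aligned columns separated by two spaces, with a dashed
// line under the header. Cells beyond the number of headers are ignored.
inline std::string tabulateSketch(const Row& headers, const std::vector<Row>& rows) {
  // Each column is as wide as its longest cell (including the header)
  std::vector<std::size_t> widths;
  widths.reserve(headers.size());
  for (const auto& header : headers) {
    widths.push_back(header.size());
  }
  for (const auto& row : rows) {
    for (std::size_t i = 0; i < row.size() && i < widths.size(); ++i) {
      widths[i] = std::max(widths[i], row[i].size());
    }
  }

  const auto formatRow = [&widths](const Row& row) {
    std::string line;
    for (std::size_t i = 0; i < widths.size(); ++i) {
      const std::string cell = i < row.size() ? row[i] : std::string{};
      if (i > 0) {
        line += "  ";
      }
      line += cell;
      // Pad every column except the last one to avoid trailing whitespace
      if (i + 1 < widths.size()) {
        line.append(widths[i] - cell.size(), ' ');
      }
    }
    line += '\n';
    return line;
  };

  Row separator;
  for (const auto width : widths) {
    separator.emplace_back(width, '-');
  }

  std::string table = formatRow(headers) + formatRow(separator);
  for (const auto& row : rows) {
    table += formatRow(row);
  }
  return table;
}
```

Calling `tabulateSketch({"Name", "Size"}, rows)` with one `Row` per collection returns the whole table as a single string, which can then be written to the output in one go.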

TODO:

Implement detailed dumping mode
Builds on top of Fix some minor issues with the Reader and Writer interfaces #618
Includes Mark GenericParameters::print as const to allow proper usage #621

Zehvogel · 2024-06-21T14:31:25Z

I wonder how an RDataFrame-based python version (with pre-compiled functions) would fare on a performance vs. comfort scale

tmadlener · 2025-04-01T13:40:44Z

I have rebased this onto the latest version of podio. The main difference to before is that podio-dump is now a thin bash script that simply wraps the podio-dump-tool c++ executable. In case the user wants to dump an EDM via --dump-edm we invoke a thin python tool json-to-yaml that does the conversion on the fly. Now this is a drop-in replacement for the previous version of podio-dump implemented in python. The one thing the c++ implementation does not handle is reading pre-release legacy files. (I don't think that is an actual requirement any longer).

There are some formatting differences wrt the python implementation, which are pretty much all related to the alignment of numerical outputs in the tables.

Using ROOT 6.34.04 and python 3.11.11 I get the following times (simply using the test times here):

Now

1/7 Test #185: podio-dump-help ..................   Passed    0.03 sec
2/7 Test #186: podio-dump-root ..................   Passed    3.58 sec
3/7 Test #187: podio-dump-detailed-root .........   Passed    3.51 sec
4/7 Test #190: podio-dump-sio ...................   Passed    0.04 sec
5/7 Test #191: podio-dump-detailed-sio ..........   Passed    0.04 sec
6/7 Test #194: podio-dump-rntuple ...............   Passed    0.24 sec
7/7 Test #195: podio-dump-detailed-rntuple ......   Passed    0.23 sec

and

1/6 Test #160: datamodel_def_store_roundtrip_root ................   Passed    0.90 sec
2/6 Test #161: datamodel_def_store_roundtrip_root_extension ......   Passed    0.81 sec
3/6 Test #162: datamodel_def_store_roundtrip_sio .................   Passed    0.68 sec
4/6 Test #163: datamodel_def_store_roundtrip_sio_extension .......   Passed    0.60 sec
5/6 Test #164: datamodel_def_store_roundtrip_rntuple .............   Passed    0.81 sec
6/6 Test #165: datamodel_def_store_roundtrip_rntuple_extension ...   Passed    0.71 sec

Before

1/7 Test #185: podio-dump-help ..................   Passed    0.17 sec
2/7 Test #186: podio-dump-root ..................   Passed    8.50 sec
3/7 Test #187: podio-dump-detailed-root .........   Passed    9.35 sec
4/7 Test #190: podio-dump-sio ...................   Passed    8.08 sec
5/7 Test #191: podio-dump-detailed-sio ..........   Passed    7.71 sec
6/7 Test #194: podio-dump-rntuple ...............   Passed    8.36 sec
7/7 Test #195: podio-dump-detailed-rntuple ......   Passed    8.00 sec

and

1/6 Test #160: datamodel_def_store_roundtrip_root ................   Passed    2.62 sec
2/6 Test #161: datamodel_def_store_roundtrip_root_extension ......   Passed    2.52 sec
3/6 Test #162: datamodel_def_store_roundtrip_sio .................   Passed    2.32 sec
4/6 Test #163: datamodel_def_store_roundtrip_sio_extension .......   Passed    2.30 sec
5/6 Test #164: datamodel_def_store_roundtrip_rntuple .............   Passed    2.63 sec
6/6 Test #165: datamodel_def_store_roundtrip_rntuple_extension ...   Passed    2.53 sec

So, depending on which ROOT backend is used the speedup is somewhere between 2x and 30x. For SIO it is another factor 10 on top of that (for the test files). Even if the speedup is "only" a factor of two for the most prevalent file format at the moment, I still think this would be worth it, also considering the increased size of the implementation, since many things that came packaged for python are now hand-rolled here.

I have checked fmt-lib down to version 8 which didn't compile, so minimal fmt version is 9. Things work unchanged up to version 11 (aka the latest at the moment).

jmcarcell · 2025-04-01T14:11:37Z

tools/src/podio-dump-tool.cpp

+}
+
+template <typename T>
+std::string getTypeString() {


Suggested change

-std::string getTypeString() {
+constexpr std::string getTypeString() {

I'm not sure this will change anything...

Made it a consteval std::string_view, so it's definitely evaluated at compile time.
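
For illustration, a compile-time type-name lookup that returns a std::string_view could look roughly like the sketch below. The name `typeNameSketch` and the set of handled types are assumptions, not the code from this PR. The sketch uses constexpr together with static_assert so that it also compiles as c++17; with c++20 the function could be declared consteval instead, which forces every call to be evaluated at compile time.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for the helper discussed above. The handled types
// are chosen for illustration only.
template <typename T>
constexpr std::string_view typeNameSketch() {
  if constexpr (std::is_same_v<T, int>) {
    return "int";
  } else if constexpr (std::is_same_v<T, float>) {
    return "float";
  } else if constexpr (std::is_same_v<T, double>) {
    return "double";
  } else {
    return "unknown";
  }
}

// The static_assert only compiles if the call is a constant expression
static_assert(typeNameSketch<int>() == "int");
```

Since a std::string_view over a string literal needs no allocation, the result can be used in constant expressions and at runtime alike.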

jmcarcell · 2025-04-01T14:13:29Z

tools/podio-dump.py

@@ -0,0 +1,213 @@
+#!/usr/bin/env python3


Rename to podio-dump-legacy.py?

Renamed to podio-dump-legacy

jmcarcell · 2025-04-01T14:17:28Z

tools/src/podio-dump-tool.cpp

+  const auto collNames = frame.getAvailableCollections();
+  for (const auto& name : podio::utils::sortAlphabeticaly(collNames)) {
+    const auto coll = frame.get(name);
+    print_flush("{}\n", name);


Now that it is much faster than before, is it needed to flush so much? Maybe let the receiver (terminal) flush whenever they want for better throughput (for example with large files)?

I have to check. This might also just be a remnant from porting the python implementation, where flushing on the python side was necessary, because otherwise python and c++ streams (from coll.print()) would not be synchronized properly.

Seems to work without.

jmcarcell · 2025-04-01T14:18:52Z

tools/src/podio-dump-tool.cpp

+}
+
+template <typename... Args>
+void print_flush(fmt::format_string<Args...> fmtstr, Args&&... args) {


Suggested change

-void print_flush(fmt::format_string<Args...> fmtstr, Args&&... args) {
+void print_flush(const fmt::format_string<Args...>& fmtstr, Args&&... args) {

The whole function has been removed, as it is no longer necessary.

jmcarcell · 2025-04-04T11:36:54Z

tools/src/argparseUtils.h

+inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
+  std::vector<std::string> tokens;
+  std::string token;
+  for (char ch : str) {
+    if (ch == delimiter) {
+      if (!token.empty()) {
+        tokens.push_back(token);
+        token.clear();
+      }
+    } else {
+      token += ch;
+    }
+  }
+  if (!token.empty()) {
+    tokens.push_back(token);
+  }
+  return tokens;
+}


Suggested change

-inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
-  std::vector<std::string> tokens;
-  std::string token;
-  for (char ch : str) {
-    if (ch == delimiter) {
-      if (!token.empty()) {
-        tokens.push_back(token);
-        token.clear();
-      }
-    } else {
-      token += ch;
-    }
-  }
-  if (!token.empty()) {
-    tokens.push_back(token);
-  }
-  return tokens;
-}
+inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
+  std::vector<std::string> result;
+  for (const auto part : std::ranges::views::split(str, delimiter)) {
+    result.emplace_back(part.begin(), part.end());
+  }
+  return result;
+}

There is std::ranges::views::split to do exactly what this function does. In principle it should be possible to replace all the occurrences of splitString with std::ranges::views::split but more code would have to be modified to use views so this is a partial solution.

I have made this return a vector<string_view> and adapted the parseSizeOrExit to take a string_view.
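
As a rough illustration of a split function that returns views instead of copies, the sketch below uses only std::string_view::find and substr, so it does not rely on the ranges library. The name `splitStringSketch` is hypothetical and this is not the code that ended up in tools/src/argparseUtils.h. Like the original loop-based splitString, it skips empty tokens.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch only: split str at every occurrence of delimiter and return
// non-owning views into str. Empty tokens are dropped.
inline std::vector<std::string_view> splitStringSketch(std::string_view str, char delimiter) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (true) {
    const auto end = str.find(delimiter, start);
    // If no further delimiter is found, take the rest of the string
    const auto count = end == std::string_view::npos ? end : end - start;
    const auto token = str.substr(start, count);
    if (!token.empty()) {
      tokens.push_back(token);
    }
    if (end == std::string_view::npos) {
      break;
    }
    start = end + 1;
  }
  return tokens;
}
```

The returned views point into the input, so the string they were created from has to outlive them.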

jmcarcell · 2025-04-07T09:11:20Z

tools/src/argparseUtils.h

+    }
+    return number;
+  } catch (const std::invalid_argument&) {
+    std::cerr << "'" << str << "' cannot be parsed into an integer number" << std::endl;


I've noticed that now passing a bad range like -e 1: fails with '' cannot be parsed into an integer number while before this would fail with:

usage: podio-dump [-h] [-c CATEGORY] [-e ENTRIES] [-d] [--dump-edm DUMP_EDM] [--version] inputfile
podio-dump: error: argument -e/--entries: '0:' cannot be parsed into a list of entries

Not a huge problem since it is stated in the help how to correctly pass a range, but the error message is a bit useless in that case.

Should be back to (almost) what it was before now. I have put in an explicit check for an empty string after the colon. Ideally parseSizeOrExit would return an optional, but that would require quite some other work as well.
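
To sketch what the optional-returning variant mentioned above might look like: the functions below parse a single entry or a first:last range and report failure via std::nullopt instead of exiting. The names `parseSizeSketch` and `parseEntryRangeSketch` are hypothetical, the inclusive range semantics are an assumption, and comma separated lists are left out for brevity.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Sketch only: parse the whole string as an unsigned integer.
// Returns std::nullopt for empty input, trailing characters or overflow.
inline std::optional<std::size_t> parseSizeSketch(std::string_view str) {
  std::size_t value = 0;
  const auto* last = str.data() + str.size();
  const auto [ptr, ec] = std::from_chars(str.data(), last, value);
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}

// Sketch only: accepts "N" or "first:last" (assumed to be inclusive).
// An empty part before or after the colon is rejected explicitly because
// parseSizeSketch returns std::nullopt for empty input.
inline std::optional<std::vector<std::size_t>> parseEntryRangeSketch(std::string_view range) {
  const auto colon = range.find(':');
  if (colon == std::string_view::npos) {
    const auto single = parseSizeSketch(range);
    if (!single) {
      return std::nullopt;
    }
    return std::vector<std::size_t>{*single};
  }

  const auto first = parseSizeSketch(range.substr(0, colon));
  const auto last = parseSizeSketch(range.substr(colon + 1));
  if (!first || !last || *first > *last) {
    return std::nullopt;
  }

  std::vector<std::size_t> entries;
  for (std::size_t i = *first;; ++i) {
    entries.push_back(i);
    if (i == *last) {
      break;
    }
  }
  return entries;
}
```

With this shape the caller decides how to report the problem, e.g. by printing the complete offending argument (such as '1:') rather than only the empty part that failed to parse.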

jmcarcell · 2025-04-07T19:27:15Z

tools/src/tabulate.h

+#include <fmt/core.h>
+
+#include <algorithm>
+#include <iostream>


Suggested change

-#include <iostream>

Marked by clangd

jmcarcell · 2025-04-07T19:30:52Z

I don't have any more comments and probably won't have another look in detail unless I find something. I have been using this and it feels soooo much better to use; it is certainly way faster, and even in debug builds it is very fast. It also crashes faster when reading an incompatible file 😄
I wonder if having a better python implementation of the readers and the frame (I'm quite sure that's where most of the time was spent) would have helped. In any case this is good to have for a tool that is (should be) used a lot.

tmadlener · 2025-04-08T12:47:25Z

> I wonder if having a better python implementation of the readers and the frame

IIRC, it was really the loop in python over all the collections that was slow (and the import ROOT). But we can potentially revisit that later. I would be happy to merge this, unless someone speaks up.

Drop-in replacement of podio-dump python implementation which gets moved to podio-dump.py because it supports dumping the pre-release legacy files

Co-authored-by: Juan Miguel Carceller <[email protected]>

jmcarcell · 2025-04-10T14:05:13Z

This is now in the Key4hep nightlies on Alma 9 and Ubuntu 24. On Ubuntu 22, it doesn't build with GCC 11 anymore with multiple failures:

/podio/tools/src/podio-dump-tool.cpp:85:35: error: could not convert '{<expression error>}' from '<brace-enclosed initializer list>' to 'std::vector<long unsigned int>'
   85 |       return {parseSizeOrExit(*it)};

/podio/tools/src/podio-dump-tool.cpp:78:54: error: no match for 'operator|' (operand types are 'std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >' and 'std::ranges::views::__adaptor::_Partial<std::ranges::views::_Transform, parseEventRange(std::string_view)::<lambda(auto:32&&)> >')
   78 |     auto colonSplitRange = evtRange | rv::split(':') |
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~ ^
      |                                     |
      |                                     std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >
   79 |         rv::transform([](auto&& subrange) { return std::string_view(subrange.begin(), subrange.end()); });
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                      |
      |                      std::ranges::views::__adaptor::_Partial<std::ranges::views::_Transform, parseEventRange(std::string_view)::<lambda(auto:32&&)> >

/podio/tools/src/podio-dump-tool.cpp:79:57: error: no matching function for call to 'std::basic_string_view<char>::basic_string_view(std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >::_InnerIter<false>, std::default_sentinel_t)'
   79 |         rv::transform([](auto&& subrange) { return std::string_view(subrange.begin(), subrange.end()); });

I wasn't aware of this, but actually building with GCC 11 (with tests) hasn't been possible since #626. I think the way forward is probably to require GCC > 11 rather than trying to make the errors go away.

tmadlener · 2025-04-11T06:58:20Z

I am not entirely sure how to best deal with this, but gcc11 is still the system compiler on Alma9, so if we can keep support for it alive, I would be all for it.

jmcarcell · 2025-04-11T09:18:56Z

On the other hand, GCC 15 is going to be released in ~ 1 month, so by now GCC 11 is already 4 years old. In addition, no one seems to have been building podio with tests, since #626 is from last year's summer.

tmadlener · 2025-04-11T09:23:30Z

Well, it got merged in December and the v01-02 tag does not yet include it, so if people only build tagged versions they will not have encountered it yet. The main question is whether we consider gcc11 as c++20 compatible (enough) or not. Since we switched to c++20 after v01-02 that would also give us a clean cut and a reason to abandon gcc11.

tmadlener force-pushed the podio-dump-cpp branch from c7d24b2 to 6b34936 Compare June 25, 2024 12:06

tmadlener mentioned this pull request Sep 19, 2024

Add a tool to merge several podio files into a single one #681

Merged

tmadlener force-pushed the podio-dump-cpp branch from 6b34936 to 04b0bd2 Compare December 3, 2024 16:39

tmadlener force-pushed the podio-dump-cpp branch from 04b0bd2 to b49dd73 Compare April 1, 2025 12:25

tmadlener changed the title ~~[WIP] Add a c++ implementation for podio-dump~~ Add a c++ implementation for podio-dump Apr 1, 2025

tmadlener force-pushed the podio-dump-cpp branch 3 times, most recently from a22da4d to 8f723df Compare April 4, 2025 06:43
