@tmadlener
Collaborator

@tmadlener tmadlener commented Jun 7, 2024

BEGINRELEASENOTES

  • Replace the podio-dump python implementation by a podio-dump wrapper script around the new podio-dump-tool c++ implementation. This is an (almost) drop-in replacement, only legacy files (i.e. files written before the introduction of the Frame) cannot be read by this.
    • The existing python implementation is still available as podio-dump-legacy.
  • podio now depends on the fmtlib formatting library (version 9 or greater) as the podio-dump-tool uses that for formatting.
    • This might change again in the future, but compilers don't have enough support for std::format yet to easily replace this.

ENDRELEASENOTES

This has taken a bit to develop (see below for the original description). The way it is now is a combination of two tools that are wrapped in a thin bash script that acts as the entry point and is called podio-dump. This is effectively only necessary because the datamodel definitions are stored as JSON inside the file and it's just way easier to transform that back to YAML in python than it is in c++. Hence, the bash script only checks for the corresponding flag and if that is present pipes the output of the c++ podio-dump-tool to a python script (json-to-yaml). Otherwise it simply calls the podio-dump-tool passing on all the arguments.

The dependency on fmtlib is necessary because it supports formatting of ranges and easy opt-in for implementations via operator<<. Additionally, std::print (and println) are only scheduled for c++23. Once there is more / all necessary support in <format>, adjusting to that and dropping the fmtlib dependency should be possible.
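To illustrate what range formatting costs without fmt, here is a minimal std-only sketch. `formatRange` is a hypothetical helper (it exists neither in podio nor in fmt), and it only mimics fmt's default output for numeric elements; it does not quote strings the way fmt does.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: renders a range of streamable
// elements as "[a, b, c]", which is what fmt prints for e.g. a vector of ints.
template <typename Range>
std::string formatRange(const Range& values) {
  std::ostringstream out;
  out << '[';
  bool first = true;
  for (const auto& value : values) {
    if (!first) {
      out << ", "; // separator between elements, none before the first one
    }
    out << value; // relies on operator<< existing for the element type
    first = false;
  }
  out << ']';
  return out.str();
}
```

With fmt, including `<fmt/ranges.h>` and calling `fmt::format("{}", values)` gives the same result for a `std::vector<int>` without any of this code.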

Original description

This is an attempt at making podio-dump quicker after several complaints (e.g. key4hep/EDM4hep#312). After some "profiling" it turns out that the slowest part in the python implementation is the loop over all the collections, which can be significantly sped up by going to c++. In my local timings the current (python based) podio-dump is roughly eight times slower than this (c++ based) podio-dump-cpp for dumping the example_frame.root file from the tests (times via time):

        podio-dump    podio-dump-cpp
real    12.393s       1.513s
user     8.522s       1.251s
sys      3.823s       0.296s

The main disadvantages of the c++ implementation are that we need quite a bit of boilerplate for things that are trivial in python, e.g.:

  • We have to manually implement argument parsing and (parts of) the tabulate functionality
    • Since formatting with iostream and iomanip is bordering on masochism, I have decided to pull in fmt for now. In principle c++20 has similar functionality in std::format (but no equivalent of fmt::print; std::print only comes with c++23). However, that requires gcc >= 13 and clang >= 16.
  • Dumping datamodel definitions in YAML is missing entirely at the moment, since that would require dumping the internal json format as YAML. In python this is literally these 2 lines:

    podio/tools/podio-dump

    Lines 99 to 100 in d275460

    model_def = json.loads(reader.get_datamodel_definition(model_name))
    print(yaml.dump(model_def, sort_keys=False, default_flow_style=False))

    in c++ this would require at least one other library to be pulled in

Since dumping the datamodel would require quite a bit of work in c++, I would be in favor of keeping that in python in a separate tool, while the other functionality could be covered by the c++ implementation.

TODO:

@Zehvogel
Contributor

I wonder how an RDataFrame-based python version (with pre-compiled functions) would fare on a performance vs. comfort scale

@tmadlener
Collaborator Author

I have rebased this onto the latest version of podio. The main difference to before is that podio-dump is now a thin bash script that simply wraps the podio-dump-tool c++ executable. In case the user wants to dump an EDM via --dump-edm we invoke a thin python tool json-to-yaml that does the conversion on the fly. Now this is a drop-in replacement for the previous version of podio-dump implemented in python. The one thing the c++ implementation does not handle is reading pre-release legacy files. (I don't think that is an actual requirement any longer).

There are some formatting differences with respect to the python implementation, which are pretty much all related to the alignment of numerical outputs in the tables.

Using ROOT 6.34.04 and python 3.11.11 I get the following times (simply using the test times here):

Now

1/7 Test #185: podio-dump-help ..................   Passed    0.03 sec
2/7 Test #186: podio-dump-root ..................   Passed    3.58 sec
3/7 Test #187: podio-dump-detailed-root .........   Passed    3.51 sec
4/7 Test #190: podio-dump-sio ...................   Passed    0.04 sec
5/7 Test #191: podio-dump-detailed-sio ..........   Passed    0.04 sec
6/7 Test #194: podio-dump-rntuple ...............   Passed    0.24 sec
7/7 Test #195: podio-dump-detailed-rntuple ......   Passed    0.23 sec

and

1/6 Test #160: datamodel_def_store_roundtrip_root ................   Passed    0.90 sec
2/6 Test #161: datamodel_def_store_roundtrip_root_extension ......   Passed    0.81 sec
3/6 Test #162: datamodel_def_store_roundtrip_sio .................   Passed    0.68 sec
4/6 Test #163: datamodel_def_store_roundtrip_sio_extension .......   Passed    0.60 sec
5/6 Test #164: datamodel_def_store_roundtrip_rntuple .............   Passed    0.81 sec
6/6 Test #165: datamodel_def_store_roundtrip_rntuple_extension ...   Passed    0.71 sec

Before

1/7 Test #185: podio-dump-help ..................   Passed    0.17 sec
2/7 Test #186: podio-dump-root ..................   Passed    8.50 sec
3/7 Test #187: podio-dump-detailed-root .........   Passed    9.35 sec
4/7 Test #190: podio-dump-sio ...................   Passed    8.08 sec
5/7 Test #191: podio-dump-detailed-sio ..........   Passed    7.71 sec
6/7 Test #194: podio-dump-rntuple ...............   Passed    8.36 sec
7/7 Test #195: podio-dump-detailed-rntuple ......   Passed    8.00 sec

and

1/6 Test #160: datamodel_def_store_roundtrip_root ................   Passed    2.62 sec
2/6 Test #161: datamodel_def_store_roundtrip_root_extension ......   Passed    2.52 sec
3/6 Test #162: datamodel_def_store_roundtrip_sio .................   Passed    2.32 sec
4/6 Test #163: datamodel_def_store_roundtrip_sio_extension .......   Passed    2.30 sec
5/6 Test #164: datamodel_def_store_roundtrip_rntuple .............   Passed    2.63 sec
6/6 Test #165: datamodel_def_store_roundtrip_rntuple_extension ...   Passed    2.53 sec

So, depending on which ROOT backend is used, the speedup is somewhere between 2x and 30x. For SIO it is another factor of 10 on top of that (for the test files). Even if the speedup is "only" a factor of two for the most prevalent file format at the moment, I still think this would be worth it, even when considering the increased implementation size, since many things that came packaged for python are now hand-rolled here.
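
For reference, these are the ratios behind the numbers, taken from the non-detailed podio-dump test times listed above (before / now). These are single test runs, and the 0.04 sec SIO times are close to the timer resolution, so the SIO ratio in particular is only a rough indication.

```latex
\begin{aligned}
\text{podio-dump-root}:    &\quad 8.50 / 3.58 \approx 2.4 \\
\text{podio-dump-rntuple}: &\quad 8.36 / 0.24 \approx 35 \\
\text{podio-dump-sio}:     &\quad 8.08 / 0.04 \approx 202
\end{aligned}
```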


I have checked fmt-lib down to version 8, which didn't compile, so the minimum fmt version is 9. Things work unchanged up to version 11 (the latest at the moment).

}

template <typename T>
std::string getTypeString() {
Member

Suggested change
- std::string getTypeString() {
+ constexpr std::string getTypeString() {

I'm not sure this will change anything...

Collaborator Author

Made it a consteval std::string_view, so it's definitely evaluated at compile time.
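
As a rough sketch of the idea (not the actual podio-dump-tool code): returning a `std::string_view` to a string literal needs no allocation, so the whole function can run at compile time. The set of types and the returned names below are assumptions for illustration. The sketch uses `constexpr` so that it also builds as C++17; swapping in `consteval` (C++20) makes compile-time evaluation mandatory instead of merely possible.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Illustrative only: the supported types and the "unknown" fallback are
// assumptions, not what podio-dump-tool does.
template <typename T>
constexpr std::string_view getTypeString() {
  if constexpr (std::is_same_v<T, int>) {
    return "int";
  } else if constexpr (std::is_same_v<T, float>) {
    return "float";
  } else if constexpr (std::is_same_v<T, double>) {
    return "double";
  } else if constexpr (std::is_same_v<T, std::string>) {
    return "std::string";
  } else {
    return "unknown";
  }
}

// static_assert only accepts constant expressions, so these lines prove that
// the calls are evaluated by the compiler.
static_assert(getTypeString<int>() == "int");
static_assert(getTypeString<std::string>() == "std::string");
```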

@@ -0,0 +1,213 @@
#!/usr/bin/env python3
Member

Rename to podio-dump-legacy.py?

Collaborator Author

Renamed to podio-dump-legacy

const auto collNames = frame.getAvailableCollections();
for (const auto& name : podio::utils::sortAlphabeticaly(collNames)) {
const auto coll = frame.get(name);
print_flush("{}\n", name);
Member

Now that it is much faster than before, is it needed to flush so much? Maybe let the receiver (terminal) flush whenever they want for better throughput (for example with large files)?

Collaborator Author

I have to check. This might also just be a remnant from porting the python implementation, where flushing on the python side was necessary, because otherwise python and c++ streams (from coll.print()) would not be synchronized properly.

Collaborator Author

Seems to work without.

}

template <typename... Args>
void print_flush(fmt::format_string<Args...> fmtstr, Args&&... args) {
Member

Suggested change
- void print_flush(fmt::format_string<Args...> fmtstr, Args&&... args) {
+ void print_flush(const fmt::format_string<Args...>& fmtstr, Args&&... args) {

Collaborator Author

The whole function has been removed, as it is no longer necessary.

@tmadlener tmadlener changed the title [WIP] Add a c++ implementation for podio-dump Add a c++ implementation for podio-dump Apr 1, 2025
@tmadlener tmadlener force-pushed the podio-dump-cpp branch 3 times, most recently from a22da4d to 8f723df Compare April 4, 2025 06:43
Comment on lines 15 to 32
inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
  std::vector<std::string> tokens;
  std::string token;
  for (char ch : str) {
    if (ch == delimiter) {
      if (!token.empty()) {
        tokens.push_back(token);
        token.clear();
      }
    } else {
      token += ch;
    }
  }
  if (!token.empty()) {
    tokens.push_back(token);
  }
  return tokens;
}
Member

Suggested change
- inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
-   std::vector<std::string> tokens;
-   std::string token;
-   for (char ch : str) {
-     if (ch == delimiter) {
-       if (!token.empty()) {
-         tokens.push_back(token);
-         token.clear();
-       }
-     } else {
-       token += ch;
-     }
-   }
-   if (!token.empty()) {
-     tokens.push_back(token);
-   }
-   return tokens;
- }
+ inline std::vector<std::string> splitString(const std::string& str, const char delimiter) {
+   std::vector<std::string> result;
+   for (const auto part : std::ranges::views::split(str, delimiter)) {
+     result.emplace_back(part.begin(), part.end());
+   }
+   return result;
+ }

There is std::ranges::views::split to do exactly what this function does. In principle it should be possible to replace all the occurrences of splitString with std::ranges::views::split but more code would have to be modified to use views so this is a partial solution.

Collaborator Author

I have made this return a vector<string_view> and adapted the parseSizeOrExit to take a string_view.
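
For illustration, a `string_view`-returning split could look roughly like the sketch below. This is a hand-written variant that deliberately avoids `std::ranges::views::split`, so it is not the code that ended up in the PR; it keeps the behaviour of the original loop (empty tokens are skipped). Note that the returned views point into the input, so the underlying string has to outlive them.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch only: splits on a single delimiter and skips empty tokens, like the
// original loop-based splitString. No characters are copied; every token is a
// view into `str`.
inline std::vector<std::string_view> splitString(std::string_view str, char delimiter) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (start <= str.size()) {
    const auto end = str.find(delimiter, start);
    // Without a further delimiter the token runs to the end of the input
    const auto stop = (end == std::string_view::npos) ? str.size() : end;
    if (stop > start) {
      tokens.push_back(str.substr(start, stop - start));
    }
    if (end == std::string_view::npos) {
      break;
    }
    start = end + 1; // continue after the delimiter
  }
  return tokens;
}
```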

}
return number;
} catch (const std::invalid_argument&) {
std::cerr << "'" << str << "' cannot be parsed into an integer number" << std::endl;
Member

I've noticed that now passing a bad range like -e 1: fails with '' cannot be parsed into an integer number while before this would fail with:

usage: podio-dump [-h] [-c CATEGORY] [-e ENTRIES] [-d] [--dump-edm DUMP_EDM] [--version]
                  inputfile
podio-dump: error: argument -e/--entries: '0:' cannot be parsed into a list of entries

Not a huge problem since it is stated in the help how to correctly pass a range, but the error message is a bit useless in that case.

Collaborator Author

Should be back to (almost) what it was before now. I have put in an explicit check for an empty string after the colon. Ideally parseSizeOrExit would return an optional, but that would require quite some other work as well.
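
A minimal sketch of what the optional-returning variant mentioned above might look like, assuming a `begin:end` range syntax. `parseSize` and `parseRange` are hypothetical names, not functions from the PR; the point is only that the caller gets to decide which error message to print, e.g. one that quotes the full `1:` argument instead of the empty part after the colon.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical helper: returns std::nullopt instead of printing and exiting.
inline std::optional<std::size_t> parseSize(std::string_view str) {
  std::size_t number = 0;
  const char* first = str.data();
  const char* last = str.data() + str.size();
  const auto [ptr, ec] = std::from_chars(first, last, number);
  // The whole input has to be consumed, so "" and "12abc" are both rejected
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return number;
}

// Hypothetical helper: parses "begin:end". A missing number on either side of
// the colon (e.g. "1:") yields std::nullopt.
inline std::optional<std::pair<std::size_t, std::size_t>> parseRange(std::string_view str) {
  const auto colon = str.find(':');
  if (colon == std::string_view::npos) {
    return std::nullopt;
  }
  const auto begin = parseSize(str.substr(0, colon));
  const auto end = parseSize(str.substr(colon + 1));
  if (!begin || !end) {
    return std::nullopt;
  }
  return std::make_pair(*begin, *end);
}
```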

#include <fmt/core.h>

#include <algorithm>
#include <iostream>
Member

Suggested change
- #include <iostream>

Marked by clangd

@jmcarcell
Member

jmcarcell commented Apr 7, 2025

I don't have any more comments and probably won't have another look in detail unless I find something. I have been using this and it feels soooo much better to use; it is certainly way faster, and even in debug builds it is very fast. It also crashes faster when reading an incompatible file 😄
I wonder if having a better python implementation of the readers and the frame (I'm quite sure that's where most of the time was spent) would have helped. In any case this is good to have for a tool that is (should be) used a lot.

@tmadlener
Collaborator Author

> I wonder if having a better python implementation of the readers and the frame

IIRC, it was really the loop in python over all the collections that was slow (and the import ROOT). But we can potentially revisit that later. I would be happy to merge this, unless someone speaks up.

Drop-in replacement of podio-dump python implementation which gets moved
to podio-dump.py because it supports dumping the pre-release legacy files
@tmadlener tmadlener merged commit aeb8148 into AIDASoft:master Apr 9, 2025
18 of 19 checks passed
@tmadlener tmadlener deleted the podio-dump-cpp branch April 9, 2025 14:42
@jmcarcell
Member

This is now in the Key4hep nightlies on Alma 9 and Ubuntu 24. On Ubuntu 22, it doesn't build with GCC 11 anymore with multiple failures:

/podio/tools/src/podio-dump-tool.cpp:85:35: error: could not convert '{<expression error>}' from '<brace-enclosed initializer list>' to 'std::vector<long unsigned int>'
   85 |       return {parseSizeOrExit(*it)};
/podio/tools/src/podio-dump-tool.cpp:78:54: error: no match for 'operator|' (operand types are 'std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >' and 'std::ranges::views::__adaptor::_Partial<std::ranges::views::_Transform, parseEventRange(std::string_view)::<lambda(auto:32&&)> >')
   78 |     auto colonSplitRange = evtRange | rv::split(':') |
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~ ^
      |                                     |
      |                                     std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >
   79 |         rv::transform([](auto&& subrange) { return std::string_view(subrange.begin(), subrange.end()); });
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                      |
      |                      std::ranges::views::__adaptor::_Partial<std::ranges::views::_Transform, parseEventRange(std::string_view)::<lambda(auto:32&&)> >
/podio/tools/src/podio-dump-tool.cpp:79:57: error: no matching function for call to 'std::basic_string_view<char>::basic_string_view(std::ranges::split_view<std::basic_string_view<char>, std::ranges::single_view<char> >::_InnerIter<false>, std::default_sentinel_t)'
   79 |         rv::transform([](auto&& subrange) { return std::string_view(subrange.begin(), subrange.end()); });

I wasn't aware of this, but building with GCC 11 (with tests) actually hasn't been possible since #626. I think the way forward is probably to require GCC > 11 rather than trying to make the errors go away.

@tmadlener
Collaborator Author

I am not entirely sure how to best deal with this, but gcc11 is still the system compiler on Alma9, so if we can keep support for it alive, I would be all for it.

@jmcarcell
Member

On the other hand, GCC 15 is going to be released in ~ 1 month, so by now GCC 11 is already 4 years old. In addition, no one seems to have been building podio with tests on GCC 11, since #626 is from last summer.

@tmadlener
Collaborator Author

Well, it got merged in December and the v01-02 tag does not yet include it, so if people only build tagged versions they will not have encountered it yet. The main question is whether we consider gcc11 as c++20 compatible (enough) or not. Since we switched to c++20 after v01-02 that would also give us a clean cut and a reason to abandon gcc11.
