Commit 1e296c0

Author: pjacka (committed)
[RDF] Adding support for HistoNSparseD in RDF
THnSparseD model is added; interfaces for HistoNSparseD are added. Adding tests for HistoNSparseD, including concurrency.
1 parent 86930ba commit 1e296c0


15 files changed (+392, -3 lines)

README/ReleaseNotes/v638/index.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -32,6 +32,7 @@ The following people have contributed to this new version:
 Florian Uhlig, GSI,\
 Devajith Valaparambil Sreeramaswamy, CERN/EP-SFT,\
 Vassil Vassilev, Princeton
+Petr Jacka, Czech Technical University in Prague
 
 ## Deprecation and Removal
 
@@ -112,6 +113,7 @@ If you want to keep using `TList*` return values, you can write a small adapter
 to numbers such as 8 would share one 3-d histogram among 8 threads, greatly reducing the memory consumption. This might slow down execution if the histograms
 are filled at very high rates. Use lower number in this case.
 - The Snapshot method has been refactored so that it does not need anymore compile-time information (i.e. either template arguments or JIT-ting) to know the input column types. This means that any Snapshot call that specifies the template arguments, e.g. `Snapshot<int, float>(..., {"intCol", "floatCol"})` is now redundant and the template arguments can safely be removed from the call. At the same time, Snapshot does not need to JIT compile the column types, practically giving huge speedups depending on the number of columns that need to be written to disk. In certain cases (e.g. when writing O(10000) columns) the speedup can be larger than an order of magnitude. The Snapshot template is now deprecated and it will issue a compile-time warning when called. The function overload is scheduled for removal in ROOT 6.40.
+- Add HistoNSparseD action that fills a sparse N-dimensional histogram.
 
 ## Python Interface
 
```

bindings/distrdf/python/DistRDF/Operation.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -51,7 +51,7 @@ def __init__(self, name: str, *args, **kwargs):
         # positional argument. In all Histo*D overload where it is present,
         # it is always the first argument.
         if not isinstance(self.args[0],
-                          (tuple, ROOT.RDF.TH1DModel, ROOT.RDF.TH2DModel, ROOT.RDF.TH3DModel, ROOT.RDF.THnDModel)):
+                          (tuple, ROOT.RDF.TH1DModel, ROOT.RDF.TH2DModel, ROOT.RDF.TH3DModel, ROOT.RDF.THnDModel, ROOT.RDF.THnSparseDModel)):
             message = (
                 "Creating a histogram without a model is not supported in distributed mode. Please make sure to "
                 "specify the histogram model when rerunning the distributed RDataFrame application. For example:\n\n"
@@ -104,6 +104,7 @@ class Transformation(Operation):
         "Histo2D": Histo,
         "Histo3D": Histo,
         "HistoND": Histo,
+        "HistoNSparseD": Histo,
         "Max": Action,
         "Mean": Action,
         "Min": Action,
```
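
With `ROOT.RDF.THnSparseDModel` added to the accepted types, `HistoNSparseD` follows the same rule as the other `Histo*` operations in distributed mode: the first positional argument must be a histogram model (or a tuple describing one), otherwise a `ValueError` is raised. The ROOT-free sketch below mimics only that check. The model classes are hypothetical stand-ins for the PyROOT types, and `check_histo_model` is an illustrative name, not part of DistRDF.

```python
# ROOT-free sketch of the DistRDF rule "Histo* operations need a model".
# These classes are hypothetical stand-ins for ROOT.RDF.TH1DModel, ..., THnSparseDModel.
class TH1DModel:
    pass


class THnDModel:
    pass


class THnSparseDModel:
    pass


# Accepted first positional arguments: a tuple describing a model, or a model object.
ACCEPTED_MODELS = (tuple, TH1DModel, THnDModel, THnSparseDModel)


def check_histo_model(name, *args):
    """Return args[0] if it is a model (or a tuple); raise ValueError otherwise."""
    if not args or not isinstance(args[0], ACCEPTED_MODELS):
        raise ValueError(
            f"{name}: creating a histogram without a model is not supported in distributed mode."
        )
    return args[0]


# A model object or a model tuple is accepted ...
check_histo_model("HistoNSparseD", THnSparseDModel(), ["a", "b", "c", "d"])
check_histo_model("HistoNSparseD", ("name", "title", 4, (10,) * 4, (0.0,) * 4, (100.0,) * 4), ["a", "b", "c", "d"])

# ... while passing only the column list (a list, not a tuple) is rejected.
try:
    check_histo_model("HistoNSparseD", ["a", "b", "c", "d"])
except ValueError as err:
    print("rejected:", err)
```

Note that a Python `list` of column names is not a `tuple`, which is why the column list alone cannot be mistaken for a model description.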

bindings/distrdf/test/test_operation.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -112,3 +112,14 @@ def test_histond_without_model(self):
         """Creating a histogram without model raises ValueError."""
         with self.assertRaises(ValueError):
             _ = Operation.create_op("HistoND", ["a", "b", "c", "d"])
+
+    def test_histonsparsed_with_thnsparsedmodel(self):
+        """THnSparseDModel"""
+        op = Operation.create_op("HistoNSparseD", ROOT.RDF.THnSparseDModel(), ["a", "b", "c", "d"])
+        self.assertIsInstance(op, Operation.Histo)
+        self.assertEqual(op.name, "HistoNSparseD")
+
+    def test_histonsparsed_without_model(self):
+        """Creating a histogram without model raises ValueError."""
+        with self.assertRaises(ValueError):
+            _ = Operation.create_op("HistoNSparseD", ["a", "b", "c", "d"])
```

roottest/python/distrdf/backends/check_reducer_merge.py

Lines changed: 22 additions & 0 deletions
```diff
@@ -125,6 +125,28 @@ def test_histond_merge(self, payload):
         assert histond_distrdf.GetEntries() == histond_rdf.GetEntries()
         assert histond_distrdf.GetNbins() == histond_rdf.GetNbins()
 
+    def test_histonsparsed_merge(self, payload):
+        """Check the working of HistoNSparseD merge operation in the reducer."""
+        nbins = (10, 10, 10, 10)
+        xmin = (0., 0., 0., 0.)
+        xmax = (100., 100., 100., 100.)
+        modelTHNSparseD = ("name", "title", 4, nbins, xmin, xmax)
+        colnames = ("x0", "x1", "x2", "x3")
+
+        connection, _ = payload
+        distrdf = ROOT.RDataFrame(100, executor=connection)
+
+        rdf = ROOT.RDataFrame(100)
+
+        distrdf_withcols = self.define_four_columns(distrdf, colnames)
+        rdf_withcols = self.define_four_columns(rdf, colnames)
+
+        histond_distrdf = distrdf_withcols.HistoNSparseD(modelTHNSparseD, colnames)
+        histond_rdf = rdf_withcols.HistoNSparseD(modelTHNSparseD, colnames)
+
+        assert histond_distrdf.GetEntries() == histond_rdf.GetEntries()
+        assert histond_distrdf.GetNbins() == histond_rdf.GetNbins()
+
     def test_profile1d_merge(self, payload):
         """Check the working of Profile1D merge operation in the reducer."""
         # Operations with DistRDF
```
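
The reducer test relies on sparse histograms being mergeable: adding the contents of matching bins from independently filled partial results gives the same histogram as a single pass over all entries. The same idea applies when partial results produced by different threads are combined. A minimal sketch using `collections.Counter` keyed by bin-coordinate tuples follows; the fixed 10-unit binning and the `fill`/`merge` helpers are illustrative assumptions, not THnSparse's storage or DistRDF's reducer.

```python
from collections import Counter


def fill(points, width=10.0):
    """Count entries per bin; a bin is the tuple of per-axis indices floor(x / width).

    Hypothetical binning: uniform bins of `width` starting at 0, no under/overflow handling.
    """
    return Counter(tuple(int(x // width) for x in point) for point in points)


def merge(partials):
    """Reduce partial histograms by summing the contents of matching bins."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts instead of replacing them
    return total


# Four "columns" per entry, loosely mirroring x0..x3 in the test.
points = [(float(i), i * 0.5, 99.0 - i, float(i % 10)) for i in range(100)]
single_pass = fill(points)

# Three "workers" each fill their own range of entries, then the reducer merges them.
chunks = [points[0:30], points[30:75], points[75:100]]
merged = merge(fill(c) for c in chunks)

print(merged == single_pass, sum(merged.values()))
```

Only bins that received entries appear as keys, so the merge cost scales with the number of filled bins rather than with the full N-dimensional grid.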

tree/dataframe/inc/ROOT/RDF/HistoModels.hxx

Lines changed: 31 additions & 0 deletions
```diff
@@ -20,6 +20,10 @@ class TH3D;
 template <typename T>
 class THnT;
 using THnD = THnT<double>;
+template <typename T>
+class THnSparseT;
+class TArrayD;
+using THnSparseD = THnSparseT<TArrayD>;
 class TProfile;
 class TProfile2D;
 
@@ -123,6 +127,33 @@ struct THnDModel {
    std::shared_ptr<::THnD> GetHistogram() const;
 };
 
+struct THnSparseDModel {
+   TString fName;
+   TString fTitle;
+   int fDim;
+   std::vector<int> fNbins;
+   std::vector<double> fXmin;
+   std::vector<double> fXmax;
+   std::vector<std::vector<double>> fBinEdges;
+   Int_t fChunkSize;
+
+   THnSparseDModel() = default;
+   THnSparseDModel(const THnSparseDModel &) = default;
+   ~THnSparseDModel();
+   THnSparseDModel(const ::THnSparseD &h);
+   THnSparseDModel(const char *name, const char *title, int dim, const int *nbins, const double *xmin, const double *xmax, Int_t chunksize = 1024 * 16);
+   // alternate version with std::vector to allow more convenient initialization from PyRoot
+   THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
+                   const std::vector<double> &xmin, const std::vector<double> &xmax, Int_t chunksize = 1024 * 16);
+   THnSparseDModel(const char *name, const char *title, int dim, const int *nbins,
+                   const std::vector<std::vector<double>> &xbins, Int_t chunksize = 1024 * 16);
+   THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
+                   const std::vector<std::vector<double>> &xbins, Int_t chunksize = 1024 * 16);
+   std::shared_ptr<::THnSparseD> GetHistogram() const;
+};
+
+
+
 struct TProfile1DModel {
    TString fName;
    TString fTitle;
```
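
`THnSparseDModel` is a plain parameter holder. It records what is needed to build a `THnSparseD` later: either uniform axes (`fNbins`, `fXmin`, `fXmax`) or explicit bin edges (`fBinEdges`), plus a chunk size that defaults to `1024 * 16`. The dataclass below mirrors those members in Python for illustration only. The class name, field names and the `uniform`/`variable` factory names are hypothetical; the `0.`/`64.` placeholder limits copy what the C++ variable-binning constructors in this commit store.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_CHUNK_SIZE = 1024 * 16  # same default as the C++ constructors


@dataclass
class SparseModel:
    """Hypothetical Python mirror of the THnSparseDModel members."""

    name: str
    title: str
    dim: int
    nbins: List[int]
    xmin: List[float]
    xmax: List[float]
    bin_edges: List[List[float]] = field(default_factory=list)
    chunk_size: int = DEFAULT_CHUNK_SIZE

    @classmethod
    def uniform(cls, name, title, dim, nbins, xmin, xmax, chunk_size=DEFAULT_CHUNK_SIZE):
        # Uniform axes: no explicit edges, so every axis gets an empty edge list.
        return cls(name, title, dim, list(nbins), list(xmin), list(xmax),
                   [[] for _ in range(dim)], chunk_size)

    @classmethod
    def variable(cls, name, title, dim, nbins, edges, chunk_size=DEFAULT_CHUNK_SIZE):
        # Variable axes: the limits are placeholders (0. and 64.); the edges carry the binning.
        return cls(name, title, dim, list(nbins), [0.0] * dim, [64.0] * dim,
                   [list(e) for e in edges], chunk_size)

    def has_variable_binning(self):
        # Same test as THnSparseDModel::GetHistogram(): any non-empty edge list.
        return any(len(e) > 0 for e in self.bin_edges)


m1 = SparseModel.uniform("h", "t", 4, [40] * 4, [20.0] * 4, [60.0] * 4)
m2 = SparseModel.variable("h", "t", 2, [3, 2], [[0.0, 1.0, 5.0, 10.0], [0.0, 0.5, 1.0]])
print(m1.chunk_size, m1.has_variable_binning(), m2.has_variable_binning())
```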

tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx

Lines changed: 2 additions & 1 deletion
```diff
@@ -89,6 +89,7 @@ struct Histo1D{};
 struct Histo2D{};
 struct Histo3D{};
 struct HistoND{};
+struct HistoNSparseD{};
 struct Graph{};
 struct GraphAsymmErrors{};
 struct Profile1D{};
@@ -121,7 +122,7 @@ struct HistoUtils<T, false> {
    static bool HasAxisLimits(T &) { return true; }
 };
 
-// Generic filling (covers Histo2D, HistoND, Profile1D and Profile2D actions, with and without weights)
+// Generic filling (covers Histo2D, HistoND, HistoNSparseD, Profile1D and Profile2D actions, with and without weights)
 template <typename... ColTypes, typename ActionTag, typename ActionResultType, typename PrevNodeType>
 std::unique_ptr<RActionBase>
 BuildAction(const ColumnNames_t &bl, const std::shared_ptr<ActionResultType> &h, const unsigned int nSlots,
```

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

Lines changed: 77 additions & 0 deletions
```diff
@@ -40,6 +40,7 @@
 #include "TH2.h" // For Histo actions
 #include "TH3.h" // For Histo actions
 #include "THn.h"
+#include "THnSparse.h"
 #include "TProfile.h"
 #include "TProfile2D.h"
 #include "TStatistic.h"
@@ -2286,6 +2287,82 @@ public:
                                                                    columnList.size());
    }
 
+
+   ////////////////////////////////////////////////////////////////////////////
+   /// \brief Fill and return a sparse N-dimensional histogram (*lazy action*).
+   /// \tparam FirstColumn The first type of the column the values of which are used to fill the object. Inferred if not
+   /// present.
+   /// \tparam OtherColumns A list of the other types of the columns the values of which are used to fill the
+   /// object.
+   /// \param[in] model The returned histogram will be constructed using this as a model.
+   /// \param[in] columnList
+   ///            A list containing the names of the columns that will be passed when calling `Fill`.
+   ///            (N columns for unweighted filling, or N+1 columns for weighted filling)
+   /// \return the sparse N-dimensional histogram wrapped in a RResultPtr.
+   ///
+   /// This action is *lazy*: upon invocation of this method the calculation is
+   /// booked but not executed. See RResultPtr documentation.
+   ///
+   /// ### Example usage:
+   /// ~~~{.cpp}
+   /// auto myFilledObj = myDf.HistoNSparseD<float, float, float, float>({"name","title", 4,
+   ///    {40,40,40,40}, {20.,20.,20.,20.}, {60.,60.,60.,60.}},
+   ///    {"col0", "col1", "col2", "col3"});
+   /// ~~~
+   ///
+   template <typename FirstColumn, typename... OtherColumns> // need FirstColumn to disambiguate overloads
+   RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
+   {
+      std::shared_ptr<::THnSparseD> h(nullptr);
+      {
+         ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
+         h = model.GetHistogram();
+
+         if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
+            h->Sumw2();
+         } else if (int(columnList.size()) != h->GetNdimensions()) {
+            throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
+         }
+      }
+      return CreateAction<RDFInternal::ActionTags::HistoNSparseD, FirstColumn, OtherColumns...>(columnList, h, h,
+                                                                                                fProxiedPtr);
+   }
+
+   ////////////////////////////////////////////////////////////////////////////
+   /// \brief Fill and return a sparse N-dimensional histogram (*lazy action*).
+   /// \param[in] model The returned histogram will be constructed using this as a model.
+   /// \param[in] columnList A list containing the names of the columns that will be passed when calling `Fill`
+   /// (N columns for unweighted filling, or N+1 columns for weighted filling)
+   /// \return the sparse N-dimensional histogram wrapped in a RResultPtr.
+   ///
+   /// This action is *lazy*: upon invocation of this method the calculation is
+   /// booked but not executed. Also see RResultPtr.
+   ///
+   /// ### Example usage:
+   /// ~~~{.cpp}
+   /// auto myFilledObj = myDf.HistoNSparseD({"name","title", 4,
+   ///    {40,40,40,40}, {20.,20.,20.,20.}, {60.,60.,60.,60.}},
+   ///    {"col0", "col1", "col2", "col3"});
+   /// ~~~
+   ///
+   RResultPtr<::THnSparseD> HistoNSparseD(const THnSparseDModel &model, const ColumnNames_t &columnList)
+   {
+      std::shared_ptr<::THnSparseD> h(nullptr);
+      {
+         ROOT::Internal::RDF::RIgnoreErrorLevelRAII iel(kError);
+         h = model.GetHistogram();
+
+         if (int(columnList.size()) == (h->GetNdimensions() + 1)) {
+            h->Sumw2();
+         } else if (int(columnList.size()) != h->GetNdimensions()) {
+            throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
+         }
+      }
+      return CreateAction<RDFInternal::ActionTags::HistoNSparseD, RDFDetail::RInferredType>(columnList, h, h, fProxiedPtr,
+                                                                                            columnList.size());
+   }
+
+
    ////////////////////////////////////////////////////////////////////////////
    /// \brief Fill and return a TGraph object (*lazy action*).
    /// \tparam X The type of the column used to fill the x axis.
```
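
Both `HistoNSparseD` overloads apply the same rule before booking the action: with N histogram axes, N columns mean unweighted filling, N + 1 columns mean weighted filling (so `Sumw2()` is switched on), and any other count throws. A Python sketch of that decision only; `fill_mode` is a hypothetical helper name and does not touch ROOT.

```python
def fill_mode(n_columns, n_dims):
    """Classify a HistoNSparseD column list the way the C++ overloads do (sketch)."""
    if n_columns == n_dims + 1:
        # One extra column carries the weights; the C++ code calls h->Sumw2() here
        # so that per-bin sums of squared weights are kept for the errors.
        return "weighted"
    if n_columns != n_dims:
        # Mirrors the std::runtime_error thrown by HistoNSparseD.
        raise RuntimeError("Wrong number of columns for the specified number of histogram axes.")
    return "unweighted"


print(fill_mode(4, 4))  # unweighted
print(fill_mode(5, 4))  # weighted
```

Checking the count eagerly, at booking time, means a wrong column list fails immediately instead of during the (lazy) event loop.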

tree/dataframe/inc/ROOT/RDF/RMergeableValue.hxx

Lines changed: 1 addition & 0 deletions
```diff
@@ -245,6 +245,7 @@ actions:
 - [Histo{1D,2D,3D}]
   (classROOT_1_1RDF_1_1RInterface.html#a247ca3aeb7ce5b95015b7fae72983055)
 - [HistoND](classROOT_1_1RDF_1_1RInterface.html#a0c9956a0f48c26f8e4294e17376c7fea)
+- [HistoNSparseD](classROOT_1_1RDF_1_1RInterface.html)
 - [Profile{1D,2D}]
   (classROOT_1_1RDF_1_1RInterface.html#a8ef7dc16b0e9f7bc9cfbe2d9e5de0cef)
 - [Stats](classROOT_1_1RDF_1_1RInterface.html#abc68922c464e472f5f856e8981955af6)
```

tree/dataframe/src/RDFHistoModels.cxx

Lines changed: 79 additions & 0 deletions
```diff
@@ -20,6 +20,7 @@
 #include "TH2.h"
 #include "TH3.h"
 #include "THn.h"
+#include "THnSparse.h"
 
 /**
  * \class ROOT::RDF::TH1DModel
@@ -46,6 +47,12 @@
  * \note It stores only basic settings such as name, title, bins, bin edges,
  * but not others such as fSumw2.
  *
+ * \class ROOT::RDF::THnSparseDModel
+ * \ingroup dataframe
+ * \brief A struct which stores some basic parameters of a THnSparseD
+ * \note It stores only basic settings such as name, title, bins, bin edges,
+ * but not others such as fSumw2.
+ *
  * \class ROOT::RDF::TProfile1DModel
  * \ingroup dataframe
  * \brief A struct which stores some basic parameters of a TProfile
@@ -295,6 +302,78 @@ std::shared_ptr<::THnD> THnDModel::GetHistogram() const
 }
 THnDModel::~THnDModel() {}
 
+
+THnSparseDModel::THnSparseDModel(const ::THnSparseD &h)
+   : fName(h.GetName()), fTitle(h.GetTitle()), fDim(h.GetNdimensions()), fNbins(fDim), fXmin(fDim), fXmax(fDim),
+     fBinEdges(fDim), fChunkSize(h.GetChunkSize())
+{
+   for (int idim = 0; idim < fDim; ++idim) {
+      fNbins[idim] = h.GetAxis(idim)->GetNbins();
+      SetAxisProperties(h.GetAxis(idim), fXmin[idim], fXmax[idim], fBinEdges[idim]);
+   }
+}
+
+THnSparseDModel::THnSparseDModel(const char *name, const char *title, int dim, const int *nbins, const double *xmin,
+                                 const double *xmax, Int_t chunksize)
+   : fName(name), fTitle(title), fDim(dim), fBinEdges(dim), fChunkSize(chunksize)
+{
+   fNbins.reserve(fDim);
+   fXmin.reserve(fDim);
+   fXmax.reserve(fDim);
+   for (int idim = 0; idim < fDim; ++idim) {
+      fNbins.push_back(nbins[idim]);
+      fXmin.push_back(xmin[idim]);
+      fXmax.push_back(xmax[idim]);
+   }
+}
+
+THnSparseDModel::THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
+                                 const std::vector<double> &xmin, const std::vector<double> &xmax, Int_t chunksize)
+   : fName(name), fTitle(title), fDim(dim), fNbins(nbins), fXmin(xmin), fXmax(xmax), fBinEdges(dim), fChunkSize(chunksize)
+{
+}
+
+THnSparseDModel::THnSparseDModel(const char *name, const char *title, int dim, const int *nbins,
+                                 const std::vector<std::vector<double>> &xbins, Int_t chunksize)
+   : fName(name), fTitle(title), fDim(dim), fXmin(dim, 0.), fXmax(dim, 64.), fBinEdges(xbins), fChunkSize(chunksize)
+{
+   fNbins.reserve(fDim);
+   for (int idim = 0; idim < fDim; ++idim) {
+      fNbins.push_back(nbins[idim]);
+   }
+}
+
+THnSparseDModel::THnSparseDModel(const char *name, const char *title, int dim, const std::vector<int> &nbins,
+                                 const std::vector<std::vector<double>> &xbins, Int_t chunksize)
+   : fName(name), fTitle(title), fDim(dim), fNbins(nbins), fXmin(dim, 0.), fXmax(dim, 64.), fBinEdges(xbins), fChunkSize(chunksize)
+{
+}
+
+std::shared_ptr<::THnSparseD> THnSparseDModel::GetHistogram() const
+{
+   bool varbinning = false;
+   for (const auto &bins : fBinEdges) {
+      if (!bins.empty()) {
+         varbinning = true;
+         break;
+      }
+   }
+   std::shared_ptr<::THnSparseD> h;
+   if (varbinning) {
+      std::vector<TAxis> axes(fDim);
+      for (int idim = 0; idim < fDim; ++idim) {
+         axes[idim] = TAxis(fNbins[idim], fBinEdges[idim].data());
+      }
+      h = std::make_shared<::THnSparseD>(fName, fTitle, axes, fChunkSize);
+      // h = std::make_shared<::THnSparseD>(fName, fTitle, fDim, fNbins.data(), fBinEdges, fChunkSize);
+   } else {
+      h = std::make_shared<::THnSparseD>(fName, fTitle, fDim, fNbins.data(), fXmin.data(), fXmax.data(), fChunkSize);
+   }
+   return h;
+}
+THnSparseDModel::~THnSparseDModel() {}
+
+
 // Profiles
 
 TProfile1DModel::TProfile1DModel(const ::TProfile &h)
```
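
`GetHistogram()` builds the axes from explicit edges as soon as any entry of `fBinEdges` is non-empty, and from `(nbins, xmin, xmax)` otherwise. What the two kinds of axis mean when a value is filled can be sketched with a bin lookup that follows ROOT's `TAxis` bin numbering (0 is underflow, `nbins + 1` is overflow). `find_bin` is a hypothetical helper; the sketch ignores `TAxis` features such as labels and range extension.

```python
import bisect


def find_bin(x, nbins, xmin=None, xmax=None, edges=None):
    """Return the 1-based bin index of x, 0 for underflow, nbins + 1 for overflow."""
    if edges:  # variable binning: len(edges) == nbins + 1, sorted ascending
        if x < edges[0]:
            return 0
        if x >= edges[-1]:
            return nbins + 1
        # bisect_right counts the edges <= x, which equals the 1-based bin index.
        return bisect.bisect_right(edges, x)
    # Uniform binning from (nbins, xmin, xmax).
    if x < xmin:
        return 0
    if x >= xmax:
        return nbins + 1
    # min() guards against floating-point rounding just below the upper edge.
    return min(nbins, 1 + int(nbins * (x - xmin) / (xmax - xmin)))


edges = [0.0, 1.0, 5.0, 10.0]  # 3 variable-width bins
print([find_bin(v, 3, edges=edges) for v in (-1.0, 0.0, 1.0, 9.9, 10.0)])
print([find_bin(v, 40, 20.0, 60.0) for v in (19.9, 20.0, 25.0, 60.0)])
```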

tree/dataframe/src/RDataFrame.cxx

Lines changed: 2 additions & 1 deletion
```diff
@@ -135,6 +135,7 @@ produce many different results in one event loop. Instant actions trigger the ev
 | GraphAsymmErrors() | Fills a TGraphAsymmErrors. Should be used for any type of graph with errors, including cases with errors on one of the axes only. If multi-threading is enabled, the order of the points may not be the one expected, it is therefore suggested to sort if before drawing. |
 | Histo1D(), Histo2D(), Histo3D() | Fill a one-, two-, three-dimensional histogram with the processed column values. |
 | HistoND() | Fill an N-dimensional histogram with the processed column values. |
+| HistoNSparseD() | Fill an N-dimensional sparse histogram with the processed column values. Memory is allocated only for non-empty bins. |
 | Max() | Return the maximum of processed column values. If the type of the column is inferred, the return type is `double`, the type of the column otherwise.|
 | Mean() | Return the mean of processed column values.|
 | Min() | Return the minimum of processed column values. If the type of the column is inferred, the return type is `double`, the type of the column otherwise.|
@@ -737,7 +738,7 @@ parts of the RDataFrame API currently work with this package. The subset that is
 - FilterMissing
 - Graph
 - Histo[1,2,3]D
-- HistoND
+- HistoND, HistoNSparseD
 - Max
 - Mean
 - Min
```
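
The remark that memory is allocated only for non-empty bins can be made concrete with the binning from the `HistoNSparseD` documentation example (4 axes of 40 bins). A dense N-dimensional array that also keeps under- and overflow bins on every axis needs (40 + 2)^4 = 3,111,696 cells, whereas sparse storage grows only with the number of distinct bins actually hit. The sketch below uses a `dict` as a stand-in for sparse storage and simulated, clustered bin coordinates; it says nothing about THnSparse's real per-bin overhead or chunked layout.

```python
import random

NDIM, NBINS = 4, 40

# Dense storage with under/overflow on every axis: (NBINS + 2) ** NDIM cells.
dense_cells = (NBINS + 2) ** NDIM

# Sparse storage: only bins that receive at least one entry get a slot.
# Entries are simulated as bin coordinates clustered around bin 20 on every axis.
rng = random.Random(42)
sparse = {}
for _ in range(10_000):
    coords = tuple(min(NBINS, max(1, round(rng.gauss(20, 2)))) for _ in range(NDIM))
    sparse[coords] = sparse.get(coords, 0.0) + 1.0

print(f"dense cells:           {dense_cells:,}")  # 3,111,696
print(f"allocated sparse bins: {len(sparse):,}")  # at most 10,000
```

The saving depends entirely on how concentrated the data are: uniformly spread entries would touch many more distinct bins than this clustered example.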
