Commit 7ad64bc

[ntuple] Accept a single auxiliary processor in CreateJoin
Instead of keeping track of multiple auxiliary processors per instance, the `RNTupleJoinProcessor` now holds exactly one. When a multiway join is desired (i.e., one with multiple auxiliary processors), the recommended way is now to compose it from multiple `RNTupleJoinProcessor`s. The main rationale for this change is that it removes some overhead when creating join processors (i.e., there is no longer a need to first create a vector of `RNTupleOpenSpec`s or processors), and that we currently foresee the majority of joins involving only one auxiliary RNTuple (or a combination thereof) anyway.
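
The composition idea can be illustrated with a small self-contained toy model. This is only a sketch and not ROOT code: `ToyProcessor`, `ToyTable`, `ToyJoin` and `BuildThreeWayJoin` are hypothetical names invented for this example, rows are plain maps of integers, and the join is a lookup on a single field. It is meant to show how a join that accepts one auxiliary can still express a multiway join by nesting, with auxiliary fields exposed under an `auxName.fieldName` prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these types are ROOT classes.
using Row = std::map<std::string, int>;

struct ToyProcessor {
   virtual ~ToyProcessor() = default;
   virtual std::size_t Size() const = 0;
   virtual Row Load(std::size_t entry) const = 0;
};

// Leaf processor backed by an in-memory table.
struct ToyTable final : ToyProcessor {
   std::vector<Row> fRows;
   explicit ToyTable(std::vector<Row> rows) : fRows(std::move(rows)) {}
   std::size_t Size() const override { return fRows.size(); }
   Row Load(std::size_t entry) const override { return fRows.at(entry); }
};

// Join with exactly one auxiliary processor (assumption: one integer join field).
struct ToyJoin final : ToyProcessor {
   std::unique_ptr<ToyProcessor> fPrimary;
   std::unique_ptr<ToyProcessor> fAuxiliary;
   std::string fAuxName;
   std::string fJoinField;
   std::map<int, std::size_t> fJoinTable; // join value -> auxiliary entry number

   ToyJoin(std::unique_ptr<ToyProcessor> primary, std::unique_ptr<ToyProcessor> aux, std::string auxName,
           std::string joinField)
      : fPrimary(std::move(primary)), fAuxiliary(std::move(aux)), fAuxName(std::move(auxName)),
        fJoinField(std::move(joinField))
   {
      assert(fPrimary && fAuxiliary);
      // Build the single join table over the auxiliary entries.
      for (std::size_t i = 0; i < fAuxiliary->Size(); ++i)
         fJoinTable.emplace(fAuxiliary->Load(i).at(fJoinField), i);
   }

   // The primary drives the iteration, so it also defines the number of entries.
   std::size_t Size() const override { return fPrimary->Size(); }

   Row Load(std::size_t entry) const override
   {
      Row row = fPrimary->Load(entry);
      auto match = fJoinTable.find(row.at(fJoinField));
      if (match == fJoinTable.end())
         return row; // no matching auxiliary entry: only primary fields are present
      const Row auxRow = fAuxiliary->Load(match->second);
      for (const auto &field : auxRow)
         row[fAuxName + "." + field.first] = field.second;
      return row;
   }
};

// A three-way join expressed as two nested single-auxiliary joins: (a JOIN b) JOIN c.
std::unique_ptr<ToyProcessor> BuildThreeWayJoin()
{
   auto a = std::make_unique<ToyTable>(std::vector<Row>{Row{{"id", 1}, {"x", 7}}, Row{{"id", 2}, {"x", 8}}});
   auto b = std::make_unique<ToyTable>(std::vector<Row>{Row{{"id", 1}, {"y", 10}}});
   auto c = std::make_unique<ToyTable>(std::vector<Row>{Row{{"id", 2}, {"z", 200}}, Row{{"id", 1}, {"z", 100}}});
   auto inner = std::make_unique<ToyJoin>(std::move(a), std::move(b), "b", "id");
   return std::make_unique<ToyJoin>(std::move(inner), std::move(c), "c", "id");
}
```

In this toy, the inner join is simply passed as the primary of the outer join. With the real API, the analogous step would presumably go through the `CreateJoin` overload that takes `std::unique_ptr<RNTupleProcessor>` arguments, as declared in the header diff in this commit.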
1 parent 2393c19 commit 7ad64bc

5 files changed: +135 -318 lines changed

tree/ntuple/inc/ROOT/RNTupleProcessor.hxx

Lines changed: 26 additions & 37 deletions
@@ -299,50 +299,44 @@ public:
 ///
 /// \param[in] primaryNTuple The name and location of the primary RNTuple. Its entries are processed in sequential
 /// order.
-/// \param[in] auxNTuples The names and locations of the RNTuples to join the primary RNTuple with. The order in
-/// which their entries are processed are determined by the primary RNTuple and doesn't necessarily have to be
-/// sequential.
+/// \param[in] auxNTuple The name and location of the RNTuple to join the primary RNTuple with. The order in which
+/// its entries are processed is determined by the primary RNTuple and doesn't necessarily have to be sequential.
 /// \param[in] joinFields The names of the fields on which to join, in case the specified RNTuples are unaligned.
 /// The join is made based on the combined join field values, and therefore each field has to be present in each
 /// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuple are fully aligned.
 /// \param[in] primaryModel An RNTupleModel specifying which fields from the primary RNTuple can be read by the
 /// processor. If no model is provided, one will be created based on the descriptor of the primary RNTuple.
-/// \param[in] auxModels A list of RNTupleModels specifying which fields from the corresponding auxiliary RNTuple
-/// (according to the order of `auxNTuples`) can be read by the processor. If this vector is empty, the models will
-/// be created based on the descriptors of their corresponding RNTuples. This also applies to individual auxiliary
-/// RNTuples for which the provided model is a `nullptr`.
+/// \param[in] auxModel An RNTupleModel specifying which fields from the auxiliary RNTuple can be read by the
+/// processor. If no model is provided, one will be created based on the descriptor of the auxiliary RNTuple.
 /// \param[in] processorName The name to give to the processor. If empty, the name of the primary RNTuple is used.
 ///
 /// \return A pointer to the newly created RNTupleProcessor.
 static std::unique_ptr<RNTupleProcessor>
-CreateJoin(RNTupleOpenSpec primaryNTuple, std::vector<RNTupleOpenSpec> auxNTuples,
-const std::vector<std::string> &joinFields, std::unique_ptr<ROOT::RNTupleModel> primaryModel = nullptr,
-std::vector<std::unique_ptr<ROOT::RNTupleModel>> auxModels = {}, std::string_view processorName = "");
+CreateJoin(RNTupleOpenSpec primaryNTuple, RNTupleOpenSpec auxNTuple, const std::vector<std::string> &joinFields,
+std::unique_ptr<ROOT::RNTupleModel> primaryModel = nullptr,
+std::unique_ptr<ROOT::RNTupleModel> auxModel = nullptr, std::string_view processorName = "");
 
 /////////////////////////////////////////////////////////////////////////////
 /// \brief Create an RNTupleProcessor for a *join* (i.e., a horizontal combination) of RNTuples.
 ///
 /// \param[in] primaryProcessor The primary processor. Its entries are processed in sequential order.
-/// \param[in] auxProcessors The processors to join the primary processor with. The order in which their entries are
-/// processed are determined by the primary processor and doesn't necessarily have to be sequential.
+/// \param[in] auxProcessor The processor to join the primary processor with. The order in which its entries are
+/// processed is determined by the primary processor and doesn't necessarily have to be sequential.
 /// \param[in] joinFields The names of the fields on which to join, in case the specified processors are unaligned.
 /// The join is made based on the combined join field values, and therefore each field has to be present in each
 /// specified processors. If an empty list is provided, it is assumed that the specified processors are fully
 /// aligned.
 /// \param[in] primaryModel An RNTupleModel specifying which fields from the primary processor can be read by the
 /// processor. If no model is provided, one will be created based on the descriptor of the primary processor.
-/// \param[in] auxModels A list of RNTupleModels specifying which fields from the corresponding auxiliary processor
-/// (according to the order of `auxProcessors`) can be read by the processor. If this vector is empty, the models
-/// will be inferred from their corresponding processors. This also applies to individual auxiliary processors for
-/// which the provided model is a `nullptr`.
+/// \param[in] auxModel An RNTupleModel specifying which fields from the auxiliary processor can be read by the
+/// processor. If no model is provided, one will be created based on the descriptor of the auxiliary processor.
 /// \param[in] processorName The name to give to the processor. If empty, the name of the primary processor is used.
 ///
 /// \return A pointer to the newly created RNTupleProcessor.
 static std::unique_ptr<RNTupleProcessor>
-CreateJoin(std::unique_ptr<RNTupleProcessor> primaryProcessor,
-std::vector<std::unique_ptr<RNTupleProcessor>> auxProcessors, const std::vector<std::string> &joinFields,
-std::unique_ptr<ROOT::RNTupleModel> primaryModel = nullptr,
-std::vector<std::unique_ptr<ROOT::RNTupleModel>> auxModels = {}, std::string_view processorName = "");
+CreateJoin(std::unique_ptr<RNTupleProcessor> primaryProcessor, std::unique_ptr<RNTupleProcessor> auxProcessor,
+const std::vector<std::string> &joinFields, std::unique_ptr<ROOT::RNTupleModel> primaryModel = nullptr,
+std::unique_ptr<ROOT::RNTupleModel> auxModel = nullptr, std::string_view processorName = "");
 };
 
 // clang-format off
@@ -477,14 +471,12 @@ class RNTupleJoinProcessor : public RNTupleProcessor {
 
 private:
 std::unique_ptr<RNTupleProcessor> fPrimaryProcessor;
-std::vector<std::unique_ptr<RNTupleProcessor>> fAuxiliaryProcessors;
+std::unique_ptr<RNTupleProcessor> fAuxiliaryProcessor;
 
 /// Tokens representing the join fields present in the primary processor.
 std::vector<ROOT::RFieldToken> fJoinFieldTokens;
-std::vector<std::unique_ptr<Internal::RNTupleJoinTable>> fJoinTables;
-bool fJoinTablesAreBuilt = false;
-
-bool HasJoinTable() const { return fJoinTables.size() > 0; }
+std::unique_ptr<Internal::RNTupleJoinTable> fJoinTable;
+bool fJoinTableIsBuilt = false;
 
 /////////////////////////////////////////////////////////////////////////////
 /// \brief Load the entry identified by the provided entry number of the primary processor.
@@ -510,34 +502,31 @@ private:
 /// \brief Set fModel by combining the primary and auxiliary models.
 ///
 /// \param[in] primaryModel The model of the primary processor.
-/// \param[in] auxModels Models of the auxiliary processors.
+/// \param[in] auxModel Model of the auxiliary processors.
 ///
 /// To prevent field name clashes when one or more models have fields with duplicate names, fields from each
 /// auxiliary model are stored as a anonymous record, and subsequently registered as subfields in the join model.
 /// This way, they can be accessed from the processor's entry as `auxNTupleName.fieldName`.
-void SetModel(std::unique_ptr<ROOT::RNTupleModel> primaryModel,
-std::vector<std::unique_ptr<ROOT::RNTupleModel>> auxModels);
+void SetModel(std::unique_ptr<ROOT::RNTupleModel> primaryModel, std::unique_ptr<ROOT::RNTupleModel> auxModel);
 
 /////////////////////////////////////////////////////////////////////////////
 /// \brief Construct a new RNTupleJoinProcessor.
 /// \param[in] primaryProcessor The primary processor. Its entries are processed in sequential order.
-/// \param[in] auxProcessors The processors to join the primary processor with. The order in which their entries are
-/// processed are determined by the primary processor and doesn't necessarily have to be sequential.
+/// \param[in] auxProcessor The processor to join the primary processor with. The order in which its entries are
+/// processed is determined by the primary processor and doesn't necessarily have to be sequential.
 /// \param[in] joinFields The names of the fields on which to join, in case the specified processors are unaligned.
 /// The join is made based on the combined join field values, and therefore each field has to be present in each
 /// specified processor. If an empty list is provided, it is assumed that the processors are fully aligned.
 /// \param[in] primaryModel An RNTupleModel specifying which fields from the primary processor can be read by the
 /// processor. If no model is provided, one will be created based on the descriptor of the primary processor.
-/// \param[in] auxModels A list of RNTupleModels specifying which fields from the corresponding auxiliary processor
-/// (according to the order of `auxProcessors`) can be read by the processor. If this vector is empty, the models
-/// will be inferred from their corresponding processors. This also applies to individual auxiliary processors for
-/// which the provided model is a `nullptr`.
+/// \param[in] auxModel An RNTupleModel specifying which fields from the auxiliary processor can be read by the
+/// processor. If no model is provided, one will be created based on the descriptor of the auxiliary processor.
 /// \param[in] processorName Name of the processor. Unless specified otherwise in RNTupleProcessor::CreateJoin, this
 /// is the name of the primary processor.
 RNTupleJoinProcessor(std::unique_ptr<RNTupleProcessor> primaryProcessor,
-std::vector<std::unique_ptr<RNTupleProcessor>> auxProcessors,
-const std::vector<std::string> &joinFields, std::unique_ptr<ROOT::RNTupleModel> primaryModel,
-std::vector<std::unique_ptr<ROOT::RNTupleModel>> auxModels, std::string_view processorName);
+std::unique_ptr<RNTupleProcessor> auxProcessor, const std::vector<std::string> &joinFields,
+std::unique_ptr<ROOT::RNTupleModel> primaryModel, std::unique_ptr<ROOT::RNTupleModel> auxModel,
+std::string_view processorName);
 
 public:
 RNTupleJoinProcessor(const RNTupleJoinProcessor &) = delete;
