Package description

Intro

This package is meant to run fast parallel analysis and machine learning optimization on modern servers using Python and Pandas. To start your analysis you need a list of unmerged flat ROOT TTrees for data and MC. For full compatibility, we recommend producing your TTrees in the format used by https://github.com/ginnocen/ALICETreeCreator. The TTrees have to be saved in a folder that preserves the standard Grid folder structure (e.g. production/child_1/0001/AnalysisResults.root).
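As a rough illustration of the expected input layout, the sketch below collects every unmerged file under a production folder. The helper name find_unmerged_files and the child_*/<job>/ glob pattern are assumptions made for this example, inferred from the sample path above; they are not part of the package.

```python
import tempfile
from pathlib import Path


def find_unmerged_files(production_dir, filename="AnalysisResults.root"):
    """Return the sorted paths of all `filename` files found under
    production_dir/child_*/<job>/ (hypothetical helper, not package API)."""
    root = Path(production_dir)
    return sorted(root.glob(f"child_*/*/{filename}"))


# Demo on a throwaway directory that mimics the Grid folder structure.
with tempfile.TemporaryDirectory() as tmp:
    for child, job in [("child_1", "0001"), ("child_1", "0002"), ("child_2", "0001")]:
        job_dir = Path(tmp) / "production" / child / job
        job_dir.mkdir(parents=True)
        (job_dir / "AnalysisResults.root").touch()  # empty placeholder file

    found = find_unmerged_files(Path(tmp) / "production")
    # Store paths relative to the temporary directory for readable output.
    relative = [p.relative_to(tmp).as_posix() for p in found]

print(relative)
```

Keeping one file per job folder, rather than a single merged file, is what allows the later steps to process the inputs in parallel.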

Package description

The package performs the following operations:

Conversion: the flat ROOT TTrees are converted into Pandas DataFrames and saved in pickle format.
Skimming: DataFrames are filtered according to the good run list, the fiducial selection and any custom selection defined by the user.
ML files creation: a subset of the MC and of the data is merged and used to optimise the selection strategy. For the ML optimisation, the signal is taken from MC and the background from the data side-bands.
Optimisation: the selection strategy is optimised using ML algorithms from scikit-learn, XGBoost and Keras. Trained models are saved and made available for the analysis.
Model application on data and MC: the unmerged DataFrames are processed. Candidates are selected either with the standard analysis cuts or with a loose cut on the ML probability.
Data merging: merged DataFrames are created from the candidates selected by the standard analysis or by the ML probability.
Invariant mass and efficiency building: on the merged dataset, invariant mass spectra and efficiency plots are created and stored in the same ROOT format as the regular task output (AnalysisResults.root).
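The skimming and ML files creation steps above can be sketched with plain Pandas as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the column names (run_number, pt_cand, inv_mass, is_signal), the good run list, the signal window and the helper functions skim and build_ml_sample are all hypothetical.

```python
import pandas as pd

# Toy stand-ins for converted DataFrames; all column names are hypothetical.
data = pd.DataFrame({
    "run_number": [1, 1, 2, 3, 3, 3],
    "pt_cand": [2.5, 4.0, 3.1, 6.2, 1.0, 5.5],
    "inv_mass": [1.70, 1.86, 1.95, 2.05, 1.87, 1.72],
})
mc = pd.DataFrame({
    "run_number": [1, 3, 3],
    "pt_cand": [3.0, 4.5, 7.0],
    "inv_mass": [1.86, 1.87, 1.865],
    "is_signal": [1, 1, 0],
})

GOOD_RUNS = [1, 3]             # assumed good run list
SIGNAL_WINDOW = (1.80, 1.92)   # assumed invariant mass signal region


def skim(df, good_runs, selection):
    """Keep rows from good runs that also pass a user-defined selection string."""
    in_good_run = df["run_number"].isin(good_runs)
    return df[in_good_run].query(selection)


def build_ml_sample(data_df, mc_df, window):
    """Label MC signal as 1 and data side-band candidates as 0, then merge them."""
    low, high = window
    in_sideband = (data_df["inv_mass"] < low) | (data_df["inv_mass"] > high)
    background = data_df[in_sideband].assign(label=0)
    signal = mc_df[mc_df["is_signal"] == 1].drop(columns="is_signal").assign(label=1)
    return pd.concat([signal, background], ignore_index=True)


data_skimmed = skim(data, GOOD_RUNS, "pt_cand > 2.0")
mc_skimmed = skim(mc, GOOD_RUNS, "pt_cand > 2.0")
ml_sample = build_ml_sample(data_skimmed, mc_skimmed, SIGNAL_WINDOW)
print(ml_sample)
```

Taking the background from the data side-bands avoids relying on the MC description of the background, while the signal label still comes from the MC truth information.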