Package description
Gian Michele Innocenti edited this page Apr 10, 2019 · 15 revisions
This package is meant to run fast parallel analysis and machine learning optimization on modern servers using Python and Pandas. To start your analysis, you need a list of unmerged flat ROOT TTrees for data and MC. For full compatibility, it is recommended to produce your TTrees in the same format as the one used in https://github.com/ginnocen/ALICETreeCreator. The TTrees have to be saved in a folder that preserves the standard Grid folder structure (e.g. production/child_1/0001/AnalysisResults.root).
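As an illustration of the expected input layout, the following minimal sketch collects the unmerged files from a production folder. The function name `list_unmerged_files` and the `child_*/*/` glob pattern are assumptions derived from the example path above, not part of the package itself.

```python
from pathlib import Path


def list_unmerged_files(production_dir, filename="AnalysisResults.root"):
    """Return the sorted paths of all unmerged files under production_dir.

    Hypothetical helper: it assumes the layout
    production/child_N/NNNN/AnalysisResults.root shown in the text.
    """
    root = Path(production_dir)
    # One directory level for the child, one for the numbered job folder.
    return sorted(root.glob(f"child_*/*/{filename}"))
```

Calling `list_unmerged_files("production")` would return one path per job folder, which can then be handed to the conversion step file by file.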
The package performs the following operations:
- Conversion: the flat ROOT TTrees are converted into Pandas Dataframes saved in a pickle format.
- Skimming: Dataframes are filtered according to the good run list, the fiducial selection and any custom selection that the user defines.
- ML files creation: a subset of the MC and data are merged and used to optimise the selection strategy. For the ML optimization, the signal is taken from MC and the background from data side-bands.
- Optimisation: the selection strategy is optimised using recent ML algorithms from scikit-learn, XGBoost and Keras. Trained models are saved and made available for analysis.
- Model application on data and MC: unmerged dataframes are processed. Candidates are selected according to standard analysis cuts or according to a loose cut on the ML probability.
- Data merging: merged dataframes are created with the candidates selected by the standard analysis or by the ML probability.
- Invariant mass and efficiency building: on the merged dataset, invariant mass spectra and efficiency plots are created and stored in the same ROOT format as the regular task output (AnalysisResults.root).
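The skimming and ML file creation steps listed above can be sketched with plain Pandas operations. This is only an illustration under stated assumptions: the column names `run_number` and `inv_mass`, the function names, and the side-band boundaries are hypothetical and are not taken from the package. The MC dataframe is assumed to already contain only true signal candidates.

```python
import pandas as pd


def skim(df, good_runs, selection=None):
    """Keep rows from good runs, then apply an optional custom selection.

    `run_number` is a hypothetical column name; `selection` is a
    DataFrame.query expression such as "pt_cand > 2".
    """
    out = df[df["run_number"].isin(good_runs)]
    if selection:
        out = out.query(selection)
    return out.reset_index(drop=True)


def build_ml_sample(df_mc, df_data, mass_min, mass_max, n_per_class, seed=12):
    """Merge MC signal and data side-band background into one labelled sample.

    Assumes df_mc holds only true signal candidates and that the
    invariant mass is stored in a (hypothetical) `inv_mass` column.
    """
    signal = df_mc.copy()
    # Background: data candidates outside the [mass_min, mass_max] window.
    outside = (df_data["inv_mass"] < mass_min) | (df_data["inv_mass"] > mass_max)
    background = df_data[outside].copy()
    signal["label"] = 1
    background["label"] = 0
    # Take at most n_per_class candidates from each class.
    signal = signal.sample(n=min(n_per_class, len(signal)), random_state=seed)
    background = background.sample(
        n=min(n_per_class, len(background)), random_state=seed
    )
    return pd.concat([signal, background], ignore_index=True)
```

The resulting dataframe carries a `label` column that a classifier can be trained on, and it can be stored with `DataFrame.to_pickle` in the same way as the converted dataframes.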