Example
In this tutorial you will learn how to optimize the selection of the decay of a rare particle produced in proton-proton collisions. The main goal is to optimize the selection of the Lambda_c baryon in order to increase the statistical significance of its peak. Let's go through the exercise step by step.

Download from lxplus the two files that contain data and Monte Carlo Lambda_c candidates in proton-proton collisions collected with the ALICE detector at CERN. From the main folder of the repository, execute the following lines, replacing <my_cern_user> with your NICE username:
cd machine_learning_hep/data
mkdir inputroot
scp <my_cern_user>@lxplus.cern.ch:/afs/cern.ch/work/g/ginnocen/public/exampleInputML/*.root inputroot/
The file doclassification_regression.py in the folder machine_learning_hep is the main script you will use to perform the analysis. This macro provides several functionalities.
You can select the type of optimisation problem you want to perform. In our case we will keep the default values, which are the ones needed for the Lambda_c study.
mltype = "BinaryClassification"
mlsubtype = "HFmeson"
case = "Lc"
You can select the transverse-momentum region you want to consider in the optimisation. In our case we will focus on the range from 2 to 4 GeV/c, as in the default settings.
var_skimming = ["pt_cand_ML"]
varmin = [2]
varmax = [4]
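The skimming step above can be sketched with pandas (a toy dataframe is used here for illustration; only the column name pt_cand_ML and the 2-4 GeV/c bounds are taken from the settings above, the candidate values are invented):

```python
import pandas as pd

# Toy candidate dataframe; the column name follows the var_skimming setting
df = pd.DataFrame({"pt_cand_ML": [1.5, 2.3, 3.7, 4.2, 5.0],
                   "inv_mass":   [2.29, 2.28, 2.30, 2.27, 2.31]})

varmin, varmax = 2, 4
# Keep only candidates inside the chosen transverse-momentum interval
df_sel = df[(df["pt_cand_ML"] >= varmin) & (df["pt_cand_ML"] < varmax)]
print(len(df_sel))  # -> 2 candidates survive the 2-4 GeV/c selection
```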
As will be described later, we need to define a training sample of pure signal and pure background candidates. The larger the number of candidates, the more accurate (up to a certain point!) the optimisation will be. We suggest starting with the default settings and increasing them according to the computing power of your machine.
nevt_sig = 1000
nevt_bkg = 1000
By setting the parameter
loadsampleoption = 1
you prepare the ML sample. In our case, 1000 signal candidates will be taken from Monte Carlo simulations and 1000 background candidates will be taken from a mass region of the data where no signal is present (the so-called side-band regions).
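The idea of the sample preparation can be sketched as follows (toy dataframes stand in for the real MC and side-band inputs; the column names and values are invented for illustration):

```python
import pandas as pd

# Toy stand-ins: MC candidates (pure signal) and data candidates
# from the side-band regions (pure background)
df_sig = pd.DataFrame({"inv_mass": [2.286, 2.289, 2.284]})
df_bkg = pd.DataFrame({"inv_mass": [2.15, 2.16, 2.42]})

nevt_sig, nevt_bkg = 2, 2  # in the tutorial these are 1000 each
df_sig = df_sig.head(nevt_sig).copy()
df_bkg = df_bkg.head(nevt_bkg).copy()
df_sig["signal"] = 1  # label MC candidates as signal
df_bkg["signal"] = 0  # label side-band candidates as background

# Concatenated, labelled training sample
df_ml = pd.concat([df_sig, df_bkg], ignore_index=True)
print(df_ml["signal"].tolist())  # -> [1, 1, 0, 0]
```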
By activating one (or more) of these bits you select different types of algorithms.
activate_scikit = 1
activate_xgboost = 1
activate_keras = 0
For a first look, we suggest using the XGBoost algorithms, which are the fastest.
By activating these two bits you tell the script to run the training and the testing of your algorithms in order to identify their best parameters. In the training step, the trained models are saved locally. In the testing step, a dataframe and a new TTree with the probability obtained from each algorithm for each candidate are saved.
dotraining = 1
dotesting = 1
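A minimal sketch of this train-then-test flow, using scikit-learn's GradientBoostingClassifier as a stand-in for the boosted-decision-tree algorithms the script activates (the toy Gaussian features and all names here are invented for illustration, not taken from the repository):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Toy features: two Gaussian blobs standing in for signal/background candidates
x_sig = rng.normal(loc=1.0, scale=1.0, size=(200, 3))
x_bkg = rng.normal(loc=-1.0, scale=1.0, size=(200, 3))
x = np.vstack([x_sig, x_bkg])
y = np.array([1] * 200 + [0] * 200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier()       # training step (dotraining)
model.fit(x_train, y_train)
proba = model.predict_proba(x_test)[:, 1]  # per-candidate signal probability (dotesting)
print(proba.shape)                         # one probability per test candidate
```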
A long list of validation utilities, including score cross validation, ROC curves, learning curves, and feature importance, can be activated using the following bits:
docrossvalidation = 1
dolearningcurve = 1
doROC = 1
doboundary = 1
doimportance = 1
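Two of these checks, score cross validation and the ROC curve, can be sketched with scikit-learn on the same kind of toy data as above (everything here is an illustrative assumption, not the repository's implementation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Toy well-separated signal/background samples
x = np.vstack([rng.normal(1.0, 1.0, (150, 3)),
               rng.normal(-1.0, 1.0, (150, 3))])
y = np.array([1] * 150 + [0] * 150)

model = GradientBoostingClassifier()
# 5-fold score cross validation (docrossvalidation)
scores = cross_val_score(model, x, y, cv=5)
# ROC area under the curve as a quick separation check (doROC)
model.fit(x, y)
auc = roc_auc_score(y, model.predict_proba(x)[:, 1])
print(len(scores), round(auc, 3))
```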