
Running the correction

saganatt edited this page Sep 14, 2021 · 14 revisions

Launching the program

Always first enter AliRoot, and then the virtual environment.

To run the correction software, inside the tpcwithdnn/ directory type:

python steer_analysis.py

If you see warnings like:

cling::DynamicLibraryManager::loadLibrary(): libGpad.so.6.16: cannot open shared object file: No such file or directory

they can be safely ignored.

Stages of the algorithm

The full correction consists of three stages:

  • prediction of global distortion fluctuations with Boosted Decision Trees (XGBoost library)
  • correction of the distortion fluctuations predicted by BDT
  • prediction of residual (local) distortion fluctuations with a U-Net, a deep convolutional neural network

Specific stages can be (de-)activated with the active flag under each category in config_model_parameters.yml.
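
For illustration, an excerpt of config_model_parameters.yml could look like the sketch below. The section names here are hypothetical; consult the actual file for the real schema.

```yaml
# Sketch only -- the real config_model_parameters.yml may use different section names.
xgboost:
  active: true     # run the BDT prediction stage
dnn:
  active: false    # skip the U-Net stage
```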

NOTE: What is called an 'event' in the code and in these instructions is not a single collision but a 'snapshot': a set of measurements (3D maps) for a given time point. A single such measurement actually reflects the overlap of many collisions.

You can define which steps of the analysis you want to run in default.yml. Multiple steps can be active at once:

  • dotrain - train a model from scratch
  • doapply - use the trained model to make predictions
  • doplot - create quality assurance plots of prediction results
  • dobayes - perform Bayesian optimization to find the best model configuration (currently implemented only for the BDT)
  • doprofile - compare models trained with different numbers of events
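
Conceptually, these boolean flags form a simple dispatch table. A minimal sketch (the step names mirror default.yml above; the runner itself is a hypothetical stand-in for steer_analysis.py, which would call the real handlers):

```python
# Minimal sketch: boolean step flags drive which analysis steps run, in a fixed order.
# The step names mirror default.yml; the "handlers" are placeholders.

def run_steps(flags):
    """Return the steps that would run, given a dict of step-name -> bool flags."""
    order = ["dotrain", "doapply", "doplot", "dobayes", "doprofile"]
    executed = []
    for step in order:
        if flags.get(step, False):
            executed.append(step)  # a real steer script would invoke the handler here
    return executed

print(run_steps({"dotrain": True, "doapply": True, "doplot": False}))
```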

The remaining options are for the ND validation.

Configuration

The parameters of the ML analysis can be configured in config_model_parameters.yml. The most commonly used settings are:

  • dirmodel, dirapply, dirplots - directories where the trained models, prediction results and plots should be saved (the paths can be relative)
  • dirinput_bias, dirinput_nobias - paths to directories where the biased / unbiased input datasets are stored
  • grid_phi, grid_r, grid_z - grid granularity, usually 90x17x17 or 180x33x33
  • z_range - only distortions with z_min <= z < z_max will be processed by the algorithm
  • opt_predout - the direction of distortions (r, rphi, z) to correct - currently only one direction can be processed at a time
  • train_events, validation_events, apply_events - numbers of events for train / validation / apply, specified separately for the BDT and the NN. You can specify multiple numbers, but the lists of values for train / validation / apply must be of equal length; the program then runs once for each triple. If doprofile is active, the program outputs plots with prediction results (mean, std dev., mean + std dev.) gathered for each triple.
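
Since the three lists must have equal lengths, the program effectively iterates over (train, validation, apply) triples. A small illustration in plain Python (the values are arbitrary examples, not project defaults):

```python
# Equal-length lists of event counts combine position-wise into triples;
# the analysis runs once per triple.
train_events = [1000, 2000, 5000]
validation_events = [100, 200, 500]
apply_events = [500, 1000, 2500]

assert len(train_events) == len(validation_events) == len(apply_events), \
    "the three lists must be of equal length"

for triple in zip(train_events, validation_events, apply_events):
    print(triple)  # e.g. (1000, 100, 500)
```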

Boosted Decision Trees

Currently, a random forest (XGBRFRegressor) is used. The default configuration uses the approximate 'hist' tree method, the fastest available in XGBoost.

  • downsample - whether to downsample the input data
  • downsample_npoints - number of voxels to downsample
  • plot_train, train_npoints - whether to plot the learning curve and with how many points
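
As an illustration of what such voxel downsampling could look like, here is a sketch using Python's random.sample; the function name and seeding are assumptions, not the project's actual implementation:

```python
import random

def downsample_voxels(n_voxels, downsample_npoints, seed=42):
    """Pick a random subset of voxel indices without replacement (illustrative sketch)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_voxels), downsample_npoints))

# e.g. a 90x17x17 grid has 26010 voxels; keep a random subset of 1000
indices = downsample_voxels(90 * 17 * 17, 1000)
print(len(indices))  # 1000
```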

The remaining parameters, under the params section, come from the XGBoost Scikit-Learn API; their meaning is described in the XGBoost documentation.

UNet

  • filters - number of channels (filters) in the first convolutional block (the 3rd dimension of a 3D convolution)
  • pooling - type of pooling function: max - max pooling, avg - average pooling, conv - not an actual pooling but 3D convolution
  • depth - depth of the network, i.e., the number of convolutional blocks (levels)
  • batch normalization - whether to use batch normalization
  • dropout - dropout fraction
  • batch_size - size of a batch
  • shuffle - whether to shuffle the training data
  • epochs - number of epochs
  • lossfun - loss function
  • metrics - metrics, values measured besides the loss function that do not affect training
  • adamlr - learning rate for Adam optimizer

ND validation

ND validation parameters are explained in Validation.

Command line

Some parameters are available for a quick setup on the command line. You can check them with:

python steer_analysis.py -h

Correction output

  • debug: on the console
  • models: dirmodel
    • XGBoost: JSON
    • U-Net: JSON, network weights: h5
  • predictions: dirval
    • a single ROOT file with histograms
  • prediction and profile plots: dirplot
  • indices of events in train / validation / apply partitions: dirmodel
    • the indices are picked up by ND validator if any of these partitions is chosen for the ND validation
  • dirmodel, dirval, dirplot are taken from config_model_parameters.yml
