
Running the correction

saganatt edited this page Sep 14, 2021 · 14 revisions

Launching the program

To run the correction software, inside the tpcwithdnn/ directory type:

python steer_analysis.py

If you see some warnings like:

cling::DynamicLibraryManager::loadLibrary(): libGpad.so.6.16: cannot open shared object file: No such file or directory

they can be safely ignored.

Stages of the algorithm

The full correction consists of three stages:

  • prediction of global distortion fluctuations with Boosted Decision Trees (XGBoost library)
  • correction of the distortion fluctuations predicted by BDT
  • prediction of residual (local) distortion fluctuations with U-Net - a deep convolutional neural network

Specific stages can be activated or deactivated with the active flag under each category in config_model_parameters.yml.

NOTE: What is called an 'event' in the code and in this guide is not a single collision but a 'snapshot': a set of measurements (3D maps) for a given point in time. A single such measurement actually reflects the overlap of many collisions.

You can define which steps of the analysis you want to run in default.yml. Multiple steps can be active at once:

  • dotrain - train a model from scratch
  • doapply - use the trained model to make predictions
  • doplot - create quality-assurance plots of the prediction results
  • dobayes - perform Bayesian optimization to find the best model configuration (currently implemented only for the BDT)
  • doprofile - compare models trained with different numbers of events

The remaining options are for ND validation.
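As an illustration, the step flags in default.yml might look like the sketch below. The key names are those listed above; the exact file layout around them is an assumption, not taken from this page.

```yaml
# Hypothetical excerpt from default.yml: run training, prediction, and
# plotting, but skip Bayesian optimization and profiling.
dotrain: true
doapply: true
doplot: true
dobayes: false
doprofile: false
```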

Configuration

The parameters of the ML analysis can be configured in config_model_parameters.yml. The most frequently used arguments are:

  • dirmodel, dirapply, dirplots - directories where the trained models, prediction results and plots should be saved (the paths can be relative)
  • dirinput_bias, dirinput_nobias - paths to directories where the biased / unbiased input datasets are stored
  • grid_phi, grid_r, grid_z - grid granularity, usually 90x17x17 or 180x33x33
  • z_range - only distortions with z_min <= z < z_max will be processed by the algorithm
  • opt_predout - the direction of distortions (r, rphi, z) to correct - currently only one direction can be processed at a time
  • train_events, validation_events, apply_events - numbers of events for train / validation / apply, specified separately for the BDT and the NN. You can specify multiple values, but the lists for train / validation / apply must be of equal length; the program then runs once for each triple. If doprofile is specified, the program outputs plots of the prediction results (mean, std. dev., mean + std. dev.) gathered for each triple.
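Putting these arguments together, a configuration might look like the hedged sketch below. The parameter names are those listed above, but the nesting and the concrete values are illustrative assumptions, not copied from the real file.

```yaml
# Hypothetical excerpt from config_model_parameters.yml.
dirmodel: ./models        # trained models (relative paths are allowed)
dirapply: ./apply         # prediction results
dirplots: ./plots         # QA plots
grid_phi: 90              # grid granularity, e.g. 90x17x17 or 180x33x33
grid_r: 17
grid_z: 17
z_range: [0.0, 250.0]     # only z_min <= z < z_max is processed (values assumed)
opt_predout: [1, 0, 0]    # one direction at a time (encoding assumed)
train_events: [1000, 5000]
validation_events: [100, 500]
apply_events: [100, 500]  # all three lists must have equal length;
                          # the program runs once per (train, validation, apply) triple
```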

Boosted Decision Trees

Currently, random forest (XGBRFRegressor) is used. The default configuration uses the approximate 'hist' tree method, the fastest available in XGBoost.

  • downsample - whether to use downsampling
  • downsample_npoints - number of voxels to downsample
  • plot_train, train_npoints - whether to plot the learning curve and with how many points
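The downsampling step can be pictured as in the minimal numpy sketch below: keep a random subset of voxels from the flattened grid. The function and variable names are our own, not from the code base.

```python
import numpy as np

def downsample_voxels(inputs, targets, npoints, seed=0):
    """Randomly keep npoints voxels out of the full grid.

    inputs:  (n_voxels, n_features) array of per-voxel features
    targets: (n_voxels,) array of distortion fluctuations
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(inputs.shape[0], size=npoints, replace=False)
    return inputs[idx], targets[idx]

# Example: a 90 x 17 x 17 grid flattened to one row per voxel
n_voxels = 90 * 17 * 17
inputs = np.zeros((n_voxels, 3))
targets = np.zeros(n_voxels)
sub_in, sub_tg = downsample_voxels(inputs, targets, npoints=1000)
print(sub_in.shape)  # (1000, 3)
```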

The remaining parameters, under the params section, come from the XGBoost Scikit-Learn API; their meaning is described in the XGBoost documentation.

UNet

  • filters - number of channels (filters) in the first convolutional block (the 3rd dimension of a 3D convolution)
  • pooling - type of pooling function: max - max pooling, avg - average pooling, conv - not actual pooling but a 3D convolution
  • depth - depth of the network = number of convolutional blocks / levels
  • batch normalization - whether to use batch normalization
  • dropout - fraction of dropout
  • batch_size - size of a batch
  • shuffle - whether to shuffle
  • epochs - number of epochs
  • lossfun - loss function
  • metrics - quantities monitored in addition to the loss function; they do not affect training
  • adamlr - learning rate of the Adam optimizer
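In a typical U-Net the number of channels doubles at each encoder level, so filters and depth together determine the channel counts. Assuming tpcwithdnn follows this common convention (an assumption, not confirmed by this page), the relation can be sketched as:

```python
def channels_per_level(filters, depth):
    """Channel counts along the encoder path, assuming the common
    U-Net convention of doubling the filters at each level.
    (Illustrative helper, not part of tpcwithdnn.)"""
    return [filters * 2**level for level in range(depth)]

# filters=4, depth=4 would give four convolutional blocks with
# 4, 8, 16, and 32 channels respectively
print(channels_per_level(4, 4))  # [4, 8, 16, 32]
```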

ND validation

ND validation parameters are explained in Validation.

Command line

Some parameters are available for a quick setup on the command line. You can check them with:

python steer_analysis.py -h

Program output

The debug output is by default written to the console.

After training, the models are saved in the dirmodel directory. XGBoost models are stored in JSON format; the U-Net is saved as JSON with its weights in HDF5 (.h5). dirmodel also contains the indices of the events in the train / validation / apply partitions, to be picked up by the ND validator if any of these partitions is chosen for ND validation.

The prediction results are saved in the dirval directory, in a single ROOT file with histograms. If the doplot option is specified in default.yml, these histograms are plotted and saved as PDF files in the dirplots directory. dirplots also contains the results of profiling with doprofile.
