This open-source GitHub repository contains Jupyter notebooks with machine learning examples associated with lectures at the Terascale Statistics School in Germany.
The notebooks in this package depend on several well-known Python modules, all well-engineered and free.
| module | description |
|---|---|
| pandas | data table manipulation, often with data loaded from csv files |
| numpy | array manipulation and numerical analysis |
| matplotlib | a widely used plotting module for producing high quality plots |
| imageio | reading and writing images in many formats |
| scikit-learn | easy to use machine learning toolkit |
| pytorch | a powerful, flexible, machine learning toolkit |
| scipy | scientific computing |
| sympy | an excellent symbolic mathematics module |
| iminuit | an elegant wrapper around the venerable CERN minimizer Minuit |
| emcee | a Markov chain Monte Carlo (MCMC) sampling module |
| tqdm | progress bars for loops |
| joblib | module to save and load Python objects |
| importlib | importing and re-importing modules (part of the Python standard library) |
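
To see at a glance which of these modules can be imported in the active environment, you could use a small script like the sketch below. It follows the same idea as `test.ipynb` but is not that notebook. Some modules are imported under a name that differs from the name used to install them: pytorch is imported as `torch` and scikit-learn as `sklearn`. The `MODULES` mapping and the `check_modules` function are hypothetical names written for this sketch.

```python
import importlib

# Name used in the table above -> name used in an import statement.
# This mapping is written for this sketch (hypothetical, not part of the package).
MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "imageio": "imageio",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "scipy": "scipy",
    "sympy": "sympy",
    "iminuit": "iminuit",
    "emcee": "emcee",
    "tqdm": "tqdm",
    "joblib": "joblib",
    "importlib": "importlib",
}


def check_modules(modules=MODULES):
    """Try to import each module and return {table_name: version or None}.

    None means the import failed. Modules without a __version__ attribute
    (such as some standard-library modules) are reported as "ok".
    """
    status = {}
    for table_name, import_name in modules.items():
        try:
            module = importlib.import_module(import_name)
        except ImportError:
            status[table_name] = None
            continue
        status[table_name] = getattr(module, "__version__", "ok")
    return status


if __name__ == "__main__":
    for name, version in check_modules().items():
        print(f"{name:14s} {'MISSING' if version is None else version}")
```

Any module reported as `MISSING` can be installed with `conda install -c conda-forge <module>` in the active environment.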
The simplest way to install these Python modules is first to install miniconda (a slim version of Anaconda) on your laptop by following the instructions at:
https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
I recommend installing miniconda3, which comes pre-packaged with Python 3.
Software distributions such as Anaconda, with its package manager conda, make it possible to have several separate, self-consistent, named environments on a single machine. For example, at times you may need Python 3.7.5 with a compatible set of Python modules, and at other times Python 3.9.13 with modules that require that particular version of Python. If you install software without using environments, there is a danger that the software on your machine will eventually become inconsistent. Anaconda and its lightweight companion miniconda let you keep, for example, one environment that is consistent with Python 3.7.5 and another that is consistent with Python 3.9.13.
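
Because each environment carries its own Python interpreter, a quick way to confirm which one a script or notebook is actually using is to ask the interpreter itself. The sketch below uses only the standard library (`sys`, `platform`); the function names `interpreter_info` and `at_least` are hypothetical and are not part of the package.

```python
import platform
import sys


def interpreter_info():
    """Return the version and location of the running Python interpreter."""
    return {
        "version": platform.python_version(),          # e.g. "3.9.13"
        "version_info": tuple(sys.version_info[:3]),  # (major, minor, micro)
        "executable": sys.executable,                 # interpreter inside the active environment
        "prefix": sys.prefix,                         # root directory of that environment
    }


def at_least(minimum):
    """Return True if the running Python is at least `minimum`, e.g. (3, 7)."""
    return sys.version_info[: len(minimum)] >= tuple(minimum)


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key:12s} {value}")
```

Run from two different activated environments, the `executable` and `prefix` lines should point to two different directories.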
Of course, like anything humans make, miniconda3 is not perfect. There are times when the only solution is to remove an environment using

```bash
conda env remove -n <name>
```

where `<name>` is the name of the environment, and then rebuild it by reinstalling the desired Python modules.
After installing miniconda3, it is a good idea to update conda using the command

```bash
conda update conda
```

Assuming conda is properly installed and initialized on your machine (say, your laptop), you can create an environment, here called terascale,

```bash
conda create --name terascale
```

and activate it using the command

```bash
conda activate terascale
```

The environment needs to be created only once, but you must activate it whenever you open a new terminal window.
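
If you forget to activate the environment, Python may pick up a different installation. When an environment is activated, conda sets the environment variable `CONDA_DEFAULT_ENV` to the environment's name. A minimal check, assuming that variable, might look like the sketch below; the function names `active_conda_env` and `check_env` are hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(expected="terascale", environ=None):
    """Print a warning on stderr and return False if `expected` is not active."""
    name = active_conda_env(environ)
    if name != expected:
        print(
            f"warning: active conda environment is {name!r}, expected {expected!r}; "
            f"run: conda activate {expected}",
            file=sys.stderr,
        )
        return False
    return True


if __name__ == "__main__":
    check_env()
```

A Jupyter server started from an activated terminal passes these variables on to its kernels, so the same check also works from inside a notebook.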
With the environment activated, you can now install ROOT, Python, NumPy, etc. For example, the following command installs the ROOT package from CERN:

```bash
conda install -c conda-forge root
```

If all goes well, this will install a recent version of ROOT as well as a compatible version of Python and several Python modules, including numpy.
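
To confirm that ROOT is visible from Python (through PyROOT), you can try importing the `ROOT` module and asking for its version. Here is a sketch, assuming the conda-forge `root` package installed PyROOT into the active environment; `root_version` is a hypothetical helper name.

```python
def root_version():
    """Return the ROOT version string, or None if PyROOT cannot be imported."""
    try:
        import ROOT  # PyROOT, expected to come with the conda-forge root package
    except ImportError:
        return None
    return ROOT.gROOT.GetVersion()  # version string such as "6.xx/yy"


if __name__ == "__main__":
    version = root_version()
    print("ROOT not found" if version is None else f"ROOT {version}")
```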
Now install pytorch, matplotlib, scikit-learn, etc.:

```bash
conda install -c conda-forge pytorch
conda install -c conda-forge matplotlib
conda install -c conda-forge scikit-learn
conda install -c conda-forge pandas
conda install -c conda-forge sympy
conda install -c conda-forge imageio
conda install -c conda-forge jupyter
conda install -c conda-forge scipy iminuit emcee tqdm joblib
```

The command `git` is needed to download the Terascale package from GitHub. If git is not on your machine, it can be installed using the command

```bash
conda install -c conda-forge git
```

To install Terascale, do
```bash
cd
mkdir tutorials
cd tutorials
git clone https://github.com/hbprosper/Terascale
```

In the above, the package Terascale is downloaded into a directory called tutorials in your home directory (a bare `cd` first moves you to your home directory).
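
After cloning, a quick sanity check is that the directory exists and contains the test notebook. The sketch below uses `pathlib` and assumes the `~/tutorials/Terascale` location from the commands above, with `test.ipynb` at the top level of the repository; `find_terascale` is a hypothetical helper name.

```python
from pathlib import Path


def find_terascale(base=None):
    """Return the path of the Terascale clone, or None if it looks missing or incomplete."""
    base = Path.home() / "tutorials" if base is None else Path(base)
    repo = base / "Terascale"
    if (repo / "test.ipynb").is_file():
        return repo
    return None


if __name__ == "__main__":
    repo = find_terascale()
    print(f"found {repo}" if repo else "Terascale not found under ~/tutorials")
```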
Open a new terminal window, activate the terascale environment (`conda activate terascale`), navigate to the directory containing Terascale, and run Jupyter Notebook in that window in blocking mode, that is, without "&" at the end of the command:

```bash
jupyter notebook
```

If all goes well, Jupyter Notebook will open in your default web browser and the terminal window will be blocked. In your browser, navigate to the Terascale directory, and under the Files tab, click on the notebook test.ipynb and execute it. This notebook tries to import several Python modules. If it does so without complaints, you are ready to try the other notebooks.
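
If you prefer to run the test notebook without a browser, for example on a remote machine, nbconvert can execute a notebook from the command line with `jupyter nbconvert --to notebook --execute test.ipynb`. The same can be done from Python with `nbformat` and nbconvert's `ExecutePreprocessor`. The sketch below assumes both packages are installed (they are normally pulled in by the conda-forge `jupyter` package) and that a kernel named `python3` exists; `run_notebook` is a hypothetical helper name.

```python
from pathlib import Path


def run_notebook(path, timeout=600, kernel_name="python3"):
    """Execute every cell of a notebook; nbconvert raises an error if a cell fails."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    # Imported here so the function can be defined even without Jupyter installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    with open(path, encoding="utf-8") as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    # Use the notebook's own directory as the working directory for its cells.
    ep.preprocess(nb, {"metadata": {"path": str(path.parent)}})
    return nb


if __name__ == "__main__":
    target = Path("test.ipynb")
    if target.is_file():
        run_notebook(target)
        print(f"{target} ran without errors")
    else:
        print("run this script from inside the Terascale directory")
```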

The notebooks in this package are:

| notebook | description |
|---|---|
| test.ipynb | Test import of required Python modules |
| hzz4l_sklearn | Boosted Decision Trees (BDT) with AdaBoost: classification of Higgs boson events |
| hzz4l_pytorch | Deep Neural Network (DNN): classification of Higgs boson events |
| sdss_autoencoder | Autoencoder: map SDSS galaxy/quasar data to 1D |
| mnist_cnn | Convolutional Neural Network (CNN): classification of MNIST digits |
| taylor_series_transformer | Transformer Neural Network (TNN): example of symbolic Taylor series expansion |