Here are the notebooks with data processing and analysis using the benchtools package.
The notebooks are named following these rules:
- The first number refers to the order they were created.
- The second number is the version of the notebook. This will be increased only when new code is added.
- The name, separated by "-", has a general idea of the content.
Here is a brief description for what you will find in each one:
-
📚 00.0-pre-processing-BB1 and 00.0-pre-processing-RD: First exploration about clustering using black box 1 (BB1) and R&D dataset.
-
📚 01.0-data-exploration: Data analysis before and after clustering the jets. Calculations and plots of the variable's distributions and some correlations.
-
📚 02.0-all-distribution-correlation-plots: Plots for the distribution of all the variables and the correlations between them, separated as background and signal.
-
📚 03.0-decision-tree: First use of a ML algorithm with the clustered data. A simple decision tree.
-
📚 04.0-comparison-supervised-algorithms: Classification using multiple supervised algorithms from sklearn. First notebook with the calculation of performance metrics and plots.
-
📚 05.0-GBC-classification: Classification using Gradient Boosting Classifier (GBC). Calculation of performance metrics and comparison of the real distribution of the variables with the distributions obtained with the classifier.
-
📚 06.0-GBC-overfitting: An overfitting review for the classification made with GBC.
-
📚 07.0-tensorflow-classificator: Use of a simple sequential classifier made with tensorflow. Calculation of performance metrics and comparison between the real distribution of the variables and the distributions obtained with the classifier.
-
📚 08.0-compararison-unsupervised-algorithms: Classification using multiple unsupervised algorithms from sklearn. Calculation of performance metrics and plots.
-
📚 09.0-Kmeans-classification: Classification using K-Means classifiier. Calculation of performance metrics and comparison between the real distribution of the variables and the distributions obtained with the classifier.
-
📚 10.0-UCluster-data: Exploring pre-processed data from UCluster and the classifications made with this algorithm.
-
📚 11.0-GAN-AE-data: Exploring pre-processed data from GAN-AE and the classification made with this algorithm.
-
📚 13.0-scalers-comparison: Comparison of the classification using different scalers: MinMaxScaler, StandardScaler and RobustScaler.
-
📚 14.0-dimension-reduction-comparison: Comparison of the classification using different dimensionality reduction techniques: PCA, SCD, LDA.
-
📚 15.0-pipeline-exploration: Code writting for the pipeline. Exploration of options.
-
📚 16.0-pipeline-KMeans-performance: Issue with KMeans classification, solved by choosing the background label to the majoritary class predicted.