In depth analysis of the performance of a multidimensional Extended Maximum Likelihood Estimate (MLE) with Custom Orthogonal Weight functions (‘sWeighted’) fit

JacobTutt/stat_frequentist_analysis

Signal Recovery in the Presence of Background: Multi-dimensional Likelihood vs. sWeight Reconstruction

Author: Jacob Tutt, Department of Physics, University of Cambridge

Description

This repository compares the statistical power and performance of a multidimensional Extended Maximum Likelihood Estimate (MLE) with an ‘sWeighted’ fit, which isolates the Signal distribution in the control variable using fits from the independent variables. It contains the package, its documentation, and implementation required for the analysis.

This repository forms part of the submission for the MPhil in Data Intensive Science's S1 Statistics Course at the University of Cambridge.

Pipelines

This example is built upon four fundamental probability distributions, which are implemented as individual classes within the Base_Dist module. These distributions serve as building blocks: their methods and properties are inherited by the classes that combine them into more complex distributions.

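The base probability distributions are as follows:

  • Crystal Ball Distribution
  • Exponential Decay Distribution
  • Uniform Distribution
  • Normal Distribution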

Each of these distributions is encapsulated in its own class, providing methods for calculating probability density functions (PDFs), cumulative distribution functions (CDFs), and performing distribution fitting.
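As a rough illustration of what a base distribution class with PDF and CDF methods on a truncated domain can look like, here is a minimal sketch using only the standard library. The class names, parameter names, and values are hypothetical and are not taken from the Base_Dist module; the point is only that the density is divided by the probability mass inside the allowed range so that it integrates to one there.

```python
import math


class TruncatedNormal:
    """Normal distribution restricted to [lower, upper] (hypothetical class)."""

    def __init__(self, mu, sigma, lower, upper):
        self.mu, self.sigma = mu, sigma
        self.lower, self.upper = lower, upper
        # Mass of the untruncated normal that falls inside [lower, upper].
        self._norm = self._raw_cdf(upper) - self._raw_cdf(lower)

    def _raw_cdf(self, x):
        z = (x - self.mu) / (self.sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    def pdf(self, x):
        if not self.lower <= x <= self.upper:
            return 0.0
        z = (x - self.mu) / self.sigma
        raw = math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))
        return raw / self._norm

    def cdf(self, x):
        if x <= self.lower:
            return 0.0
        if x >= self.upper:
            return 1.0
        return (self._raw_cdf(x) - self._raw_cdf(self.lower)) / self._norm


class TruncatedExponential:
    """Exponential decay restricted to [lower, upper] (hypothetical class)."""

    def __init__(self, lam, lower, upper):
        self.lam, self.lower, self.upper = lam, lower, upper
        # Integral of lam * exp(-lam * x) over [lower, upper].
        self._norm = math.exp(-lam * lower) - math.exp(-lam * upper)

    def pdf(self, x):
        if not self.lower <= x <= self.upper:
            return 0.0
        return self.lam * math.exp(-self.lam * x) / self._norm

    def cdf(self, x):
        if x <= self.lower:
            return 0.0
        if x >= self.upper:
            return 1.0
        return (math.exp(-self.lam * self.lower) - math.exp(-self.lam * x)) / self._norm


def integrate(f, a, b, n=20000):
    """Midpoint-rule integral of f over [a, b], used as a normalisation check."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values only.
norm_dist = TruncatedNormal(mu=0.0, sigma=2.5, lower=0.0, upper=10.0)
expo_dist = TruncatedExponential(lam=0.3, lower=0.0, upper=10.0)
norm_integral = integrate(norm_dist.pdf, 0.0, 10.0)
expo_integral = integrate(expo_dist.pdf, 0.0, 10.0)
```

Both integrals should come out very close to one, which is the same kind of normalisation check that Notebook 1 is described as performing.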


The compound distributions are two-dimensional (2D) probability distributions that combine properties of the base distributions. These are implemented as classes in the Compound_Dist module and inherit the behaviors of their constituent base distributions.

The Signal distribution represents the signal region in 2D space.

  • Constituents:
    • Crystal Ball Distribution in the X-dimension.
    • Exponential Decay Distribution in the Y-dimension.

The Background distribution represents the background noise in 2D space.

  • Constituents:
    • Uniform Distribution in the X-dimension.
    • Normal Distribution in the Y-dimension.

The total distribution is constructed from the Signal and Background distributions. This is implemented as a separate class in the Compound_Dist module and inherits the properties of the Signal and Background distributions, along with their respective base distributions.

  • Constituents:
    • Signal Distribution
    • Background Distribution

By using inheritance, the total distribution integrates all of its constituent distributions in a modular way and can easily be adapted to different base distributions in other scenarios.
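The way a 2D signal, a 2D background, and a total distribution fit together can be sketched as follows. This is a simplified illustration, not the Compound_Dist implementation: it uses composition rather than inheritance for brevity, it assumes the X and Y dimensions factorise, it assumes the total is a mixture controlled by a signal fraction `f`, and the two 1D densities are toy stand-ins rather than the Crystal Ball, exponential, uniform, and normal distributions listed above. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Pdf1D = Callable[[float], float]


def uniform_0_5(x: float) -> float:
    """Toy stand-in: uniform density on [0, 5]."""
    return 0.2 if 0.0 <= x <= 5.0 else 0.0


def linear_0_5(x: float) -> float:
    """Toy stand-in: linearly rising density 2x/25 on [0, 5]."""
    return 2.0 * x / 25.0 if 0.0 <= x <= 5.0 else 0.0


@dataclass
class Product2D:
    """2D density built as the product of independent X and Y densities."""
    pdf_x: Pdf1D
    pdf_y: Pdf1D

    def pdf(self, x: float, y: float) -> float:
        return self.pdf_x(x) * self.pdf_y(y)


@dataclass
class Mixture2D:
    """Total density: f * signal + (1 - f) * background (assumed form)."""
    signal: Product2D
    background: Product2D
    f: float

    def pdf(self, x: float, y: float) -> float:
        return (self.f * self.signal.pdf(x, y)
                + (1.0 - self.f) * self.background.pdf(x, y))

    def marginal_x(self, x: float) -> float:
        # Integrating a normalised Y density out leaves only the X factors.
        return (self.f * self.signal.pdf_x(x)
                + (1.0 - self.f) * self.background.pdf_x(x))


signal = Product2D(pdf_x=linear_0_5, pdf_y=uniform_0_5)
background = Product2D(pdf_x=uniform_0_5, pdf_y=linear_0_5)
total = Mixture2D(signal=signal, background=background, f=0.6)

density_at_point = total.pdf(1.0, 2.0)
marginal_at_point = total.marginal_x(1.0)
```

Swapping a different 1D density into `Product2D` leaves the mixture code unchanged, which is the same adaptability that the inheritance-based design aims for.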

Notebooks

The notebooks in this repository serve as walkthroughs for the analysis performed. They include derivations of the mathematical implementations, explanations of key choices made, and present the main results. Five notebooks are provided:

| Notebook | Description |
| --- | --- |
| Notebook 1 | Introduces and implements the four base probability distributions and their combination into signal and background components. Verifies proper normalisation over the truncated domain. |
| Notebook 2 | Demonstrates the calculation and visualisation of the marginal probability distributions in both X and Y, including how to implement them in the pipeline. |
| Notebook 3 | Gives an overview of the sampler (accept/reject algorithm) with automatic scaling, and recovers the model parameters using extended unbinned maximum likelihood fitting with iminuit. |
| Notebook 4 | Performs a full bootstrap simulation study, including the generation of samples and an analysis of trends in bias and uncertainty as functions of sample size. |
| Notebook 5 | Explores the use of sWeights: an extended likelihood fit to the marginal distribution in one axis assigns a statistical weight to each event, and these weights are used to reconstruct the signal distribution in an independent axis without any model of the background distribution in that dimension. |
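For readers unfamiliar with sWeights, the weight calculation mentioned for Notebook 5 can be sketched as below. This follows the standard sPlot formula for two components and is not the repository's implementation. The toy densities, the event counts, and the use of the true counts as yields are all assumptions made for illustration; in a real analysis the yields and shape parameters would come from the extended likelihood fit.

```python
import random


def sweights(g_sig, g_bkg, n_sig, n_bkg):
    """Return per-event (signal, background) sWeights.

    g_sig, g_bkg: values of the normalised signal and background densities
    in the discriminating variable, evaluated at each event.
    n_sig, n_bkg: signal and background yields (assumed to come from a fit).
    """
    denom = [n_sig * s + n_bkg * b for s, b in zip(g_sig, g_bkg)]
    # Elements of the inverse covariance matrix: sum_i g_j g_k / denom_i^2.
    a_ss = sum(s * s / d ** 2 for s, d in zip(g_sig, denom))
    a_sb = sum(s * b / d ** 2 for s, b, d in zip(g_sig, g_bkg, denom))
    a_bb = sum(b * b / d ** 2 for b, d in zip(g_bkg, denom))
    # Invert the symmetric 2x2 matrix to obtain the covariance matrix V.
    det = a_ss * a_bb - a_sb ** 2
    v_ss, v_sb, v_bb = a_bb / det, -a_sb / det, a_ss / det
    w_sig = [(v_ss * s + v_sb * b) / d for s, b, d in zip(g_sig, g_bkg, denom)]
    w_bkg = [(v_sb * s + v_bb * b) / d for s, b, d in zip(g_sig, g_bkg, denom)]
    return w_sig, w_bkg


# Toy data on [0, 1]: signal density 2x, background density 1 (both assumed).
rng = random.Random(0)
n_sig_true, n_bkg_true = 300, 700
xs = [rng.random() ** 0.5 for _ in range(n_sig_true)]  # inverse-CDF draw from 2x
xs += [rng.random() for _ in range(n_bkg_true)]        # uniform background

g_sig = [2.0 * x for x in xs]
g_bkg = [1.0 for _ in xs]
w_sig, w_bkg = sweights(g_sig, g_bkg, n_sig_true, n_bkg_true)
```

A histogram of a second, independent variable filled with `w_sig` as event weights then estimates the signal distribution in that variable. The weights sum to the signal yield only when the yields are the maximum likelihood estimates, so with the true counts used here the sum is close to, but not exactly, 300.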

Documentation

Documentation on Read the Docs

The pipeline uses a modular, inheritance-based class structure, described in the Pipelines section, to make it adaptable to different probability distributions. Documentation has therefore been created to make each class's methods and implementation easier to understand:

  • Class and Function References: Includes detailed descriptions of all classes and functions used in the coursework.
  • Source Code Links: Direct links to the source code for easy review.
  • Notebook Integration: Hyperlinks throughout the notebooks provide direct access to relevant sections of the documentation.

Installation and Usage

To run the notebooks, please follow these steps:

1. Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/JacobTutt/stat_frequentist_analysis.git

2. Create a Fresh Virtual Environment

Use a clean virtual environment to avoid dependency conflicts.

python -m venv env
source env/bin/activate   # For macOS/Linux
env\Scripts\activate      # For Windows

3. Install the Package and Dependencies

Navigate to the repository’s root directory and install the package along with its dependencies:

pip install -e .

4. Set Up a Jupyter Notebook Kernel

To ensure the virtual environment is recognised within Jupyter notebooks, set up a kernel:

python -m ipykernel install --user --name=env --display-name "Statistical Analysis"

5. Run the Notebooks

Open the notebooks and select the created kernel (Statistical Analysis) to run the code.

For Assessment

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you have any questions, run into issues, or just want to discuss the project, feel free to:

Author

This project is maintained by Jacob Tutt
