Signal Recovery in the Presence of Background: Multi-dimensional Likelihood vs. sWeight Reconstruction

Author: Jacob Tutt, Department of Physics, University of Cambridge

Description

This repository compares the statistical power and performance of a multi-dimensional extended maximum likelihood estimate (MLE) with an ‘sWeighted’ fit, which isolates the signal distribution in the control variable using fits in the independent variable. It contains the package, its documentation, and the implementation required for the analysis.
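
For orientation, the two approaches can be written down for a two-component (signal plus background) model. The notation here is an assumption made for illustration and is not taken from the package: $N_s$ and $N_b$ are the fitted yields, $g_s$ and $g_b$ are the normalised component densities in the fitted variable $x$, and $n$ is the number of observed events. The weights follow the standard sPlot formulation and are meaningful only if the fitted variable and the control variable are independent within each component.

```latex
% Extended likelihood for n observed events x_1, ..., x_n
\mathcal{L}(N_s, N_b) = \frac{e^{-(N_s + N_b)}}{n!}
    \prod_{e=1}^{n} \left[ N_s\, g_s(x_e) + N_b\, g_b(x_e) \right]

% Signal sWeight for event e, with V the covariance matrix of the yields
w_s(x_e) = \frac{V_{ss}\, g_s(x_e) + V_{sb}\, g_b(x_e)}
                {N_s\, g_s(x_e) + N_b\, g_b(x_e)},
\qquad
\left(V^{-1}\right)_{ij} = \sum_{e=1}^{n}
    \frac{g_i(x_e)\, g_j(x_e)}
         {\left[ N_s\, g_s(x_e) + N_b\, g_b(x_e) \right]^{2}},
\quad i, j \in \{s, b\}
```

A histogram of the control variable filled with weights $w_s(x_e)$ then estimates the signal distribution in that variable without a background model for it.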

This repository forms part of the submission for the MPhil in Data Intensive Science's S1 Statistics Course at the University of Cambridge.

Pipelines

This example is built upon four fundamental probability distributions, which are implemented as individual classes within the Base_Dist module. These distributions serve as building blocks whose functions and properties are inherited by the classes that combine them to describe more complex distributions.

Base Probability Distributions

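The base probability distributions are as follows:

- Crystal Ball Distribution
- Exponential Decay Distribution
- Uniform Distribution
- Normal Distribution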

Each of these distributions is encapsulated in its own class, providing methods for calculating probability density functions (PDFs), cumulative distribution functions (CDFs), and performing distribution fitting.
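
As a rough illustration of what one such class might look like, the sketch below implements an exponential decay restricted to a finite interval and renormalised there. The class name `TruncatedExponential` and its parameters are hypothetical and are not the package's API; the classes in Base_Dist may be organised differently.

```python
import math


class TruncatedExponential:
    """Exponential decay restricted to [lower, upper] and renormalised there.

    Hypothetical sketch, not the Base_Dist implementation.
    """

    def __init__(self, lam, lower, upper):
        if lam <= 0 or lower < 0 or upper <= lower:
            raise ValueError("need lam > 0 and 0 <= lower < upper")
        self.lam, self.lower, self.upper = lam, lower, upper
        # Mass of the untruncated exponential that falls inside [lower, upper].
        self._norm = math.exp(-lam * lower) - math.exp(-lam * upper)

    def pdf(self, y):
        """Density at y; zero outside the truncated domain."""
        if not self.lower <= y <= self.upper:
            return 0.0
        return self.lam * math.exp(-self.lam * y) / self._norm

    def cdf(self, y):
        """Cumulative probability, rising from 0 at lower to 1 at upper."""
        if y <= self.lower:
            return 0.0
        if y >= self.upper:
            return 1.0
        return (math.exp(-self.lam * self.lower) - math.exp(-self.lam * y)) / self._norm
```

Dividing by the mass inside the interval is what keeps the density normalised over the truncated domain, which is the property Notebook 1 verifies for the package's own distributions.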

Compound Distributions

The compound distributions are two-dimensional (2D) probability distributions that combine properties of the base distributions. These are implemented as classes in the Compound_Dist module and inherit the behaviors of their constituent base distributions.

Signal Distribution

Represents the signal region in 2D space.

Constituents:
- Crystal Ball Distribution in the X-dimension.
- Exponential Decay Distribution in the Y-dimension.

Background Distribution

Represents the background noise in 2D space.

Constituents:
- Uniform Distribution in the X-dimension.
- Normal Distribution in the Y-dimension.

Overall Distribution

The total distribution is constructed from the Signal and Background distributions. This is implemented as a separate class in the Compound_Dist module and inherits the properties of the Signal and Background distributions, along with their respective base distributions.

Constituents:
- Signal Distribution
- Background Distribution

By using inheritance, the total distribution integrates all its constituent distributions in a modular way and can easily be adapted to different base distributions in other scenarios.
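
The inheritance pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Compound_Dist code: the class names, the `f_sig` signal fraction, and the use of plain callables for the one-dimensional densities are all hypothetical, and each 2D component is assumed to factorise into independent X and Y densities.

```python
class Signal:
    """2D signal density, assumed to be a product of X and Y densities."""

    def __init__(self, sig_x, sig_y, **kwargs):
        super().__init__(**kwargs)  # pass remaining arguments along the MRO
        self.sig_x, self.sig_y = sig_x, sig_y

    def signal_pdf(self, x, y):
        return self.sig_x(x) * self.sig_y(y)


class Background:
    """2D background density, assumed to be a product of X and Y densities."""

    def __init__(self, bkg_x, bkg_y, **kwargs):
        super().__init__(**kwargs)
        self.bkg_x, self.bkg_y = bkg_x, bkg_y

    def background_pdf(self, x, y):
        return self.bkg_x(x) * self.bkg_y(y)


class Total(Signal, Background):
    """Mixture of signal and background with signal fraction f_sig (assumed)."""

    def __init__(self, f_sig, **kwargs):
        super().__init__(**kwargs)
        self.f_sig = f_sig

    def pdf(self, x, y):
        return (self.f_sig * self.signal_pdf(x, y)
                + (1.0 - self.f_sig) * self.background_pdf(x, y))


# Usage with simple stand-in densities on [0, 1] (placeholders, not the
# Crystal Ball, exponential, uniform, and normal shapes used in the analysis).
uniform = lambda t: 1.0 if 0.0 <= t <= 1.0 else 0.0
ramp = lambda t: 2.0 * t if 0.0 <= t <= 1.0 else 0.0
model = Total(f_sig=0.6, sig_x=ramp, sig_y=uniform, bkg_x=uniform, bkg_y=ramp)
```

Swapping in a different base distribution only changes the callables passed at construction, which is the kind of adaptability the modular design aims for.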

Notebooks

The notebooks in this repository serve as walkthroughs for the analysis performed. They include derivations of the mathematical implementations, explanations of key choices made, and present the main results. Five notebooks are provided:

Notebook	Description
Notebook 1	Introduces and implements the four base probability distributions and their combination into signal and background components. Verifies proper normalisation over the truncated domain.
Notebook 2	Demonstrates the calculation and visualisation of marginal probability distributions in both X and Y, including how to implement it in the pipeline.
Notebook 3	Gives an overview of the sampler (accept/reject algorithm) with automatic scaling, and recovers the model parameters using extended unbinned maximum likelihood fitting with iminuit.
Notebook 4	Performs a full bootstrap simulation study, including generation of samples and analysing trends in bias and uncertainty as functions of sample size.
Notebook 5	Explores the use of sWeights, an algorithm that fits the marginal distribution along one axis using an extended likelihood fit, assigns statistical weights to events, and reconstructs the signal distribution along an independent axis, removing the need to model the background distribution in that dimension.
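
To make the accept/reject step in Notebook 3 concrete, here is a one-dimensional sketch. It is not the package's sampler: the function name `accept_reject`, the grid scan used to estimate the envelope height, and the `safety` factor are assumptions standing in for whatever "automatic scaling" the notebook actually uses.

```python
import random


def accept_reject(pdf, lower, upper, n, grid=1000, safety=1.1, seed=None):
    """Draw n samples from pdf on [lower, upper] by accept/reject.

    The envelope height is estimated from a grid scan and inflated by a
    safety factor (a heuristic; a very narrow peak could still be missed).
    """
    rng = random.Random(seed)
    step = (upper - lower) / (grid - 1)
    pdf_max = safety * max(pdf(lower + i * step) for i in range(grid))

    samples = []
    while len(samples) < n:
        x = rng.uniform(lower, upper)          # candidate from a flat proposal
        if rng.uniform(0.0, pdf_max) < pdf(x):  # keep with probability pdf(x) / pdf_max
            samples.append(x)
    return samples


# Usage: sample a linearly rising density 2x on [0, 1].
samples = accept_reject(lambda t: 2.0 * t, 0.0, 1.0, n=20000, seed=1)
```

The density does not need to be normalised for this to work, since only the ratio to the envelope height matters.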

Documentation

Documentation on Read the Docs

The pipeline uses a modular, inherited class-based structure, which is explained above, to make it adaptable to different probability distributions. As a result, documentation has been created to make each class's methods and implementation easier to understand:

- Class and Function References: Includes detailed descriptions of all classes and functions used in the coursework.
- Source Code Links: Direct links to the source code for easy review.
- Notebook Integration: Hyperlinks throughout the notebooks provide direct access to relevant sections of the documentation.

Installation and Usage

To run the notebooks, please follow these steps:

1. Clone the Repository

Clone the remote repository to your local machine:

git clone https://github.com/JacobTutt/stat_frequentist_analysis.git

2. Create a Fresh Virtual Environment

Use a clean virtual environment to avoid dependency conflicts.

python -m venv env
source env/bin/activate   # For macOS/Linux
env\Scripts\activate      # For Windows

3. Install the Package and Dependencies

Navigate to the repository’s root directory and install the package along with its dependencies:

pip install -e .

4. Set Up a Jupyter Notebook Kernel

To ensure the virtual environment is recognised within Jupyter notebooks, set up a kernel:

python -m ipykernel install --user --name=env --display-name "Statistical Analysis"

5. Run the Notebooks

Open the notebooks and select the created kernel (Statistical Analysis) to run the code.
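
If it is unclear whether a notebook is really running on the new kernel, a quick check such as the following can be run in the first cell. This is a generic Python check and not part of the repository; the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


# The executable path should point inside the env directory created in step 2.
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```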

For Assessment

The associated project report can be found under Project Report.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you have any questions, run into issues, or just want to discuss the project, feel free to:

- Open an issue on the GitHub Issues page.
- Reach out to me directly via email.

Author

This project is maintained by Jacob Tutt.

