CMS‐OpenData‐Pipeline
CMS Open Data has immense potential for researchers outside the collaboration. They can exploit the scientific potential of these data and test novel ideas with an actual detector setup. CMS Open Data includes collision datasets as well as the corresponding simulated datasets, which allows researchers to perform different analyses and even search for new physics. The collaboration can also test novel ideas and improve the current methods of particle detection and reconstruction.
The majority of the primary datasets in CMS Open Data are in the AOD or miniAOD format. AOD has a unique data structure that stores a lot of information from proton-proton collisions, some of which might not be required for a specific physics analysis; it retains a large amount of low-level information necessary for the development of algorithms. miniAOD has a simpler structure than AOD but still contains a lot of high-level information. This makes it difficult for someone outside the collaboration to work with the AOD or miniAOD structure and perform analyses. At the analysis level, nanoAOD is typically preferred.
Our pipeline aims to deliver a much simpler format, the FunAOD. Like nanoAOD, it is a flat Ntuple, but it contains even less information because it is optimized for size and analysis speed. It contains only the observables necessary for higher-level physics objects, although more information can be added by modifying the EDAnalyzer. As the name suggests, our aim is to make particle physics fun.
- The pipeline takes an AOD file (Run 1) or a miniAOD file (Run 2) as input.
- An EDAnalyzer module writes the information required for the physics analysis to a flat Ntuple file called FunAOD, which can be used for further physics analysis.
- A CMS configuration file loads all the modules required for producing the FunAOD.
To extract information from the AOD data format, an EDAnalyzer module, which allows read-only access to the Event, is written in C++. It writes the following information to the FunAOD:
- Particle Flow Candidates
- Reconstructed Jets
- Reconstructed Muons
- Reconstructed Electrons
- Reconstructed Photons
- Generator-Level Particles (only AODSIM)
- Generator-Level Jets (only AODSIM)
To extract information from the miniAOD data format, an EDAnalyzer module, which allows read-only access to the Event, is written in C++. It writes the following information to the FunAOD:
- Reconstructed Jets
- Reconstructed Muons
- Reconstructed Electrons
- Reconstructed Photons
- Generator-Level Particles (only miniAODSIM)
- Generator-Level Jets (only miniAODSIM)
The CMS configuration files define the EDAnalyzer modules to be loaded, their configurable parameters, and the order in which the modules run. The configuration files are written in Python.
In this particular pipeline the CMS Configuration file defines the following:
- Input AOD/AODSIM or miniAOD/miniAODSIM file to be accessed in the CMS open database.
- Detector alignment and calibration (AlCa) modules.
- Number of events to analyze.
- Whether it is a collision or simulated dataset.
- The center-of-mass energy of the collision (7 TeV or 8 TeV; only for AOD2FunAOD).
- Output FunAOD file.
The configuration files are executed using cmsRun, a CMSSW executable that loads all the required modules during runtime.
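As an illustration of the items listed above, here is a minimal sketch of how such a configuration could be organized. It is not the pipeline's actual configuration file: the plugin name `FunAODProducer`, its parameters `isSimulation` and `comEnergy`, the process label, and the helper names `check_options` and `build_process` are all assumptions made for this example. The conditions setup also differs between CMSSW releases, so the global tag is left as an argument to be taken from the dataset record. The code avoids newer Python features because older CMSSW releases ship Python 2.

```python
def check_options(input_file, output_file, global_tag, max_events=-1,
                  is_simulation=False, com_energy_tev=8):
    """Validate the job settings and return them as a dict (illustrative names)."""
    if com_energy_tev not in (7, 8):
        raise ValueError("center-of-mass energy must be 7 or 8 TeV")
    if max_events == 0 or max_events < -1:
        raise ValueError("max_events must be positive, or -1 for all events")
    return {
        "input_file": input_file,        # AOD/AODSIM or miniAOD/miniAODSIM
        "output_file": output_file,      # FunAOD file to write
        "global_tag": global_tag,        # alignment and calibration conditions
        "max_events": max_events,        # -1 means "all events" in CMSSW
        "is_simulation": is_simulation,  # collision or simulated dataset
        "com_energy_tev": com_energy_tev,
    }


def build_process(opts):
    """Build a cms.Process from the options; requires a CMSSW environment."""
    # Imported here so the module can be loaded without CMSSW
    import FWCore.ParameterSet.Config as cms

    process = cms.Process("FunAOD")  # assumed process label

    # Input file and number of events to analyze
    process.source = cms.Source(
        "PoolSource",
        fileNames=cms.untracked.vstring(opts["input_file"]))
    process.maxEvents = cms.untracked.PSet(
        input=cms.untracked.int32(opts["max_events"]))

    # Conditions (AlCa); the exact setup depends on the CMSSW release
    process.load(
        "Configuration.StandardSequences.FrontierConditions_GlobalTag_cff")
    process.GlobalTag.globaltag = opts["global_tag"]

    # TFileService manages the output ROOT file
    process.TFileService = cms.Service(
        "TFileService", fileName=cms.string(opts["output_file"]))

    # Hypothetical plugin name and parameters, for illustration only
    process.funaod = cms.EDAnalyzer(
        "FunAODProducer",
        isSimulation=cms.bool(opts["is_simulation"]),
        comEnergy=cms.int32(opts["com_energy_tev"]))

    # The Path fixes which modules run, and in which order
    process.p = cms.Path(process.funaod)
    return process
```

In a real configuration file the result would be bound to a module-level variable named `process`, for example `process = build_process(check_options(...))`, since that is the object `cmsRun` looks for. Keeping the validation separate from the CMSSW-specific part means the settings can be checked without a CMSSW environment.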
The output file produced by the pipeline is the FunAOD, a flat Ntuple stored as a flat ROOT tree. A flat tree is a structure that can be read by standalone ROOT, without requiring any additional libraries. Its leaves store the properties of the reconstructed and generator-level particles extracted from the AOD or miniAOD, event by event, so the FunAOD can be used directly for the analysis of physics objects. The FunAOD is similar to the nanoAOD format but much simpler in structure. Its average size is 0.5 to 1.5 GB, depending on the number of events.
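To give a feel for analysis on a flat tree, the sketch below computes the invariant mass of the two leading muons in one event from per-muon kinematic leaves. The branch names (`nMuon`, `Muon_pt`, `Muon_eta`, `Muon_phi`) follow nanoAOD-style naming and are assumptions, not the confirmed FunAOD layout. The dictionary stands in for values that would be read from the tree, and the muon mass is supplied as a constant rather than read from a leaf.

```python
import math

MUON_MASS = 0.1057  # GeV, approximate muon mass


def four_momentum(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz) in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-momenta."""
    energy, px, py, pz = [a + b for a, b in zip(p1, p2)]
    m2 = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against rounding below zero


def dimuon_mass(event):
    """Mass of the first two muons in the event, or None if fewer than two."""
    if event["nMuon"] < 2:
        return None
    muons = [
        four_momentum(event["Muon_pt"][i], event["Muon_eta"][i],
                      event["Muon_phi"][i], MUON_MASS)
        for i in range(2)
    ]
    return invariant_mass(muons[0], muons[1])


# Stand-in for one event; hypothetical branch names, made-up values
event = {
    "nMuon": 2,
    "Muon_pt": [45.0, 45.0],
    "Muon_eta": [0.0, 0.0],
    "Muon_phi": [0.0, math.pi],
}

mass = dimuon_mass(event)  # two back-to-back 45 GeV muons: close to 90 GeV
```

Because every quantity is a plain number or a flat array indexed by particle, this kind of calculation needs no CMSSW data formats, which is the main convenience of a flat Ntuple.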