Some more fixups

Nick Manganelli · Nick Manganelli · commit 1e70b207a6db · 2024-11-20T19:57:37.000+01:00
diff --git a/L1Trigger/L1TTrackMatch/Readme.md b/L1Trigger/L1TTrackMatch/Readme.md
@@ -11,15 +11,15 @@ LibHLS $\eta$ [module](https://gitlab.cern.ch/GTT/LibHLS/-/tree/master/Modules/C
 CMSSW [plugin](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/L1TTrackMatch/plugins/L1GTTInputProducer.cc)
 
 #### Status (October 2024)
-The Input conversion in Firmware centrally stores conversion constants and consistently uses the central GTT format (struct) for tracks. Meanwhile, the emulator is in a somewhat different spot, with a low-granularity conversion of $p_T$ with 7 integer bits and 3 float bits (currently consistent with firmware). For reading the $p_T$ back, different algorithms make different assumptions about the width of the datatype, either the actual ap_fixed<10,7> with 7 integer bits, or ap_fixed<14,9>. This works because of a bit-shift that 0-pads the 2 least significant bits of the actual ap_fixed<10,7> data stored in the previoius canonical L1T $\frac{1}{R}$ word; ergo the least significant integer bits are aligned under either interpretation. The $p_T$ conversion also uses an 8-bit LUT to map track values from the ap_int datatype to ap_fixed, which means that the least significant bits are ignored, dropping 2**2 in resolution. Because of the non-linear conversion, this produces strong discretization of high $p_T$ tracks. A possible remediation of this would involve either splitting the LUT so that a more granular version is applied to high $p_T$ tracks, or equivalently residual/correction-LUTs to adjust the response where needed. Either approach should require a modest increase in LUT resources but improve the conversion loss. Additionally, studies on $\phi$ mesons and reconstructing $B_s$ from them indicates that the $\eta$ conversion is a significant source of error, but also the $p_T$ appears to have biases (which may have a charge dependence). The latter indicates that there may be an inconsistent use of half-bin shifting in calculations somewhere. The $\eta$ conversion uses a 128-sized LUT (owing to symmetry in the transform, it's cut in half relative to a simple calculation), into an ap_fixed<8,3> datatype; the LUT should be increased by a factor of 2^3 at least to ap_fixed<11,3>, improving the granularity significantly for $\phi$ meson reconstruction.
+The Input conversion in Firmware centrally stores conversion constants and consistently uses the central GTT format (struct) for tracks. Meanwhile, the emulator is in a somewhat different spot, with a low-granularity conversion of $p_T$ with 7 integer bits and 3 float bits (currently consistent with firmware). For reading the $p_T$ back, different algorithms make different assumptions about the width of the datatype, either the actual ap_fixed<10,7> with 7 integer bits, or ap_fixed<14,9>. This works because of a bit-shift that 0-pads the 2 least significant bits of the actual ap_fixed<10,7> data stored in the previoius canonical L1T $\frac{1}{R}$ word; ergo the least significant integer bits are aligned under either interpretation. The $p_T$ conversion also uses an 10-bit LUT to map track values from the ap_int datatype to ap_fixed, which means that the least significant bits are ignored, dropping 2**2 in resolution. Because of the non-linear conversion, this produces strong discretization of high $p_T$ tracks. A possible remediation of this would involve either splitting the LUT so that a more granular version is applied to high $p_T$ tracks, or equivalently residual/correction-LUTs to adjust the response where needed. Either approach should require a modest increase in LUT resources but improve the conversion loss. Additionally, studies on $\phi$ mesons and reconstructing $B_s$ from them indicates that the $\eta$ conversion is a significant source of error, but also the $p_T$ appears to have biases (which may have a charge dependence). The latter indicates that there may be an inconsistent use of half-bin shifting in calculations somewhere. The $\eta$ conversion uses a 128-sized LUT (7 bits, owing to symmetry in the transform, it's cut in half relative to a simple calculation), into an ap_fixed<8,3> datatype; the LUT should be increased by a factor of 2^3 at least to ap_fixed<11,3>, improving the granularity significantly for $\phi$ meson reconstruction.
 
 ### Vertex Finder (VF)
 #### FastHisto (FH) - Baseline Algorithm
 This version of the algorithm serves as the baseline from TDR studies. This uses a histogram (256 bins as of 2023) which is filled with track $p_T$ as weights. A single vertex is chosen by a sliding-window algorithm which finds the consecutive bins (currently 3) containing the maximum $p_T$-sum (using a flat kernel). An inversion LUT which is built to store the mapping from $p_T$ to $\frac{1}{p_T}$ is used in combination with a window-bin-indexed weighted-sum of per-bin $p_T$-sums is used to calculate the $p_T$-weighted location of the peak, and this is stored as the vertex $z_0$ position. The $p_T$-sum from the window is denoted as the sumPt of the vertex.
 LibHLS [module](https://gitlab.cern.ch/GTT/LibHLS/-/tree/master/Modules/Vertex?ref_type=heads)
 VertexFinder class [interface](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/VertexFinder/interface/VertexFinder.h), [src](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/VertexFinder/src/VertexFinder.cc), and [plugin](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/VertexFinder/plugins/VertexProducer.cc) with CMSSW default configuration [here](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/VertexFinder/python/l1tVertexProducer_cfi.py). The Simulation tag is 'l1tVertexFinder' whereas the emulator is usually grabbed via 'l1tVertexFinderEmulator.'
 #### End-to-end Neural Network (E2E) - Extended Algorithm
-The End-to-end Neural Network is a beyond-baseline version of the algorithm. It uses the same structure as FastHisto, but replaces weighting tracks by their $p_T$ with a DNN discriminant score. This produces an improved vertex-finding efficiency and resolution over FH. This is trained in tandem with TrackAssociationNetworks to replace the cut-based TrackSelection algorithm typically paired with FH.
+The End-to-end Neural Network is a beyond-baseline version of the algorithm. It uses the same structure as FastHisto, but replaces weighting tracks by their $p_T$ with a DNN discriminant score. This produces an improved vertex-finding efficiency and resolution over FH. This is trained in tandem with TrackAssociationNetworks to replace the cut-based TrackVertexAssociation algorithm typically paired with FH.
 
 #### Status as of October 2024
 The FastHisto (Emulation) algorithm has bit-level agreement with the firmware using several thousand events from $t\bar{t}$ simulation (200 PileUp). The algorithm can handle mulitple vertices. In LibHLS/firmware, only one vertex is enabled to reduce resource usage. An implementation detail is that the $p_T$ sum per bin is calculated untruncated in both firmware and emulation, but prior to the vertex-finding portion (sliding window algo), the precision is reduced. Of the fields proposed for the vertex, only the valid bit, the sumPt, and the $z_0$ position are filled; all other fields, including the quality field, the nTracks in/out PV are 0-filled. A design shortcoming is that the number of bins is a compile-time constant, but runtime-configurable constants passed into the configuration as `FH_HistogramParameters` indicate the minimum and maximum $z$ position for the histogram, and the 3rd parameter must divide this range appropriately to match the number of bins in firmware. For example, `FH_HistogramParameters = cms.vdouble(-20.46912512, 20.46912512, 0.15991504)` is appropriate for 256 bins as currently used.
@@ -72,7 +72,7 @@ LibHLS [module](https://gitlab.cern.ch/GTT/LibHLS/-/tree/master/Modules/WtoThree
 [reader](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/plugins/GTTFileReader.cc)
 [writer](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/plugins/GTTFileWriter.cc)
 These plugins make use of the subsystem-agnostic BoardDataReader and BoardDataWriter classes, respectively, specified in the reader's [interface](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/interface/BoardDataReader.h) and [src](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/src/BoardDataReader.cc) files, and likewise for the writer [interface](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/interface/BoardDataWriter.h) and [src](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/src/BoardDataWriter.cc)
-The GTT configuration constants are mostly gathered in the [GTTInterface.h](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/interface/GTTInterface.h) which specifies how link inputs/frames map to given collections of objects for pattern-buffer writing and reading. This specified, for example, whether vertices are placed in link 3 or link 0 to GT for a given board, and at which starting frame they may appear. Similarly other collections like jets, displaced jets, MET, HT link and frame position encoding are specified in here.
+The GTT configuration constants are mostly gathered in the [GTTInterface.h](https://github.com/cms-sw/cmssw/blob/master/L1Trigger/DemonstratorTools/interface/GTTInterface.h) which specifies how link inputs/frames map to given collections of objects for pattern-buffer writing and reading. For example, whether vertices are placed in link 3 or link 0 to GT for a given board, and at which starting frame they may appear, is encoded in these maps. Similarly other collections like jets, displaced jets, MET, HT link and frame position encoding are specified in this file.
 
 ## Emulation Pattern (Buffer) Files
 
@@ -88,12 +88,12 @@ cmsRun createFirmwareInputFiles_cfg.py maxEvents=1008 format=APx inputFiles=L1Tr
 An example configuration as referred to via the `inputFiles` kwarg can be found below, in this README.
 
 ### Loading tracks and vertices from pattern files
-Two input options, `tracks` and `vertices` can be passed 3 options, `donotload`, `load`, and `overwrite` to modify the behavior of patter-file writing. For each collection, the option `donotload` implies that the appropriate collection will be generated or read in from CMSSW collections. The `load` option instead dictates that pattern files' contents, specified in the inputFiles configuration, will be decoded (verifying they can be read back via BoardDataReader class, as configured via GTTFileReader), but the final collections will still be read/generated from CMSSW collections generated upstream in TrackFinding. The final option, `overwrite,` decodes the objects from pattern files and makes these the corresponding collection used in subsequent steps. That is, if `tracks=overwrite` is used, then the tracks passed into the GTTInputConversion and downstream TS, VF, etc. will come from pattern files written in the appropriate `readerformat` (APx, EMPv2, ...). As long as vertices is not set to `overwrite` as well, this will calculate new vertices from these pattern-buffer loaded tracks instead of those from the source root files or generated by the TrackFinder upstream. Similarly, if `vertices=overwrite,` the vertices collection will be read from the pattern files and upstream tracks (regardless of source and `tracks=X` option selected) will be ignored.
+Two input kwargs, `tracks` and `vertices` can be passed 3 options each, `donotload`, `load`, and `overwrite` to modify the behavior of patter-file writing. For each collection, the option `donotload` implies that the appropriate collection will be generated or read in from CMSSW collections (this is the default mode). The `load` option instead dictates that pattern files' contents, specified in the inputFiles configuration python file, will be decoded (verifying they can be read back via BoardDataReader class, as configured via GTTFileReader), but then ignored: the final collections will still be read/generated from CMSSW collections generated upstream in TrackFinding, as in the default mode. The final option, `overwrite,` decodes the objects from pattern files and makes these the corresponding collection used in subsequent steps. That is, if `tracks=overwrite` is used, then the tracks passed into the GTTInputConversion and downstream TS, VF, JF, etc. will come from pattern files written in the appropriate `readerformat` (APx, EMPv2, ...). As long as vertices is not set to `overwrite` as well, this will calculate new vertices from these pattern-buffer loaded tracks instead of those from the source root files or generated by the TrackFinder upstream. Similarly, if `vertices=overwrite,` the vertices collection will be read from the pattern files and upstream tracks (regardless of source and `tracks=X` option selected) will be ignored. The readerformat should specify whether the inputs are `APx` or `EMPv2` encoded.
 ```
 cmsRun createFirmwareInputFiles_cfg.py maxEvents=1008 format=APx inputFiles=L1Trigger/DemonstratorTools/python/TT_TuneCP5_14TeV-powheg-pythia8_Phase2Spring23DIGIRECOMiniAOD-PU200_L1TFix_Trk1GeV_131X_mcRun4_realistic_v9-v1_GEN-SIM-DIGI-RAW-MINIAOD_APxBuffers_cff.py tracks=overwrite vertices=donotload readerformat=APx
 ```
 
-An example of the `inputFiles` configuration including pointers to buffer files follows:
+An example of the `inputFiles` configuration including pointers to buffer files follows (the pointers to pattern files can be dropped for workflows that only need to generate buffer files from the default CMSSW collections or upstream TrackFinder):
 
 ```
 import FWCore.ParameterSet.Config as cms