Description
Following up on the discussion from the kick-off meeting, an interesting point came up about which methods are best suited to approximating a set of binned counts.
In the case where one has a neural network-based observable, one could apply the softmax function exp(nn / T) / sum(exp(nn / T)) (where T is a temperature hyperparameter) to the nn output, as one would in a classification problem. This essentially puts every event into all bins, but weights each bin by the corresponding nn output (normalized to 1). The histogram is then the elementwise sum of the softmaxed outputs over all events.
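A minimal sketch of this idea (names and shapes are my own; the network is assumed to emit one logit per bin for each event):

```python
import numpy as np

def softmax_hist(nn_out, temperature=1.0):
    """Soft-binned 'histogram' from per-event logits.

    nn_out: array of shape (n_events, n_bins), one logit per bin per event.
    Each event contributes a probability vector (softmax over bins); the soft
    histogram is the elementwise sum of these vectors over events.
    """
    z = nn_out / temperature
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)      # per-event bin weights, each summing to 1
    return w.sum(axis=0)                   # shape (n_bins,)

# toy usage: 1000 events, 5 bins
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5))
counts = softmax_hist(logits, temperature=0.5)
print(counts, counts.sum())                # total equals the number of events
```

Lowering the temperature sharpens each event's weight vector towards a one-hot assignment (an ordinary histogram), while raising it spreads each event more evenly across bins.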
An alternative to this is a kernel density estimate (kde), which is defined by instantiating a distribution (kernel, e.g. a standard normal) centered on each data point and averaging their contributions. The smoothness of the resulting estimate is controlled by a 'bandwidth' hyperparameter, which scales the widths of the kernels. Once a bandwidth is chosen, one can get 'counts' by integrating the pdf over a set of intervals (bins), or equivalently by taking differences of the cumulative distribution function at the bin edges.
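A sketch of the cdf-difference version with Gaussian kernels (function and variable names are assumptions):

```python
import numpy as np
from scipy.stats import norm

def kde_counts(samples, bin_edges, bandwidth):
    """Soft 'counts' from a Gaussian KDE via CDF differences at the bin edges.

    A normal kernel of width `bandwidth` is placed on each sample; the KDE's
    CDF is the mean of the per-kernel CDFs. Multiplying the probability mass
    in each bin by the number of samples gives soft counts.
    """
    samples = np.asarray(samples)
    # mixture CDF evaluated at every bin edge, shape (n_edges,)
    cdf = norm.cdf((bin_edges[:, None] - samples[None, :]) / bandwidth).mean(axis=1)
    return len(samples) * np.diff(cdf)     # shape (n_bins,)

# toy usage
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
edges = np.linspace(-3, 3, 7)
print(kde_counts(x, edges, bandwidth=0.2))
```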
Revisiting the case of the nn-based observable, one could instead make the network output a single value (a regression net) and define a kde over the set of outputs from a batch of events. The question is then: is there any clear advantage in expressivity or principle to using a kde, a softmax, or another method?
Another interesting question is where these approaches break down, e.g. using a kde in this way seems to be very sensitive to the bandwidth.
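To illustrate that sensitivity, a small self-contained sketch using scipy's gaussian_kde (the data here is just a stand-in for a batch of single-output nn values):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(size=200)                   # stand-in for a batch of nn outputs
edges = np.linspace(-3, 3, 7)

# Same data, two bandwidth factors: the resulting 'counts' can differ noticeably.
for bw in (0.05, 1.0):
    kde = gaussian_kde(x, bw_method=bw)    # a scalar bw_method is used directly as the KDE factor
    counts = [len(x) * kde.integrate_box_1d(lo, hi)
              for lo, hi in zip(edges[:-1], edges[1:])]
    print(bw, np.round(counts, 1))
```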
It's also worth noting that a kde may make more sense for general use as a candidate for smooth histogramming, but the case of the nn observable may be more nuanced.