Hello, thank you for this great work.
I am currently trying to train a network for multi-task regression using BMCLossMD. However, the target variables have different distributions. For example, one looks exponential, while another resembles a Gaussian and has a 4x larger std than the exponential one. Should I then at least scale the variables to unit std? The noise_var in BMCLossMD is 1D, so I assume yes, but I have no proof...
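
To make the question concrete, here is a minimal sketch of the scaling I have in mind. It is plain Python on toy data, and `standardize_targets` is a hypothetical helper of mine, not part of the BMCLossMD code. The assumption is that the network would be trained on the scaled targets and the stored means and stds used to map predictions back.

```python
import random
import statistics


def standardize_targets(targets):
    """Scale each target column to zero mean and unit std.

    targets: list of rows, each row a list of D target values.
    Returns (scaled_rows, means, stds) so predictions can be mapped back.
    Assumes no column is constant (std > 0).
    """
    columns = list(zip(*targets))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in targets
    ]
    return scaled, means, stds


random.seed(0)
# Toy targets: one exponential-looking column and one Gaussian column
# whose std is roughly 4x larger.
targets = [[random.expovariate(1.0), random.gauss(0.0, 4.0)] for _ in range(1000)]

scaled, means, stds = standardize_targets(targets)
# Per-column std after scaling; both entries should be 1 up to rounding.
scaled_stds = [statistics.pstdev(col) for col in zip(*scaled)]
```

Note that this only equalizes the scale of the two targets; the exponential-looking one keeps its skewed shape.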