datasets excluded from NNPDF4.0: CMS_2JET_8TEV_3D#2418
Conversation
Force-pushed from b7f065e to 9805191
achiefa left a comment:
Below are a few comments for this PR.
```yaml
- observable_name: 3D
  process_type: DIJET
  tables: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  ndata: [31, 26, 14, 23, 17, 11] # listed per (y*, yb) bin, sum 122
```
Validphys cannot parse a list for the `ndata` entry. Please change it to the total number of points:
```diff
-ndata: [31, 26, 14, 23, 17, 11] # listed per (y*, yb) bin, sum 122
+ndata: 122
```
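For the record, the scalar that replaces the list is just the sum of the per-bin counts; a throwaway check (plain Python, not validphys code) confirms the total:

```python
# Per-(y*, yb)-bin point counts quoted in the comment above; validphys
# expects the scalar total in metadata.yaml, not the list.
per_bin = [31, 26, 14, 23, 17, 11]
print(sum(per_bin))  # 122 -> the value to put under `ndata`
```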
```yaml
implemented_observables:
- observable_name: 3D
  process_type: DIJET
```
Apparently @enocera has already implemented the process for the 3D dijet distribution in the branch CMS_2JET_13TEV, which has not been merged yet. Try to use the same implementation.
```yaml
kinematics:
  file: kinematics.yaml
  variables: # need to fix inline math here because load_yaml breaks down
    pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}
```
Use the available kinematic variables. If the kinematics that you need is missing, you need to implement it; see the available kinematics in nnpdf/nnpdf_data/nnpdf_data/process_options.py (lines 14 to 49 at af215dd). Again, maybe @enocera has implemented new variables in his branch.
```yaml
variables: # need to fix inline math here because load_yaml breaks down
  pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}
  ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
  yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}
```
Same here: `yboost` does not exist in `process_options.py`.
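A quick guard against unknown variable names could look like the following sketch, where `ALLOWED_VARIABLES` is a made-up placeholder standing in for the real list in `process_options.py`:

```python
# ALLOWED_VARIABLES is a hypothetical stand-in; the authoritative list
# lives in nnpdf_data/process_options.py.
ALLOWED_VARIABLES = {"pT", "pTavg", "ystar", "y", "sqrts"}

declared = ["pTavg", "ystar", "yboost"]  # from the metadata above
unknown = [v for v in declared if v not in ALLOWED_VARIABLES]
print(unknown)  # ['yboost'] -> must be implemented before use
```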
```yaml
pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}
ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}
sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}
```
Remove `sqrts` from the kinematics. It's redundant information that is already incorporated in the name of the dataset; no need to keep it.
```diff
-sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}
```
```yaml
ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}
sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}
kinematic_coverage: [pTavg, ystar, yboost, sqrts]
```
`kinematic_coverage` can only handle three variables. Remove `sqrts`.
```diff
-kinematic_coverage: [pTavg, ystar, yboost, sqrts]
+kinematic_coverage: [pTavg, ystar, yboost]
```
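A minimal validation along these lines (a hypothetical helper, not part of validphys) would catch an over-long coverage list early:

```python
# Hypothetical helper: validphys itself enforces the limit; this just
# mirrors the three-variable constraint from the review comment.
def check_kinematic_coverage(coverage, max_vars=3):
    if len(coverage) > max_vars:
        raise ValueError(
            f"kinematic_coverage lists {len(coverage)} variables; "
            f"at most {max_vars} are supported: {coverage}"
        )
    return coverage

print(check_kinematic_coverage(["pTavg", "ystar", "yboost"]))
# check_kinematic_coverage(["pTavg", "ystar", "yboost", "sqrts"]) raises
```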
```yaml
bins:
- pTavg:
    min: 133.0
    mid: 143.0
    max: 153.0
  yboost:
    min: 0.0
    mid: 0.5
    max: 1.0
  ystar:
    min: 0.0
    mid: 0.5
    max: 1.0
  sqrts:
    min: null
    mid: 8
    max: null
```
There is no prescription for the ordering of the kinematic variables, as long as the ordering matches `kinematic_coverage` in metadata.yaml. However, pT is usually in the last position, see here.
If you change this, make sure you also change the ordering in `kinematic_coverage`.
```diff
 bins:
-- pTavg:
-    min: 133.0
-    mid: 143.0
-    max: 153.0
-  yboost:
+- yboost:
     min: 0.0
     mid: 0.5
     max: 1.0
   ystar:
     min: 0.0
     mid: 0.5
     max: 1.0
-  sqrts:
-    min: null
-    mid: 8
-    max: null
+  pTavg:
+    min: 133.0
+    mid: 143.0
+    max: 153.0
```
Force-pushed from 78069ab to e330dd7
Force-pushed from af142f5 to c799acb
```yaml
- CMS_2JET_8TEV_3D_3
- CMS_2JET_8TEV_3D_4
- CMS_2JET_8TEV_3D_5
operation: 'null' # from TeV to GeV
```
Please remove the comment, because it's not pertinent to the `operation` key.
Hi @andrpie, what's the status of this? Can you post data-theory comparisons here so that we keep track for future records? Also, mind that the branch has to be rebased. Note that I'm about to merge onto master the PR Emanuele has worked on, which implements 3D kinematics for dijets, so it's likely that there will be conflicts.
Hi @achiefa,
I believe my treatment of the uncertainties (especially the systematics) needs to be reassessed. Also, the covmat is highly singular here. I will post the data-theory comparison in the next comment.
Indeed, I will resolve the conflicts.
Thanks for this. However, you need to rebase on top of master to get the correct data-theory plots. The reason is that last week Emanuele and I realised that there was a bug in the implementation of the systematic shift, which made these plots wrong. Nonetheless, the chi2 was not affected by this bug. Given the value that you obtained, I'd say that the predictions are fairly good. Of course, we need to see the correct comparisons to get a clear picture.
Also, I see that the uncertainties in the data are basically zero. I suspect that for this dataset the experimentalists only provided the full covariance matrix with all the uncertainties correlated. If this is the case, the whole uncertainty structure of the data flows into the shift, leaving the data without uncertainty. As a sanity check, could you please plot the same comparison without the shift? You can just set the flag
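The suspicion above can be illustrated with a toy example (plain numpy; numbers unrelated to this dataset): a covariance matrix built from a single fully correlated systematic is rank 1, so the Cholesky decomposition fails for more than one data point.

```python
import numpy as np

# Toy fully correlated systematic: one source shifting every point.
# (Illustrative numbers only, unrelated to the actual dataset.)
shift = np.array([1.0, 2.0, 3.0, 4.0])

# Covariance from a single fully correlated uncertainty: C = s s^T.
cov = np.outer(shift, shift)

print(np.linalg.matrix_rank(cov))  # 1: singular for more than one point

# The Cholesky decomposition therefore fails:
try:
    np.linalg.cholesky(cov)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)
```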
As a reference, a similar problem has occurred here: https://github.com/NNPDF/theories_slim/pull/67
```yaml
version: 1

nnpdf_metadata:
  nnpdf31_process: "DIJET_3D"
```
This key is used for the theory uncertainties:
```diff
-nnpdf31_process: "DIJET_3D"
+nnpdf31_process: "DIJET"
```
```yaml
implemented_observables:
- observable_name: 3D
  process_type: DIJET
```
```diff
-process_type: DIJET
+process_type: DIJET_3D
```
Please @andrpie update the labels and upload the right plots (with the fixes to the shift).
@andrpie, to get the plots with the right shifts you need to rebase on top of master (or merge from master, although the rebase is much preferred). The rebase should be easy, as the conflict should just be the line in process_options.py that has also been updated from a different pull request.
Crazy I missed it... Co-authored-by: Amedeo Chiefa <103528316+achiefa@users.noreply.github.com>
Without these edits validphys would produce an incorrect covmat.
Force-pushed from fe0d052 to 308718a
Dear @scarlehoff, @achiefa. Below is the data-theory comparison after the rebase. What worries me the most is the treatment of the statistical uncertainties. If I understand correctly, validphys recognises an uncertainty as statistical if its name starts with "stat". For each datapoint, there are 122 artificial statistical uncertainties that come from the correlation matrices. As @achiefa pointed out, the covmat is already quite singular (and this is when those statistical uncertainties are named "art_unc_x"). When I change the name of the statistical uncertainties to "stat_art_unc_x", the covmat is not positive definite anymore; below is the error I get when running a data-theory validphys report: `numpy.linalg.LinAlgError: 32-th leading minor of the array is not positive definite`
A hint is that the first (y*, yb) bin has 31 points, so the failure happens immediately after the first bin. I am not sure how to fix this.
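One way to localise this kind of failure, independently of the naming question, is to scan the leading minors of the covariance matrix directly (plain numpy; `covmat` stands in for whatever matrix validphys builds):

```python
import numpy as np

def first_bad_minor(covmat):
    """Order of the first leading minor that is not positive definite
    (i.e. where Cholesky fails), or None if the full matrix passes."""
    for k in range(1, covmat.shape[0] + 1):
        try:
            np.linalg.cholesky(covmat[:k, :k])
        except np.linalg.LinAlgError:
            return k
    return None

# A well-conditioned matrix passes...
print(first_bad_minor(np.eye(4)))  # None
# ...while a rank-deficient one fails at the first dependent row,
# mirroring the "32-th leading minor" error quoted above.
rank1 = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(first_bad_minor(rank1))  # 2
```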
Dear @andrpie, I'm afraid I don't understand the problem. The shifted data-theory comparison plots look very sensible to me.
Hi @enocera, thank you for the clarification. You didn't misunderstand my problem. If the treatment of the uncertainty does not depend on its name, then I am also happy with the data-theory plots and consider the implementation of this dataset done. Yet it is still the case that if I name the uncertainties 'stat_art_unc', the Cholesky decomposition breaks down, as opposed to when they are just called 'art_unc' (applied in a consistent way, of course). I have no idea what kind of evil force could be causing this issue...
Please have a look at whether the problem persists if you change the referenced lines and the lines below.



Addresses #2408 #2390