datasets excluded from NNPDF4.0: CMS_2JET_8TEV_3D by andrpie · Pull Request #2418 · NNPDF/nnpdf

andrpie · 2026-01-13T01:03:32Z

achiefa

Below are a few comments on this PR.

achiefa · 2026-02-02T12:58:45Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+- observable_name: 3D
+  process_type: DIJET
+  tables: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
+  ndata: [31, 26, 14, 23, 17, 11] # listed per (y*, yb) bin, sum 122


Validphys cannot parse a list for the ndata entry. Please change it to the total number of points.

Suggested change

ndata: [31, 26, 14, 23, 17, 11] # listed per (y*, yb) bin, sum 122

ndata: 122
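As a minimal sketch of the suggestion above, the per-bin counts can be collapsed into the single integer that the ndata entry expects. The helper name `total_ndata` is hypothetical and is not part of validphys.

```python
# Hypothetical helper, not part of validphys: collapse per-bin counts
# into the single integer expected by the `ndata` metadata entry.
def total_ndata(ndata):
    """Return ndata as an int, summing it first if it is a list of per-bin counts."""
    if isinstance(ndata, int):
        return ndata
    return sum(int(n) for n in ndata)


per_bin = [31, 26, 14, 23, 17, 11]  # points per (y*, yb) bin, from the diff above
print(total_ndata(per_bin))  # 122
```

The per-bin breakdown can still be kept as a comment next to the scalar value if it is useful for bookkeeping.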

achiefa · 2026-02-02T13:04:52Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+
+implemented_observables:
+- observable_name: 3D
+  process_type: DIJET


Apparently @enocera has already implemented the process for the 3D dijet distribution in the branch CMS_2JET_13TEV, which has not been merged yet. Try to use the same process.

nnpdf/nnpdf_data/nnpdf_data/process_options.py

Line 530 in af215dd

"DIJET_3D": DIJET_3D,

achiefa · 2026-02-02T13:06:55Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+  kinematics:
+    file: kinematics.yaml
+    variables: # need to fix inline math here because load_yaml breaks down
+      pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}


Use the available kinematic variables. If a variable that you need is missing, you need to implement it. See the available kinematics below. Again, maybe @enocera has implemented new variables in his branch.

nnpdf/nnpdf_data/nnpdf_data/process_options.py

Lines 14 to 49 in af215dd

class _Vars:
    x = "x"
    Q2 = "Q2"
    Q = "Q"
    y = "y"
    abs_y = "abs_y"
    pT = "pT"
    ET = "ET"
    sqrts = "sqrts"
    ystar = "ystar"
    ydiff = "ydiff"
    ymax = "ymax"
    yb = "yb"
    m_jj = "m_jj"
    pT2 = "pT2"
    y_t = "y_t"
    y_ttBar = "y_ttBar"
    m_t2 = "m_t2"
    m_t = "m_t"
    pT_t = "pT_t"
    m_ttBar = "m_ttBar"
    eta = "eta"
    abs_eta = "abs_eta"
    m_W2 = "m_W2"
    m_Z2 = "m_Z2"
    m_V2 = "m_V2"
    m_W = "m_W"
    m_Z = "m_Z"
    M2 = "M2"
    abs_eta_1 = "abs_eta_1"
    abs_eta_2 = "abs_eta_2"
    eta_1 = "eta_1"
    eta_2 = "eta_2"
    m_ll = "m_ll"
    m_ll2 = "m_ll2"
    abs_y = "abs_y"

achiefa · 2026-02-02T13:07:16Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+    variables: # need to fix inline math here because load_yaml breaks down
+      pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}
+      ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
+      yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}


Same here: yboost does not exist in process_options.py.
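A quick way to spot such mismatches before running validphys is to compare the names used in metadata.yaml against the `_Vars` listing quoted above. This is only a sketch: the set below is copied by hand from that listing (at af215dd), and `unknown_variables` is a hypothetical helper, not an existing function.

```python
# Names copied from the `_Vars` listing quoted above (process_options.py at af215dd).
AVAILABLE_VARS = {
    "x", "Q2", "Q", "y", "abs_y", "pT", "ET", "sqrts", "ystar", "ydiff",
    "ymax", "yb", "m_jj", "pT2", "y_t", "y_ttBar", "m_t2", "m_t", "pT_t",
    "m_ttBar", "eta", "abs_eta", "m_W2", "m_Z2", "m_V2", "m_W", "m_Z", "M2",
    "abs_eta_1", "abs_eta_2", "eta_1", "eta_2", "m_ll", "m_ll2",
}


def unknown_variables(variables, available=AVAILABLE_VARS):
    """Hypothetical check: return the names that are not in `available`."""
    return [name for name in variables if name not in available]


print(unknown_variables(["pTavg", "ystar", "yboost"]))  # ['pTavg', 'yboost']
```

In this listing the boost rapidity is spelled `yb`. Whether a variable for the average pT is added by the unmerged branch cannot be seen from the snippet quoted here.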

achiefa · 2026-02-02T13:08:58Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+      pTavg: {description: "average transverse momentum of the two leading jets", label: '\(p_{\text{T,avg}}\)', units: "GeV"}
+      ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
+      yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}
+      sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}


Remove sqrts from the kinematics. It is redundant information that is already incorporated in the name of the dataset. No need to keep it.

Suggested change

sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}

achiefa · 2026-02-02T13:09:29Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+      ystar: {description: "half rapidity separation of the two leading jets", label: "$y^*$", units: ""}
+      yboost: {description: "the boost of the two leading jets", label: "$y_\text{b}$", units: ""}
+      sqrts: {description: "centre-of-mass energy", label: '\sqrt{s}', units: "TeV"}
+  kinematic_coverage: [pTavg, ystar, yboost, sqrts]


kinematic_coverage can only handle three variables. Remove sqrts.

Suggested change

kinematic_coverage: [pTavg, ystar, yboost, sqrts]

kinematic_coverage: [pTavg, ystar, yboost]

achiefa · 2026-02-02T13:11:51Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/kinematics.yaml

+bins:
+- pTavg:
+    min: 133.0
+    mid: 143.0
+    max: 153.0
+  yboost:
+    min: 0.0
+    mid: 0.5
+    max: 1.0
+  ystar:
+    min: 0.0
+    mid: 0.5
+    max: 1.0
+  sqrts:
+    min: null
+    mid: 8
+    max: null


There is no prescription for the ordering of the kinematic variables, as long as the ordering matches kinematic_coverage in metadata.yaml. However, pT is usually in the last position, see here

nnpdf/nnpdf_data/nnpdf_data/commondata/CMS_1JET_8TEV/kinematics.yaml

Lines 1 to 13 in af215dd

bins:
- y:
    min: 0.0
    mid: 0.25
    max: 0.5
  pT:
    min: 21.0
    mid: 22.5
    max: 24.0
  sqrts:
    min: null
    mid: 8000.0
    max: null

If you change this, make sure you also change the ordering in kinematic_coverage.

Suggested change

bins:
- pTavg:
    min: 133.0
    mid: 143.0
    max: 153.0
  yboost:
    min: 0.0
    mid: 0.5
    max: 1.0
  ystar:
    min: 0.0
    mid: 0.5
    max: 1.0
  sqrts:
    min: null
    mid: 8
    max: null

bins:
- yboost:
    min: 0.0
    mid: 0.5
    max: 1.0
  ystar:
    min: 0.0
    mid: 0.5
    max: 1.0
  pTavg:
    min: 133.0
    mid: 143.0
    max: 153.0
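If the filter that writes kinematics.yaml builds each bin as a dictionary, the reordering suggested above can be sketched as follows. The helper `reorder_bins` is hypothetical; it assumes the bins are plain Python dictionaries and relies on dictionaries preserving insertion order.

```python
def reorder_bins(bins, coverage):
    """Hypothetical helper: rebuild each bin with its keys in `coverage` order.

    Variables not listed in `coverage` (here `sqrts`) are dropped.
    """
    reordered = []
    for b in bins:
        missing = [var for var in coverage if var not in b]
        if missing:
            raise KeyError(f"bin is missing variables: {missing}")
        reordered.append({var: b[var] for var in coverage})
    return reordered


bins = [{
    "pTavg": {"min": 133.0, "mid": 143.0, "max": 153.0},
    "yboost": {"min": 0.0, "mid": 0.5, "max": 1.0},
    "ystar": {"min": 0.0, "mid": 0.5, "max": 1.0},
    "sqrts": {"min": None, "mid": 8, "max": None},
}]
coverage = ["yboost", "ystar", "pTavg"]  # same order as kinematic_coverage
new_bins = reorder_bins(bins, coverage)
print(list(new_bins[0]))  # ['yboost', 'ystar', 'pTavg']
```

Using the same `coverage` list for both files keeps the ordering in kinematics.yaml and kinematic_coverage in sync by construction.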

achiefa · 2026-02-13T08:26:02Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+      - CMS_2JET_8TEV_3D_3
+      - CMS_2JET_8TEV_3D_4
+      - CMS_2JET_8TEV_3D_5
+    operation: 'null' # from TeV to GeV


Please remove the comment because it's not pertinent to the operation key.

achiefa · 2026-02-25T12:34:52Z

Hi @andrpie, what's the status of this? Can you post data-theory comparisons here so that we keep track for future records? Also, mind that the branch has to be rebased. Note that I'm about to merge onto master the PR Emanuele has worked on. This implements 3D kinematics for dijets. So it's likely that there will be conflicts.

andrpie · 2026-02-25T15:52:51Z

Hi @achiefa,

> what's the status of this? Can you post data-theory comparisons here so that we keep track for future records?

I believe my treatment of the uncertainties (especially the systematics) needs to be reassessed. Also, the covmat is highly singular here. I will post the data-theory comparison in the next comment.

> Also, mind that the branch has to be rebased. Note that I'm about to merge onto master the PR Emanuele has worked on. This implements 3D kinematics for dijets. So it's likely that there will be conflicts.

Indeed, I will resolve the conflicts.

andrpie · 2026-02-25T16:34:32Z

Here is the data-theory comparison as it stands now. Quite similar to what we've seen in Amsterdam. The (experimental) chi2 is 3.52.

achiefa · 2026-02-25T16:38:43Z

Thanks for this. However, you need to rebase on top of master to get the correct data-theory plots. The reason is that last week Emanuele and I realised that there was a bug in the implementation of the systematic shift, which made these plots wrong. Nonetheless, the chi2 was not affected by this bug. Given the value that you obtained, I'd say that the predictions are fairly good. Of course we need to see the correct comparisons that depict a clear picture.

achiefa · 2026-02-25T16:44:53Z

Also, I see that the uncertainties in the data are basically zero. I suspect that for this dataset experimentalists only provided the full covariance matrix with all the uncertainties correlated. If this is the case, the whole uncertainty structure of the data flows into the shift, leaving the data without uncertainty. As a sanity check, could you please plot the same comparison without the shift? You can just set the flag with_shift=false in the plot_fancy function.
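To illustrate the point, here is a small sketch assuming, as the comment above suggests, that the error bar left on a shifted data point comes only from the sources treated as uncorrelated. This assumption has not been checked against the plot_fancy code, the function name is hypothetical, and the numbers are illustrative rather than taken from the dataset.

```python
import math


def uncorrelated_error(point_uncertainties, treatments):
    """Quadrature sum of the sources flagged UNCORR for one data point."""
    return math.sqrt(sum(
        value ** 2
        for value, treatment in zip(point_uncertainties, treatments)
        if treatment == "UNCORR"
    ))


# One data point with three uncertainty sources (illustrative numbers).
values = [3.0, 4.0, 12.0]
print(uncorrelated_error(values, ["UNCORR", "UNCORR", "CORR"]))  # 5.0
print(uncorrelated_error(values, ["CORR", "CORR", "CORR"]))      # 0.0
```

When every source is correlated, as in the second call, nothing is left for the error bar, which would match what is seen in the plot.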

achiefa · 2026-02-25T16:46:54Z

As a reference, a similar problem has occurred here: https://github.com/NNPDF/theories_slim/pull/67

scarlehoff · 2026-03-10T13:22:46Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+  version: 1
+
+nnpdf_metadata:
+  nnpdf31_process: "DIJET_3D"


theory uncertainties

Suggested change

nnpdf31_process: "DIJET_3D"

nnpdf31_process: "DIJET"

scarlehoff · 2026-03-10T13:23:24Z

nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

+
+implemented_observables:
+- observable_name: 3D
+  process_type: DIJET


Suggested change

process_type: DIJET

process_type: DIJET_3D

scarlehoff · 2026-03-10T13:26:12Z

Please @andrpie update the labels and upload the right plots (with the fixes to the shift)

scarlehoff · 2026-03-12T11:57:37Z

@andrpie to get the plots with the right shifts you need to rebase on top of master (or merge from master, although the rebase is much preferred)

The rebase should be easy, as the only conflict should be the line in process_options.py that has also been updated by a different pull request.

andrpie · 2026-03-12T16:15:47Z

Dear @scarlehoff , @achiefa. Below is the data-theory comparison after the rebase.

With shifts:

Without shifts:

What worries me the most is the treatment of statistical uncertainties. If I understand correctly, validphys recognises an uncertainty as statistical if its name starts with "stat". For each datapoint, there are 122 artificial statistical uncertainties that come from the correlation matrices. As @achiefa pointed out, the covmat is already quite singular (and this is when those statistical uncertainties are named "art_unc_x"). When I change the name of the statistical uncertainties to "stat_art_unc_x", the covmat is not positive definite anymore; below is the error I get when running a data-theory validphys report.

numpy.linalg.LinAlgError: 32-th leading minor of the array is not positive definite

Traceback (most recent call last):
  File "/opt/miniconda3/envs/nnpdf_dev/bin/validphys", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/s2850353/Documents/nnpdf/validphys2/src/validphys/scripts/main.py", line 10, in main
    vp.main()
  File "/Users/s2850353/Documents/nnpdf/validphys2/src/validphys/app.py", line 155, in main
    a.main()
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/reportengine/app.py", line 421, in main
    self.run()
  File "/Users/s2850353/Documents/nnpdf/validphys2/src/validphys/app.py", line 150, in run
    super().run()
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/reportengine/app.py", line 406, in run
    rb.execute_sequential()
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/reportengine/resourcebuilder.py", line 210, in execute_sequential
    result = self.get_result(callspec.function,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/reportengine/resourcebuilder.py", line 380, in get_result
    fres = function(**kwdict)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/s2850353/Documents/nnpdf/validphys2/src/validphys/covmats.py", line 649, in sqrt_covmat
    decomp = la.cholesky(correlation_matrix)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/scipy/_lib/_util.py", line 1181, in wrapper
    return f(*arrays, *other_args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/scipy/linalg/_decomp_cholesky.py", line 106, in cholesky
    c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nnpdf_dev/lib/python3.12/site-packages/scipy/linalg/_decomp_cholesky.py", line 39, in _cholesky
    raise LinAlgError(
numpy.linalg.LinAlgError: 32-th leading minor of the array is not positive definite

A hint is that the first (y*, yb) bin has 31 points, so the failure happens immediately after the first bin. I am not sure how to fix this.
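For reference, the message means that the factorisation succeeded for the top-left 31×31 block and failed at the 32nd pivot, i.e. at the first point of the second bin. The textbook Cholesky sketch below reports the same index on a toy matrix; `first_failing_minor` is a hypothetical diagnostic helper written in plain Python, not the routine that validphys or scipy actually call.

```python
import math


def first_failing_minor(matrix):
    """Run a textbook Cholesky factorisation and return the 1-based order of the
    first leading minor that is not positive definite, or None if there is none."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot of column j: diagonal entry minus the squares already factorised.
        pivot = matrix[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return j + 1
        lower[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return None


# Toy 3x3 matrix: the leading 2x2 block is positive definite, but the full
# matrix has a negative determinant, so the third minor fails.
toy = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
]
print(first_failing_minor(toy))  # 3
```

Running such a check on the matrix built with each naming scheme may help to see which entries of row 32 differ between the two cases.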

achiefa · 2026-03-12T16:30:48Z

@enocera

enocera · 2026-03-12T16:50:35Z

Dear @andrpie, I'm afraid I don't understand the problem. The shifted data-theory comparison plots look very sensible to me.
As far as I know, validphys doesn't care whether an uncertainty is statistical or systematic. It only cares about whether it is CORR/UNCORR and ADD/MULT. As long as the names are kept consistent, nothing should change. In other words, if you rename each "artificial uncertainty" from art_unc_x to stat_art_unc, nothing should change. Of course, you have to change all the names consistently. Perhaps I misunderstood your problem?

andrpie · 2026-03-12T17:50:58Z

Hi @enocera, thank you for the clarification. You didn't misunderstand my problem. If the treatment of the uncertainty does not depend on its name, then I am also happy with the data-theory plots and consider the implementation of this dataset done.

Yet it still is the case that if I name the uncertainties 'stat_art_unc', then the Cholesky decomposition breaks down, as opposed to when they are just called 'art_unc' (of course, in a consistent way). I have no idea what kind of evil force can be causing this issue...

scarlehoff · 2026-03-17T13:29:14Z

Please have a look whether the problem persists if you change

nnpdf/nnpdf_data/nnpdf_data/commondataparser.py

Line 940 in 3d31eaf

    
           [i for i in uncertainties_df.columns.get_level_values(0) if not i.startswith("stat")]

and the lines below
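To see why the name can matter at this point, the quoted filter can be tried on a plain list of names. This is only a sketch: the column names are hypothetical, and what commondataparser.py does with the selected columns in the lines that follow is not shown in this thread.

```python
def non_stat_columns(names):
    """Same filter as the quoted line, applied to a plain list of names."""
    return [i for i in names if not i.startswith("stat")]


before = ["art_unc_1", "art_unc_2", "sys_lumi"]          # hypothetical names
after = ["stat_art_unc_1", "stat_art_unc_2", "sys_lumi"]  # after the renaming

print(non_stat_columns(before))  # ['art_unc_1', 'art_unc_2', 'sys_lumi']
print(non_stat_columns(after))   # ['sys_lumi']
```

After the renaming, the artificial uncertainties no longer pass the filter, so any step that uses this selection would see a different set of columns.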

andrpie requested review from achiefa and enocera January 13, 2026 01:03

andrpie self-assigned this Jan 13, 2026

andrpie added the data toolchain label Jan 13, 2026

andrpie force-pushed the implement_CMS_2JET_8TEV_3D branch from b7f065e to 9805191 Compare January 13, 2026 01:03

andrpie marked this pull request as ready for review January 13, 2026 01:05

andrpie added regenerate-data and removed data toolchain labels Jan 13, 2026

andrpie mentioned this pull request Jan 20, 2026

Implementation of data sets excluded from NNPDF4.0 #2390

Open

scarlehoff mentioned this pull request Jan 20, 2026

Add download from plougshare NNPDF/pinefarm#102

Open

scarlehoff added the data toolchain label Jan 20, 2026

scarlehoff assigned enocera Jan 20, 2026

achiefa reviewed Feb 2, 2026

andrpie force-pushed the implement_CMS_2JET_8TEV_3D branch from 78069ab to e330dd7 Compare February 3, 2026 12:08

achiefa reviewed Feb 3, 2026

andrpie force-pushed the implement_CMS_2JET_8TEV_3D branch from af142f5 to c799acb Compare February 3, 2026 18:34

achiefa reviewed Feb 13, 2026

scarlehoff reviewed Mar 10, 2026

scarlehoff added the Done PRs that are done but waiting on something else to merge/approve label Mar 10, 2026

andrpie and others added 2 commits March 12, 2026 14:57

initial implementation of the dataset

fffee51

Automatically regenerated commondata from PR 2418, branch implement_C…

65a6cae

…MS_2JET_8TEV_3D.

andrpie and others added 9 commits March 12, 2026 14:58

corrections suggested by @achiefa

6d9e32a

Update nnpdf_data/nnpdf_data/commondata/CMS_2JET_8TEV/metadata.yaml

6ba2723

Crazy I missed it.... Co-authored-by: Amedeo Chiefa <103528316+achiefa@users.noreply.github.com>

initial implementation of the dataset

5760ad7

Automatically regenerated commondata from PR 2418, branch implement_C…

002fbfa

…MS_2JET_8TEV_3D.

corrections suggested by @achiefa

f33eb86

further changes to the implementation

6ef2312

without these edits validphys would produce incorrect covmat

specify new grid names

8fadb01

Automatically regenerated commondata from PR 2418, branch implement_C…

203e63d

…MS_2JET_8TEV_3D.

corrected process_type and nnpdf31_process

308718a

andrpie force-pushed the implement_CMS_2JET_8TEV_3D branch from fe0d052 to 308718a Compare March 12, 2026 15:16
