-
Notifications
You must be signed in to change notification settings - Fork 13
Make replicas Colibri compatible #2431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
c5dfade
Colibri compatible
ecole41 3e493b2
Separated group seed
ecole41 4fe29ec
Make_replica take central value array and mult_unc dict
ecole41 4f6a629
changed func name
ecole41 24e9881
Added mask func and docstrings
ecole41 63dfaf4
Added negative data index to warning
ecole41 7d4f2b9
removed resample_negative_pseudodata from make_replica, put into grou…
ecole41 0a94a68
Allowing for genrep=False
ecole41 18b395e
set genrep default to True in seed func
ecole41 0cc181e
Update validphys2/src/validphys/pseudodata.py
ecole41 5791216
Update validphys2/src/validphys/pseudodata.py
ecole41 8d2468c
don't try to use tensorboard if it is not requested
scarlehoff e317988
update reference fit
scarlehoff c7469b3
level1 edit
ecole41 ce74fbc
add a quick test to level1 data
scarlehoff File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -121,16 +121,17 @@ def read_replica_pseudodata(fit, context_index, replica): | |||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def make_replica( | ||||||||||||||||||
| groups_dataset_inputs_loaded_cd_with_cuts, | ||||||||||||||||||
| replica_mcseed, | ||||||||||||||||||
| central_values_array, | ||||||||||||||||||
| group_replica_mcseed, | ||||||||||||||||||
| dataset_inputs_sampling_covmat, | ||||||||||||||||||
| group_multiplicative_errors=None, | ||||||||||||||||||
| group_positivity_mask=None, | ||||||||||||||||||
| sep_mult=False, | ||||||||||||||||||
| genrep=True, | ||||||||||||||||||
| max_tries=int(1e6), | ||||||||||||||||||
| resample_negative_pseudodata=False, | ||||||||||||||||||
| ): | ||||||||||||||||||
| """Function that takes in a list of :py:class:`nnpdf_data.coredata.CommonData` | ||||||||||||||||||
| objects and returns a pseudodata replica accounting for | ||||||||||||||||||
| """Function that takes in a central value array and a covariance matrix | ||||||||||||||||||
| and returns a pseudodata replica accounting for | ||||||||||||||||||
| possible correlations between systematic uncertainties. | ||||||||||||||||||
|
|
||||||||||||||||||
| The function loops until positive definite pseudodata is generated for any | ||||||||||||||||||
|
|
@@ -139,17 +140,22 @@ def make_replica( | |||||||||||||||||
|
|
||||||||||||||||||
| Parameters | ||||||||||||||||||
| --------- | ||||||||||||||||||
| groups_dataset_inputs_loaded_cd_with_cuts: list[:py:class:`nnpdf_data.coredata.CommonData`] | ||||||||||||||||||
| List of CommonData objects which stores information about systematic errors, | ||||||||||||||||||
| their treatment and description, for each dataset. | ||||||||||||||||||
| central_values_array: np.array | ||||||||||||||||||
| Numpy array which is N_dat (where N_dat is the combined number of data points after cuts) | ||||||||||||||||||
| containing the central values of the data. | ||||||||||||||||||
|
|
||||||||||||||||||
| replica_mcseed: int, None | ||||||||||||||||||
| Seed used to initialise the numpy random number generator. If ``None`` then a random seed is | ||||||||||||||||||
| allocated using the default numpy behaviour. | ||||||||||||||||||
| group_replica_mcseed: int | ||||||||||||||||||
| Seed used to initialise the numpy random number generator. | ||||||||||||||||||
|
|
||||||||||||||||||
| dataset_inputs_sampling_covmat: np.array | ||||||||||||||||||
| Full covmat to be used. It can be either only experimental or also theoretical. | ||||||||||||||||||
|
|
||||||||||||||||||
| group_multiplicative_errors: dict | ||||||||||||||||||
| Dictionary containing the multiplicative uncertainties contribution to the pseudodata replica. | ||||||||||||||||||
|
|
||||||||||||||||||
| group_positivity_mask: np.array | ||||||||||||||||||
| Boolean array of shape (N_dat,) indicating which data points should be positive. | ||||||||||||||||||
|
|
||||||||||||||||||
| sep_mult: bool | ||||||||||||||||||
| Specifies whether computing the shifts with the full covmat | ||||||||||||||||||
| or whether multiplicative errors should be separated | ||||||||||||||||||
|
|
@@ -162,9 +168,6 @@ def make_replica( | |||||||||||||||||
| If after max_tries (default=1e6) no physical configuration is found, | ||||||||||||||||||
| it will raise a :py:class:`ReplicaGenerationError` | ||||||||||||||||||
|
|
||||||||||||||||||
| resample_negative_pseudodata: bool | ||||||||||||||||||
| When True, replicas that produce negative predictions will be resampled for ``max_tries`` | ||||||||||||||||||
| until all points are positive (default: False) | ||||||||||||||||||
| Returns | ||||||||||||||||||
| ------- | ||||||||||||||||||
| pseudodata: np.array | ||||||||||||||||||
|
|
@@ -187,90 +190,122 @@ def make_replica( | |||||||||||||||||
| 0.34206012, 0.31866286, 0.2790856 , 0.33257621, 0.33680007, | ||||||||||||||||||
| """ | ||||||||||||||||||
| if not genrep: | ||||||||||||||||||
| return np.concatenate( | ||||||||||||||||||
| [cd.central_values for cd in groups_dataset_inputs_loaded_cd_with_cuts] | ||||||||||||||||||
| ) | ||||||||||||||||||
| # Seed the numpy RNG with the seed and the name of the datasets in this run | ||||||||||||||||||
|
|
||||||||||||||||||
| # TODO: to be simplified after the reader is merged, together with an update of the regression tests | ||||||||||||||||||
| # this is necessary to reproduce exactly the results due to the replicas being generated with a hash | ||||||||||||||||||
| # Only when the sets are legacy (or coming from a legacy runcard) this shall be used | ||||||||||||||||||
| names_for_salt = [] | ||||||||||||||||||
| for loaded_cd in groups_dataset_inputs_loaded_cd_with_cuts: | ||||||||||||||||||
| if loaded_cd.legacy_names is None: | ||||||||||||||||||
| names_for_salt.append(loaded_cd.setname) | ||||||||||||||||||
| else: | ||||||||||||||||||
| names_for_salt.append(loaded_cd.legacy_names[0]) | ||||||||||||||||||
| name_salt = "-".join(names_for_salt) | ||||||||||||||||||
| return central_values_array | ||||||||||||||||||
|
|
||||||||||||||||||
| name_seed = int(hashlib.sha256(name_salt.encode()).hexdigest(), 16) % 10**8 | ||||||||||||||||||
| rng = np.random.default_rng(seed=replica_mcseed + name_seed) | ||||||||||||||||||
| # Set random seed | ||||||||||||||||||
| rng = np.random.default_rng(seed=group_replica_mcseed) | ||||||||||||||||||
| # construct covmat | ||||||||||||||||||
| covmat = dataset_inputs_sampling_covmat | ||||||||||||||||||
| covmat_sqrt = sqrt_covmat(covmat) | ||||||||||||||||||
| # Loading the data | ||||||||||||||||||
| pseudodatas = [] | ||||||||||||||||||
| check_positive_masks = [] | ||||||||||||||||||
| nonspecial_mult = [] | ||||||||||||||||||
| special_mult = [] | ||||||||||||||||||
| for cd in groups_dataset_inputs_loaded_cd_with_cuts: | ||||||||||||||||||
| # copy here to avoid mutating the central values. | ||||||||||||||||||
| pseudodata = cd.central_values.to_numpy() | ||||||||||||||||||
|
|
||||||||||||||||||
| pseudodatas.append(pseudodata) | ||||||||||||||||||
| # Separation of multiplicative errors. If sep_mult is True also the exp_covmat is produced | ||||||||||||||||||
| # without multiplicative errors | ||||||||||||||||||
| if sep_mult: | ||||||||||||||||||
| mult_errors = cd.multiplicative_errors | ||||||||||||||||||
| mult_uncorr_errors = mult_errors.loc[:, mult_errors.columns == "UNCORR"].to_numpy() | ||||||||||||||||||
| mult_corr_errors = mult_errors.loc[:, mult_errors.columns == "CORR"].to_numpy() | ||||||||||||||||||
| nonspecial_mult.append((mult_uncorr_errors, mult_corr_errors)) | ||||||||||||||||||
| special_mult.append( | ||||||||||||||||||
| mult_errors.loc[:, ~mult_errors.columns.isin(INTRA_DATASET_SYS_NAME)] | ||||||||||||||||||
| ) | ||||||||||||||||||
| if "ASY" in cd.commondataproc or cd.commondataproc.endswith("_POL"): | ||||||||||||||||||
| check_positive_masks.append(np.zeros_like(pseudodata, dtype=bool)) | ||||||||||||||||||
| else: | ||||||||||||||||||
| check_positive_masks.append(np.ones_like(pseudodata, dtype=bool)) | ||||||||||||||||||
| # concatenating special multiplicative errors, pseudodatas and positive mask | ||||||||||||||||||
| if sep_mult: | ||||||||||||||||||
| special_mult_errors = pd.concat(special_mult, axis=0, sort=True).fillna(0).to_numpy() | ||||||||||||||||||
| all_pseudodata = np.concatenate(pseudodatas, axis=0) | ||||||||||||||||||
| full_mask = np.concatenate(check_positive_masks, axis=0) | ||||||||||||||||||
| full_mask = ( | ||||||||||||||||||
| group_positivity_mask | ||||||||||||||||||
| if group_positivity_mask is not None | ||||||||||||||||||
| else np.zeros_like(central_values_array, dtype=bool) | ||||||||||||||||||
| ) | ||||||||||||||||||
| # The inner while True loop is for ensuring a positive definite | ||||||||||||||||||
| # pseudodata replica | ||||||||||||||||||
| for _ in range(max_tries): | ||||||||||||||||||
| mult_shifts = [] | ||||||||||||||||||
| # Prepare the per-dataset multiplicative shifts | ||||||||||||||||||
| for mult_uncorr_errors, mult_corr_errors in nonspecial_mult: | ||||||||||||||||||
| # convert to from percent to fraction | ||||||||||||||||||
| mult_shift = ( | ||||||||||||||||||
| 1 + mult_uncorr_errors * rng.normal(size=mult_uncorr_errors.shape) / 100 | ||||||||||||||||||
| ).prod(axis=1) | ||||||||||||||||||
| if group_multiplicative_errors is not None: | ||||||||||||||||||
| for mult_uncorr_errors, mult_corr_errors in group_multiplicative_errors[ | ||||||||||||||||||
| "nonspecial_mult" | ||||||||||||||||||
| ]: | ||||||||||||||||||
| # convert to from percent to fraction | ||||||||||||||||||
| mult_shift = ( | ||||||||||||||||||
| 1 + mult_uncorr_errors * rng.normal(size=mult_uncorr_errors.shape) / 100 | ||||||||||||||||||
| ).prod(axis=1) | ||||||||||||||||||
|
|
||||||||||||||||||
| mult_shift *= ( | ||||||||||||||||||
| 1 + mult_corr_errors * rng.normal(size=(1, mult_corr_errors.shape[1])) / 100 | ||||||||||||||||||
| ).prod(axis=1) | ||||||||||||||||||
| mult_shift *= ( | ||||||||||||||||||
| 1 + mult_corr_errors * rng.normal(size=(1, mult_corr_errors.shape[1])) / 100 | ||||||||||||||||||
| ).prod(axis=1) | ||||||||||||||||||
|
|
||||||||||||||||||
| mult_shifts.append(mult_shift) | ||||||||||||||||||
| mult_shifts.append(mult_shift) | ||||||||||||||||||
|
|
||||||||||||||||||
| # If sep_mult is true then the multiplicative shifts were not included in the covmat | ||||||||||||||||||
| shifts = covmat_sqrt @ rng.normal(size=covmat.shape[1]) | ||||||||||||||||||
| mult_part = 1.0 | ||||||||||||||||||
| if sep_mult: | ||||||||||||||||||
| special_mult_errors = group_multiplicative_errors["special_mult"] | ||||||||||||||||||
| special_mult = ( | ||||||||||||||||||
| 1 + special_mult_errors * rng.normal(size=(1, special_mult_errors.shape[1])) / 100 | ||||||||||||||||||
| ).prod(axis=1) | ||||||||||||||||||
| mult_part = np.concatenate(mult_shifts, axis=0) * special_mult | ||||||||||||||||||
| # Shifting pseudodata | ||||||||||||||||||
| shifted_pseudodata = (all_pseudodata + shifts) * mult_part | ||||||||||||||||||
| shifted_pseudodata = (central_values_array + shifts) * mult_part | ||||||||||||||||||
| # positivity control | ||||||||||||||||||
| if np.all(shifted_pseudodata[full_mask] >= 0) or not resample_negative_pseudodata: | ||||||||||||||||||
| if np.all(shifted_pseudodata[full_mask] >= 0): | ||||||||||||||||||
| return shifted_pseudodata | ||||||||||||||||||
|
|
||||||||||||||||||
| dfail = " ".join(i.setname for i in groups_dataset_inputs_loaded_cd_with_cuts) | ||||||||||||||||||
| log.error(f"Error generating replicas for the group: {dfail}") | ||||||||||||||||||
| raise ReplicaGenerationError(f"No valid replica found after {max_tries} attempts") | ||||||||||||||||||
| # Find which dataset index corresponds to the negative points, and print it out for debugging purposes | ||||||||||||||||||
| negative_mask = shifted_pseudodata < 0 & full_mask | ||||||||||||||||||
| negative_indices = np.where(negative_mask)[0] | ||||||||||||||||||
|
|
||||||||||||||||||
| raise ReplicaGenerationError( | ||||||||||||||||||
| f"No valid replica found after {max_tries} attempts. " | ||||||||||||||||||
| f"Negative global indices: {negative_indices.tolist()}" | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def central_values_array(groups_dataset_inputs_loaded_cd_with_cuts): | ||||||||||||||||||
| """Function that takes in a list of :py:class:`nnpdf_data.coredata.CommonData` | ||||||||||||||||||
| and returns the central values concatenated in a single array. | ||||||||||||||||||
| """ | ||||||||||||||||||
| central_values = [] | ||||||||||||||||||
| for cd in groups_dataset_inputs_loaded_cd_with_cuts: | ||||||||||||||||||
| central_values.append(cd.central_values.to_numpy()) | ||||||||||||||||||
| return np.concatenate(central_values, axis=0) | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def group_multiplicative_errors(groups_dataset_inputs_loaded_cd_with_cuts, sep_mult=False): | ||||||||||||||||||
| """Function that takes in a list of :py:class:`nnpdf_data.coredata.CommonData` | ||||||||||||||||||
| and returns the multiplicative uncertainties contribution to the pseudodata replica. | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
@enocera do you remember why are we doing this separation?? |
||||||||||||||||||
| """ | ||||||||||||||||||
ecole41 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
| if not sep_mult: | ||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
| nonspecial_mult = [] | ||||||||||||||||||
| special_mult = [] | ||||||||||||||||||
| special_mult_errors = [] | ||||||||||||||||||
| for cd in groups_dataset_inputs_loaded_cd_with_cuts: | ||||||||||||||||||
| if sep_mult: | ||||||||||||||||||
| mult_errors = cd.multiplicative_errors | ||||||||||||||||||
| mult_uncorr_errors = mult_errors.loc[:, mult_errors.columns == "UNCORR"].to_numpy() | ||||||||||||||||||
| mult_corr_errors = mult_errors.loc[:, mult_errors.columns == "CORR"].to_numpy() | ||||||||||||||||||
| nonspecial_mult.append((mult_uncorr_errors, mult_corr_errors)) | ||||||||||||||||||
| special_mult.append( | ||||||||||||||||||
| mult_errors.loc[:, ~mult_errors.columns.isin(INTRA_DATASET_SYS_NAME)] | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| # concatenating special multiplicative errors | ||||||||||||||||||
| if sep_mult: | ||||||||||||||||||
| special_mult_errors = pd.concat(special_mult, axis=0, sort=True).fillna(0).to_numpy() | ||||||||||||||||||
|
|
||||||||||||||||||
| multiplicative_errors = { | ||||||||||||||||||
| "nonspecial_mult": nonspecial_mult, | ||||||||||||||||||
| "special_mult": special_mult_errors, | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| return multiplicative_errors | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def group_positivity_mask( | ||||||||||||||||||
| groups_dataset_inputs_loaded_cd_with_cuts, resample_negative_pseudodata=False | ||||||||||||||||||
| ): | ||||||||||||||||||
| """Function that takes in a list of :py:class:`nnpdf_data.coredata.CommonData` | ||||||||||||||||||
| and returns a boolean mask indicating which data points should be positive. | ||||||||||||||||||
| """ | ||||||||||||||||||
| if not resample_negative_pseudodata: | ||||||||||||||||||
| return None | ||||||||||||||||||
| check_positive_masks = [] | ||||||||||||||||||
| for cd in groups_dataset_inputs_loaded_cd_with_cuts: | ||||||||||||||||||
| if "ASY" in cd.commondataproc or cd.commondataproc.endswith("_POL"): | ||||||||||||||||||
| check_positive_masks.append(np.zeros_like(cd.central_values.to_numpy(), dtype=bool)) | ||||||||||||||||||
| else: | ||||||||||||||||||
| check_positive_masks.append(np.ones_like(cd.central_values.to_numpy(), dtype=bool)) | ||||||||||||||||||
| full_mask = np.concatenate(check_positive_masks, axis=0) | ||||||||||||||||||
| return full_mask | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def indexed_make_replica(groups_index, make_replica): | ||||||||||||||||||
|
|
@@ -397,8 +432,11 @@ def make_level1_data(data, level0_commondata_wc, filterseed, data_index, sep_mul | |||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| # ================== generation of Level1 data ======================# | ||||||||||||||||||
| central_vals= central_values_array(level0_commondata_wc) | ||||||||||||||||||
| group_mult_errs = group_multiplicative_errors(level0_commondata_wc, sep_mult=sep_mult) | ||||||||||||||||||
| group_pos_mask = group_positivity_mask(level0_commondata_wc) | ||||||||||||||||||
| level1_data = make_replica( | ||||||||||||||||||
| level0_commondata_wc, filterseed, covmat, sep_mult=sep_mult, genrep=True | ||||||||||||||||||
| central_vals, filterseed, covmat, group_multiplicative_errors=group_mult_errs, group_positivity_mask=group_pos_mask, sep_mult=sep_mult, genrep=True, | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| indexed_level1_data = indexed_make_replica(data_index, level1_data) | ||||||||||||||||||
|
|
||||||||||||||||||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.