
Conversation

@ShubhamSRath (Contributor) commented:

Changes:

  1. Integrated uncertainty sampling into the SparseGrid dataclass for training data
  2. Added two methods for generating training points from GP uncertainty: (1) local variance, (2) global variance (a rough sketch of the idea is shown below)
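For illustration only, here is a minimal sketch of what GP-variance-based point selection could look like. This is not the PR code: it assumes sklearn's `GaussianProcessRegressor`, and the helper name `select_next_point` plus the "local = restrict to a neighborhood of existing points" reading of local variance are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def select_next_point(x_train, y_train, candidates, mode="global", radius=0.2):
    """Pick the candidate with the largest GP predictive standard deviation.

    mode="global": search over all candidates in the domain.
    mode="local":  restrict the search to candidates within `radius` of an
                   existing training point (one plausible reading of "local variance").
    """
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gp.fit(x_train, y_train)
    _, std = gp.predict(candidates, return_std=True)

    if mode == "local":
        # Distance from each candidate to its nearest training point
        dists = np.linalg.norm(candidates[:, None, :] - x_train[None, :, :], axis=-1)
        mask = dists.min(axis=1) <= radius
        std = np.where(mask, std, -np.inf)  # exclude far-away candidates

    return candidates[np.argmax(std)]
```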

@eckelsjd (Owner) left a comment:

This looks like a good start -- the changes are localized to training.py, which is good.

Here are a few high-level remarks/issues:

  • It feels like uncertainty sampling should be independent of the specific Interpolator method. For example, what if I want to use a neural network or linear regression for prediction, but I want the experimental design to be based on GPR-like uncertainty metrics? This first attempt at passing the GPR state to TrainingData is intuitive, but it feels like a special case rather than a general rule. Since GPR training really isn't that expensive, maybe it makes sense when doing TrainingData.refine to just train a quick GPR purely for the sake of experimental design via uncertainty sampling -- it already has access to all the training data.
  • Uncertainty sampling via GPR usually selects points over the full N-dimensional domain, but here it is being used in collocation_1d. I do not quite understand how this works. I understand wanting to reuse SparseGrid and just replace the leja option, but SparseGrid relies on the ability to refine a single dimension at a time -- it's not clear how GPR is being used to accomplish this. In addition, if we really are only refining one dimension, then the first bullet point about running GPR on-the-fly becomes even easier (see the sketch after this list for one way the two ideas could fit together).
  • There need to be test cases for the new functionality. At a bare minimum, we need a test case for the new sampling method by itself, then within an Interpolator, then a Component, and possibly one integration test within a full System.
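To make the first two bullets concrete, here is a rough sketch (assumptions only, not the actual amisc API) of training a quick GPR on the fly to pick a new point along a single dimension. The function name `refine_dimension_1d`, its arguments, and the choice to hold the other dimensions at their midpoints are all hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def refine_dimension_1d(x_train, y_train, dim, bounds, num_candidates=100):
    """Pick a new coordinate in dimension `dim` via GP predictive variance.

    All other dimensions are held at their midpoints, so the GP uncertainty
    is evaluated along a 1d slice -- one possible way to reconcile
    N-dimensional uncertainty sampling with per-dimension refinement.
    """
    # Cheap GPR trained only for experimental design; discarded afterwards
    gp = GaussianProcessRegressor(normalize_y=True).fit(x_train, y_train)

    # Candidate points that vary only in dimension `dim`
    lows, highs = np.asarray(bounds)[:, 0], np.asarray(bounds)[:, 1]
    candidates = np.tile((lows + highs) / 2, (num_candidates, 1))
    candidates[:, dim] = np.linspace(lows[dim], highs[dim], num_candidates)

    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std), dim]  # new 1d coordinate for this dimension
```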

A few smaller comments:

  • We should really try to avoid changing function signatures as much as possible. For example, I don't see a need to change collocation_1d. And given the first bullet point above, it might not be necessary to change TrainingData.refine() either.
  • The params that control the uncertainty sampling method should be grouped into something like collocation_opts, very similar to how it was done for the GPR interpolator object. It seems some of these opts may be custom rather than passed directly to sklearn, in which case we might need a TypedDict for better documentation of what the opts do (a rough sketch follows below). I wouldn't worry about this until all the other points are addressed first, though.
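One possible shape for such a collocation_opts TypedDict -- the class name and field names here are illustrative, not the ones in the PR:

```python
from typing import Literal, TypedDict

class CollocationOpts(TypedDict, total=False):
    method: Literal["leja", "gp_local_variance", "gp_global_variance"]
    num_candidates: int  # candidate pool size for the variance search
    gpr_opts: dict       # kwargs forwarded to sklearn's GaussianProcessRegressor
```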
