if (!initialize & nrow(leftOff) == 0) stop("initialize cannot be FALSE if leftOff is not provided. Set initialize to TRUE and provide either initGrid or initPoints. You can provide both leftOff and initialize if you want.\n")
if (initialize & nrow(initGrid) == 0 & initPoints <= 0) stop("initialize is TRUE but neither initGrid nor initPoints was provided.")
if (initPoints > 0 & nrow(initGrid) > 0) stop("initGrid and initPoints are both specified; choose one.")
if (initPoints <= 0 & nrow(initGrid) == 0 & nrow(leftOff) == 0) stop("neither initGrid nor initPoints is specified; choose one or provide leftOff.")
if (parallel & (Workers == 1)) stop("parallel is set to TRUE but no back end is registered.\n")
if (!parallel & Workers > 1 & verbose > 0) cat("A parallel back end is registered, but parallel is set to FALSE. Process will not be run in parallel.\n")
Keep in mind that if you change your bounds, you will need to delete any rows from your leftOff table that fall outside the new bounds.
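For instance, a leftOff table could be trimmed to new bounds with something like the following (a minimal sketch; the ```leftOff``` and ```bounds``` objects here are illustrative stand-ins for your own):

```r
# Hypothetical bounds and leftOff table (one column per parameter):
bounds <- list(min_child_weight = c(0, 10))
leftOff <- data.frame(min_child_weight = c(2, 6, 12),
                      Score = c(0.71, 0.74, 0.69))

# Flag, for each parameter, the rows that fall inside its bounds:
inBounds <- sapply(names(bounds), function(p) {
  leftOff[[p]] >= bounds[[p]][1] & leftOff[[p]] <= bounds[[p]][2]
})

# Keep only the rows that are inside the bounds for every parameter:
leftOff <- leftOff[rowSums(!as.matrix(inBounds)) == 0, ]
```

Here the row with ```min_child_weight = 12``` would be dropped before the table is passed back in.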
********

### Adjusting noiseAdd and minClusterUtility

If we want to run n scoring functions in parallel, an important decision is how to choose the next n candidate parameter sets. One logical choice is the parameter set that maximizes our acquisition function. However, we still need to decide on the other n-1 sets. There are two good choices:
1. Add noise to the global optimum to sample nearby points.
2. Determine whether there are other local optimums that may be nearly as good.

This package allows you to do both. Using the ```minClusterUtility``` parameter, you can specify the minimum percentage utility of the global optimum required for a different local optimum to be considered. As an example, let's say we are optimizing 1 hyperparameter ```min_child_weight```, which is bounded between [0,10]. Our acquisition function may look like the following:
```{r eval = TRUE, echo=FALSE}
knitr::include_graphics("Optimums.png")
```

In this case, we may want to run our scoring function on both the global and the local maximum. If ```minClusterUtility``` is set to no more than 1.83/2.14 ~ 0.855, the process would use both the local and global maximums as candidate parameter sets in the next round.
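The filter this describes can be sketched in a few lines (an illustrative sketch using the utility values from the figure above, not the package's internal code):

```r
# Utilities of the two optimums from the acquisition function plot:
utilities <- c(global = 2.14, localMax = 1.83)
minClusterUtility <- 0.80  # hypothetical setting

# A local optimum is kept if its utility, as a fraction of the global
# optimum's utility, meets the minClusterUtility threshold:
keep <- utilities / max(utilities) >= minClusterUtility
candidates <- names(utilities)[keep]
```

With ```minClusterUtility = 0.80```, both optimums clear the 0.855 ratio and are kept; raising it above ~0.855 would leave only the global optimum.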
However, this doesn't fully solve our problem. In the example above, we had 2 maximums, but what if we want to run 10 instances of our scoring function in parallel? We would need to come up with 8 other sets of parameters. For the sake of decreasing uncertainty around the most promising parameter sets, this process samples from a shape(4,4) beta distribution centered at the estimated optimal parameters. In the example above, our acquisition function was maximized at ```min_child_weight = 4```. The figure below shows the effect that adjusting the ```noiseAdd``` parameter has on how we draw the other 8 candidate parameter sets:
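This sampling scheme can be sketched as follows (illustrative only, not the package's exact internals; it assumes ```noiseAdd``` is the fraction of the bounds range to sample within, as described above):

```r
set.seed(42)
lower <- 0; upper <- 10   # min_child_weight bounds
optimum <- 4              # acquisition-maximizing value
noiseAdd <- 0.25          # sample within 25% of the bounds range
n <- 8                    # extra candidate sets needed

# The sampling window has width noiseAdd * (upper - lower):
halfWidth <- noiseAdd * (upper - lower) / 2

# A shape(4,4) beta is symmetric on [0,1]; shift and scale it so the
# draws are centered on the estimated optimum:
candidates <- optimum + (rbeta(n, 4, 4) - 0.5) * 2 * halfWidth

# Clamp any draws back inside the original bounds:
candidates <- pmin(pmax(candidates, lower), upper)
```

With these settings, all 8 draws land within 1.25 of the optimum, concentrating the parallel runs around the most promising region.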
Now we need to define the scoring function. This function should, at a minimum, return a list with a ```Score``` element, which is the model evaluation metric we want to maximize. We can also retain other pieces of information created by the scoring function by including them as named elements of the returned list. In this case, we want to retain the optimal number of rounds determined by the ```xgb.cv```:
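A minimal sketch of that contract, with a toy score standing in for a real ```xgb.cv``` call (the names and values here are illustrative):

```r
# Toy scoring function: returns a list with a Score element (the metric
# to maximize) plus any extra information we want to retain, such as the
# best number of rounds a real xgb.cv run would report.
scoringFunction <- function(min_child_weight) {
  score <- -(min_child_weight - 4)^2  # stand-in for a CV evaluation metric
  bestRounds <- 50L                   # stand-in for xgb.cv's best iteration
  list(Score = score, nrounds = bestRounds)
}

res <- scoringFunction(min_child_weight = 4)
```

The optimizer maximizes ```res$Score```; the extra ```nrounds``` element is simply carried along in the results table.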