Ideally, we would use the information from prior model evaluations to guide us in our search:
4. New parameter-score pairs are found
5. Repeat steps 2-4 until some stopping criterion is met
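
To make the loop concrete, here is a minimal sketch of driving it with the package. It assumes the `bayesOpt()` entry point with `FUN`, `bounds`, `initPoints`, `iters.n`, and `acq` arguments (argument names can differ between package versions), and the scoring function is a stand-in rather than a real model fit:

```r
library(ParBayesianOptimization)

# Stand-in scoring function; a real one would fit a model and return its
# cross-validated score in a list with a `Score` element.
scoringFunction <- function(min_child_weight) {
  list(Score = -(min_child_weight - 0.25)^2)
}

# Steps 2-5 (fit the prior, maximize the acquisition function, score the new
# candidates, repeat) are handled internally until iters.n runs are complete.
optObj <- bayesOpt(
    FUN        = scoringFunction
  , bounds     = list(min_child_weight = c(0, 1))
  , initPoints = 6      # random samples used to initialize the process
  , iters.n    = 10     # stopping criterion: total optimization iterations
  , acq        = "ei"   # acquisition function: expected improvement
)
```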
Graphical Intuition
-------------------
As an example, let's say we are tuning only one hyperparameter in an xgboost model, min\_child\_weight, over (0,1). We have initialized the process by randomly sampling the scoring function 6 times, and get the following results (a sketch of one way to build such an initial design follows the table):
| min\_child\_weight| Score|
|-------------------:|----------:|
| 0.6280082| 0.7133457|
| 0.3276477| 0.8655448|
| 0.7486012| 0.6814730|
| 0.2425469| 1.0000000|
| 0.0724098| 0.1308284|
| 0.1579683| 0.5733343|
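
The table above is simply six random draws of the parameter with their observed scores. Below is a sketch of how such an initial design could be generated; the xgboost cross-validation scoring function and the bundled agaricus data are illustrative assumptions, not what produced the numbers above:

```r
library(xgboost)

# Illustrative scoring function: 5-fold CV AUC on the agaricus data that
# ships with xgboost. Any function returning a single score would do.
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

scoringFunction <- function(min_child_weight) {
  cv <- xgb.cv(
      params = list(
          objective = "binary:logistic"
        , eval_metric = "auc"
        , min_child_weight = min_child_weight
      )
    , data = dtrain
    , nrounds = 50
    , nfold = 5
    , verbose = 0
  )
  max(cv$evaluation_log$test_auc_mean)
}

# Randomly sample the scoring function 6 times over (0,1).
set.seed(1)
initParams <- runif(6)
initScores <- vapply(initParams, scoringFunction, numeric(1))
data.frame(min_child_weight = initParams, Score = initScores)
```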
How do we go about determining the best min\_child\_weight to try next? As it turns out, Gaussian processes can give us a very good definition for our prior distribution. Fitting a Gaussian process to the data above (indexed by min\_child\_weight), we can see the expected value across our parameter bounds, as well as the uncertainty at different points:

Before we can select our next candidate parameter to run the scoring function on, we need to determine how we define a "good" parameter inside this prior distribution. This is done by maximizing an acquisition function over the Gaussian process. There are several functions to choose from (the expected improvement calculation is sketched after the list):
- Upper Confidence Bound (ucb)
- Probability Of Improvement (poi)
- Expected Improvement (ei)
- Expected Improvement Per Second (eips)
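
To keep the example concrete, here is what the expected improvement calculation looks like when computed by hand from the Gaussian process posterior of the previous sketch (ucb and poi are shown for comparison; eips additionally divides by a model of run time, which is omitted here):

```r
# Expected improvement at each grid point, given the GP posterior mean and
# standard deviation and the best score observed so far.
expectedImprovement <- function(mu, sigma, bestScore, eps = 0) {
  z <- (mu - bestScore - eps) / sigma
  (mu - bestScore - eps) * pnorm(z) + sigma * dnorm(z)
}

bestScore <- max(scores)
ei  <- expectedImprovement(post$mean, post$sd, bestScore)

# The other acquisition functions follow the same pattern:
ucb <- post$mean + 2.576 * post$sd                 # upper confidence bound
poi <- pnorm((post$mean - bestScore) / post$sd)    # probability of improvement
```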
Continuing the example, we choose to find the min\_child\_weight which maximizes the expected improvement according to the Gaussian process. As you can see, there are several good candidates (a short sketch after the figure shows one way to locate them on a grid):

An advanced feature of ParBayesianOptimization, described in the advancedFeatures vignette, uses the `minClusterUtility` parameter to search over the different local maxima shown above. If it is not specified, only the global maximum is sampled.
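
As a rough illustration of what `minClusterUtility` controls, continuing the grid sketch above: keep any local maximum whose utility is at least some fraction of the global maximum's utility (the 0.5 threshold here is an arbitrary choice for illustration, not the package default):

```r
# Local maxima worth sampling in addition to the global maximum, under a
# hypothetical minClusterUtility-style threshold of 0.5.
threshold  <- 0.5 * max(ei)
candidates <- grid$min_child_weight[localMaxIdx][ei[localMaxIdx] >= threshold]
candidates
```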