@@ -14,14 +14,14 @@ generated by referencing each member's and the swarm's best models so far.

A single one-dimensional range or vector of one-dimensional ranges can be
specified. `ParamRange` objects are constructed using the `range` method. If not
- paired with a prior, then one is fitted, as follows:
+ paired with a prior, then one is fitted and truncated if bounded, as follows:

- | Range Types | Default Distribution |
- |:----------------------- |:-------------------- |
- | `NominalRange` | `Dirichlet` |
- | Bounded `NumericRange` | `Uniform` |
- | Positive `NumericRange` | `Gamma` |
- | Other `NumericRange` | `Normal` |
+ | Range Types | Default Distribution |
+ |:----------------------- |:------------------------------------------- |
+ | `NominalRange` | `Dirichlet([1, 1, ..., 1])` |
+ | Bounded `NumericRange` | `Uniform(lower, upper)` |
+ | Positive `NumericRange` | `Gamma(α=(origin/unit)^2, θ=unit^2/origin)` |
+ | Other `NumericRange` | `Normal(origin, unit)` |
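
The `Gamma` parameters in the table can be read as moment matching: with shape α and scale θ, a Gamma distribution has mean α⋅θ and standard deviation √α⋅θ, so the choice above gives mean `origin` and standard deviation `unit`, mirroring `Normal(origin, unit)`. The sketch below only checks that arithmetic; `gamma_params`, `gamma_mean`, and `gamma_std` are hypothetical helpers for illustration, not part of the package.

```julia
# Hypothetical helper (not part of the package): shape and scale of the
# default Gamma prior, computed from a range's `origin` and `unit`.
function gamma_params(origin::Real, unit::Real)
    origin > 0 || throw(ArgumentError("origin must be positive"))
    unit > 0 || throw(ArgumentError("unit must be positive"))
    α = (origin / unit)^2   # shape
    θ = unit^2 / origin     # scale
    return (α = α, θ = θ)
end

# For Gamma(α, θ): mean = α⋅θ and standard deviation = √α⋅θ.
gamma_mean(p) = p.α * p.θ
gamma_std(p) = sqrt(p.α) * p.θ

p = gamma_params(2.0, 0.5)
println(gamma_mean(p))  # 2.0, equal to `origin`
println(gamma_std(p))   # 0.5, equal to `unit`
```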

Specifically, in `ParticleSwarm`, the `range` field of a `TunedModel` instance
can be:
@@ -68,21 +68,42 @@ each swarm particle. Velocity is initiated to be zeros, and in each iteration,
every particle's position is updated to approach its personal best and the
swarm's best models so far with the equations:

- \$ vₖ₊₁ = w⋅vₖ + c₁⋅rand()⋅(pbest - x ) + c₂⋅rand()⋅(gbest - x )\$
+ \$ vₖ₊₁ = w⋅vₖ + c₁⋅rand()⋅(pbest - xₖ) + c₂⋅rand()⋅(gbest - xₖ)\$

\$ xₖ₊₁ = xₖ + vₖ₊₁\$
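
As a rough illustration of the two update equations, here is a minimal standalone sketch for a single particle. The function name `pso_step!`, the default coefficient values, and the choice to draw a fresh `rand()` for every coordinate are assumptions made for this example; they are not taken from the package's implementation.

```julia
using Random

# Hypothetical sketch: one velocity/position update for a single particle.
# `w`, `c1`, `c2` stand for w, c₁, c₂ in the equations above; their default
# values here are placeholders, not the package's defaults.
function pso_step!(x, v, pbest, gbest; w=1.0, c1=2.0, c2=2.0,
                   rng=Random.default_rng())
    for i in eachindex(x)
        r1, r2 = rand(rng), rand(rng)   # assumption: new draws per coordinate
        v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (gbest[i] - x[i])
        x[i] += v[i]                    # xₖ₊₁ = xₖ + vₖ₊₁
    end
    return x, v
end

rng = MersenneTwister(1)
x = [0.0, 10.0]          # current position
v = zeros(2)             # velocity starts at zero
pbest = [1.0, 8.0]       # personal best position
gbest = [2.0, 6.0]       # swarm best position
pso_step!(x, v, pbest, gbest; w=0.5, c1=1.0, c2=1.0, rng=rng)
println(x)
```

With zero initial velocity, each coordinate moves from its current value towards the personal and swarm bests by random fractions scaled by `c1` and `c2`.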

New models are then generated for evaluation by mutating the fields of a deep
copy of `model`. If the corresponding range has a specified `scale` function,
- then the transformation is applied before the hyperparameter is returned. For
- integer `NumericRange`s, the hyperparameter is rounded; and for `NominalRange`s,
- the hyperparameter is sampled from the specified values with the probability
- weights given by each particle.
-
- Personal and social best models are then updated for the swarm. In order to
- replicate both the probability weights and the sampled value for `NominalRange`s
- of the best models, the weights of unselected values are shifted to the selected
- one by the `prob_shift` factor.
+ then the transformation is applied before the hyperparameter is returned. If
+ `scale` is a symbol (e.g., `:log`), it is ignored.
+
+ ### Discrete Hyperparameter Handling
+
+ Since particle swarm optimization is designed for continuous problems, integer
+ and nominal hyperparameters require special handling: they are converted to
+ continuous values and transformed back to their original domains at each step
+ for evaluation.
+
+ For integer `NumericRange`s, a continuous distribution is fitted to generate
+ initial values for the swarm. They are then rounded when each particle is mapped
+ to the corresponding candidate model.
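
The rounding step for integer ranges might look like the sketch below. `to_integer` is a hypothetical helper, and clamping the rounded value to the range bounds is an assumption of this example rather than something stated above.

```julia
# Hypothetical helper: map a continuous particle coordinate back to an
# integer hyperparameter. Clamping to [lower, upper] is an assumption.
to_integer(x::Real, lower::Integer, upper::Integer) =
    clamp(round(Int, x), lower, upper)

println(to_integer(3.6, 1, 10))   # 4
println(to_integer(12.2, 1, 10))  # 10
```

Note that Julia's `round` resolves ties to the nearest even integer, so `round(Int, 2.5)` is `2`.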
+
+ `NominalRange`s, on the other hand, are represented as categorical distributions
+ over their values. Hence, we use Dirichlet prior distributions to initialize a
+ probability vector for each particle, defaulting to the uniform distribution
+ `Dirichlet([1, 1, ..., 1])`. The same velocity and position updates apply, but
+ probability values are further clamped to the range [0, 1] and normalized to sum
+ to 1. When a better model is found, we replicate both its probability vector
+ and sampled value by shifting unchosen categories' weights towards the selected
+ one for pbest and gbest models:
+
+ \$ pᵢ = (1 - prob_shift) * pᵢ\$
+
+ \$ pₛ = pₛ + prob_shift\$
+
+ where pₛ is the probability of the sampled hyperparameter value. For more
+ information, refer to "A New Discrete Particle Swarm Optimization Algorithm" by
+ Strasser, Goodman, Sheppard, and Butcher.
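
The nominal-range steps described above can be sketched as follows. All three function names are hypothetical. The sketch assumes the first equation is applied to every category, including the sampled one, before `prob_shift` is added to pₛ, which keeps the vector summing to 1. The uniform fallback when every entry is clamped to zero is also an assumption.

```julia
using Random

# Clamp each entry to [0, 1], then rescale so the entries sum to 1.
function normalize_probs!(p)
    clamp!(p, 0.0, 1.0)
    total = sum(p)
    if total > 0
        p ./= total
    else
        fill!(p, 1 / length(p))  # assumption: fall back to uniform weights
    end
    return p
end

# Sample a category index from the probability vector `p` (inverse CDF).
function sample_index(rng, p)
    u = rand(rng)
    cumulative = 0.0
    for (i, q) in enumerate(p)
        cumulative += q
        u < cumulative && return i
    end
    return length(p)  # guard against floating-point round-off
end

# Shift weight towards the sampled category `s`:
# pᵢ = (1 - prob_shift) * pᵢ for all i, then pₛ = pₛ + prob_shift.
function shift_probs!(p, s, prob_shift)
    p .*= (1 - prob_shift)
    p[s] += prob_shift
    return p
end

p = normalize_probs!([1.2, -0.1, 0.3])     # clamped to [1.0, 0.0, 0.3], then rescaled
s = sample_index(MersenneTwister(1), p)    # index of the sampled category
shifted = shift_probs!([0.5, 0.3, 0.2], 2, 0.25)
println(shifted)                           # approximately [0.375, 0.475, 0.15]
```

A larger `prob_shift` makes the stored best probability vector concentrate more strongly on the value that was actually sampled.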
"""
mutable struct ParticleSwarm{R<:AbstractRNG} <: AbstractParticleSwarm
    n_particles::Int