
Commit eb15eae

Use consistent spelling for optimise (#2203)
* Use consistent spelling for optimise
* Update NEWS.md
* Update NEWS.md

Co-authored-by: Michael Abbott <[email protected]>
1 parent: 0d83f60

File tree: 12 files changed (+33, -32 lines)


NEWS.md

Lines changed: 3 additions & 2 deletions

@@ -1,5 +1,6 @@
 # Flux Release Notes

+
 ## v0.13.14
 * Fixed various deprecation warnings, from `Zygone.@nograd` and `Vararg`.

@@ -45,7 +46,7 @@ been removed in favour of MLDatasets.jl.
 * Fixed [AlphaDropout](https://github.com/FluxML/Flux.jl/pull/1781)

 ## v0.12.8
-* Optimized inference and gradient calculation of OneHotMatrix[pr](https://github.com/FluxML/Flux.jl/pull/1756)
+* Optimised inference and gradient calculation of OneHotMatrix[pr](https://github.com/FluxML/Flux.jl/pull/1756)

 ## v0.12.7
 * Added support for [`GRUv3`](https://github.com/FluxML/Flux.jl/pull/1675)
@@ -99,7 +100,7 @@ been removed in favour of MLDatasets.jl.
 * Change to `DataLoader`'s [constructor](https://github.com/FluxML/Flux.jl/pull/1152)
 * Uniform loss [interface](https://github.com/FluxML/Flux.jl/pull/1150)
 * Loss functions now live in the `Flux.Losses` [module](https://github.com/FluxML/Flux.jl/pull/1264)
-* Optimistic ADAM (OADAM) optimizer for [adversarial training](https://github.com/FluxML/Flux.jl/pull/1246).
+* Optimistic ADAM (OADAM) optimiser for [adversarial training](https://github.com/FluxML/Flux.jl/pull/1246).
 * Add option for [same padding](https://github.com/FluxML/Flux.jl/pull/901) to conv and pooling layers by setting `pad=SamePad()`.
 * Added option to set `bias` to [Flux.Zeros](https://github.com/FluxML/Flux.jl/pull/873) to eliminating `bias` from being trained.
 * Added `GlobalMaxPool` and `GlobalMeanPool` [layers](https://github.com/FluxML/Flux.jl/pull/950) for performing global pooling operations.

docs/src/ecosystem.md

Lines changed: 1 addition & 1 deletion

@@ -99,7 +99,7 @@ Packages based on differentiable programming but not necessarily related to Mach

 Some useful and random packages!

-- [AdversarialPrediction.jl](https://github.com/rizalzaf/AdversarialPrediction.jl) provides a way to easily optimize generic performance metrics in supervised learning settings using the [Adversarial Prediction](https://arxiv.org/abs/1812.07526) framework.
+- [AdversarialPrediction.jl](https://github.com/rizalzaf/AdversarialPrediction.jl) provides a way to easily optimise generic performance metrics in supervised learning settings using the [Adversarial Prediction](https://arxiv.org/abs/1812.07526) framework.
 - [Mill.jl](https://github.com/CTUAvastLab/Mill.jl) helps to prototype flexible multi-instance learning models.
 - [MLMetrics.jl](https://github.com/JuliaML/MLMetrics.jl) is a utility for scoring models in data science and machine learning.
 - [Torch.jl](https://github.com/FluxML/Torch.jl) exposes torch in Julia.

docs/src/gpu.md

Lines changed: 3 additions & 3 deletions

@@ -138,12 +138,12 @@ In order to train the model using the GPU both model and the training data have
 1. Iterating over the batches in a [DataLoader](@ref) object transferring each one of the training batches at a time to the GPU.
 ```julia
 train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
-# ... model, optimizer and loss definitions
+# ... model, optimiser and loss definitions
 for epoch in 1:nepochs
 for (xtrain_batch, ytrain_batch) in train_loader
 x, y = gpu(xtrain_batch), gpu(ytrain_batch)
 gradients = gradient(() -> loss(x, y), parameters)
-Flux.Optimise.update!(optimizer, parameters, gradients)
+Flux.Optimise.update!(optimiser, parameters, gradients)
 end
 end
 ```
@@ -166,7 +166,7 @@ In order to train the model using the GPU both model and the training data have
 ```julia
 using CUDA: CuIterator
 train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
-# ... model, optimizer and loss definitions
+# ... model, optimiser and loss definitions
 for epoch in 1:nepochs
 for (xtrain_batch, ytrain_batch) in CuIterator(train_loader)
 # ...
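
The hunks above show only fragments of the gpu.md listing, so for context here is a minimal, self-contained sketch of the loop they touch, written against the Flux 0.13-era implicit-parameters API; the toy model, data, and hyperparameters are placeholders rather than values from the docs.

```julia
using Flux

# Placeholder data and model; `gpu` falls back to a no-op when no GPU is available.
xtrain, ytrain = rand(Float32, 10, 256), rand(Float32, 1, 256)
model = gpu(Dense(10 => 1))
parameters = Flux.params(model)
optimiser = Flux.Optimise.Descent(0.01)
loss(x, y) = Flux.Losses.mse(model(x), y)

train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
nepochs = 2
for epoch in 1:nepochs
    for (xtrain_batch, ytrain_batch) in train_loader
        x, y = gpu(xtrain_batch), gpu(ytrain_batch)   # move one batch at a time
        gradients = gradient(() -> loss(x, y), parameters)
        Flux.Optimise.update!(optimiser, parameters, gradients)
    end
end
```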

docs/src/models/overview.md

Lines changed: 2 additions & 2 deletions

@@ -116,7 +116,7 @@ julia> predict.bias

 The dimensions of these model parameters depend on the number of inputs and outputs.

-Flux will adjust predictions by iteratively changing these parameters according to the optimizer.
+Flux will adjust predictions by iteratively changing these parameters according to the optimiser.

 This optimiser implements the classic gradient descent strategy. Now improve the parameters of the model with a call to [`Flux.train!`](@ref) like this:

@@ -178,7 +178,7 @@ First, we gathered real-world data into the variables `x_train`, `y_train`, `x_t

 Then, we built a single input, single output predictive model, `predict = Dense(1 => 1)`. The initial predictions weren't accurate, because we had not trained the model yet.

-After building the model, we trained it with `train!(loss, predict, data, opt)`. The loss function is first, followed by the model itself, the training data, and the `Descent` optimizer provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran the `train!` many times to finish the training process.
+After building the model, we trained it with `train!(loss, predict, data, opt)`. The loss function is first, followed by the model itself, the training data, and the `Descent` optimiser provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran the `train!` many times to finish the training process.

 After we trained the model, we verified it with the test data to verify the results.

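
For reference, a minimal sketch of the `train!(loss, predict, data, opt)` call pattern the paragraph above describes. It assumes the explicit-model `train!` method available around Flux v0.13.14 and uses made-up toy data; the loss definition is illustrative, not quoted from overview.md.

```julia
using Flux, Statistics

x_train = Float32[0 1 2 3 4 5]         # 1×6 toy inputs (placeholder data)
y_train = Float32[2 6 10 14 18 22]     # 1×6 toy targets

predict = Dense(1 => 1)                # single input, single output model
loss(model, x, y) = mean(abs2.(model(x) .- y))

opt = Descent()                        # classic gradient descent
data = [(x_train, y_train)]

Flux.train!(loss, predict, data, opt)  # one training pass; repeat to keep improving
@show loss(predict, x_train, y_train)
```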

docs/src/saving.md

Lines changed: 2 additions & 2 deletions

@@ -129,10 +129,10 @@ revert to an older copy of the model if it starts to overfit.
 @save "model-$(now()).bson" model loss = testloss()
 ```

-Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.
+Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimisers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.

 You can store the optimiser state alongside the model, to resume training
-exactly where you left off. BSON is smart enough to [cache values](https://github.com/JuliaIO/BSON.jl/blob/v0.3.4/src/write.jl#L71) and insert links when saving, but only if it knows everything to be saved up front. Thus models and optimizers must be saved together to have the latter work after restoring.
+exactly where you left off. BSON is smart enough to [cache values](https://github.com/JuliaIO/BSON.jl/blob/v0.3.4/src/write.jl#L71) and insert links when saving, but only if it knows everything to be saved up front. Thus models and optimisers must be saved together to have the latter work after restoring.

 ```julia
 opt = Adam()
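
As a concrete illustration of saving the model and optimiser together, a minimal sketch using BSON.jl's `@save`/`@load` macros; the checkpoint file name and the small `Chain` are placeholders, and it assumes the old-style `Flux.Optimise` optimisers whose state lives in an `IdDict`.

```julia
using Flux
using BSON: @save, @load

model = Chain(Dense(10 => 5, relu), Dense(5 => 2))
opt = Adam()

# ... train for a while so `opt` accumulates state for `model`'s parameters ...

# Saving both in one call lets BSON cache shared values and insert links,
# so the optimiser state still refers to the restored model's arrays.
@save "model-checkpoint.bson" model opt

# Later, in a fresh session:
@load "model-checkpoint.bson" model opt
```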

docs/src/training/optimisers.md

Lines changed: 1 addition & 1 deletion

@@ -76,7 +76,7 @@ Flux.Optimise.Optimiser

 ## Scheduling Optimisers

-In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](https://darsnack.github.io/ParameterSchedulers.jl/dev/README.html). The documentation for ParameterSchedulers.jl provides a more detailed overview of the different scheduling policies, and how to use them with Flux optimizers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.
+In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](https://darsnack.github.io/ParameterSchedulers.jl/dev/README.html). The documentation for ParameterSchedulers.jl provides a more detailed overview of the different scheduling policies, and how to use them with Flux optimisers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.

 First, we import ParameterSchedulers.jl and initialize a cosine annealing schedule to vary the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
 ```julia
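
The snippet the paragraph refers to lies outside this hunk, so here is a rough sketch of how such a cosine annealing schedule is typically wired up with ParameterSchedulers.jl; the `Cosine` keyword arguments and the loop body are assumptions based on that package's documented interface, not lines from optimisers.md.

```julia
using Flux
using ParameterSchedulers

opt = Momentum()                                      # optimiser with a mutable `eta` field
schedule = Cosine(λ0 = 1e-4, λ1 = 1e-2, period = 10)  # anneal between 1e-4 and 1e-2 every 10 steps

for (eta, epoch) in zip(schedule, 1:100)
    opt.eta = eta          # update the learning rate before each epoch
    # ... run one epoch of training with `opt` here ...
end
```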

docs/src/tutorials/2021-02-07-convnet.md

Lines changed: 1 addition & 1 deletion

@@ -145,7 +145,7 @@ function train(; kws...)
 return logitcrossentropy(ŷ, y)
 end

-# Train our model with the given training set using the Adam optimizer and
+# Train our model with the given training set using the Adam optimiser and
 # printing out performance against the test set as we go.
 opt = Adam(args.lr)

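
To place the hunk in context, a small sketch of the pattern it sits in: a loss built on `logitcrossentropy` plus an `Adam` optimiser constructed from a learning-rate hyperparameter. The `Args` container and the toy model are hypothetical stand-ins for the tutorial's definitions.

```julia
using Flux
using Flux.Losses: logitcrossentropy

Base.@kwdef struct Args      # hypothetical hyperparameter container
    lr::Float64 = 3e-3
end
args = Args()

model = Chain(Dense(784 => 32, relu), Dense(32 => 10))

# Loss on raw logits, mirroring the tutorial's inner loss function
loss(x, y) = logitcrossentropy(model(x), y)

# Train our model with the given training set using the Adam optimiser
opt = Adam(args.lr)
```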

docs/src/tutorials/2021-10-08-dcgan-mnist.md

Lines changed: 3 additions & 3 deletions

@@ -206,7 +206,7 @@ The generator's loss quantifies how well it was able to trick the discriminator.
 generator_loss(fake_output) = logitbinarycrossentropy(fake_output, 1)
 ```

-We also need optimizers for our network. Why you may ask? Read more [here](https://towardsdatascience.com/overview-of-various-optimizers-in-neural-networks-17c1be2df6d5). For both the generator and discriminator, we will use the [ADAM optimizer](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.ADAM).
+We also need optimisers for our network. Why you may ask? Read more [here](https://towardsdatascience.com/overview-of-various-optimisers-in-neural-networks-17c1be2df6d5). For both the generator and discriminator, we will use the [ADAM optimiser](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.ADAM).

 ## Utility functions

@@ -253,7 +253,7 @@ function train_generator!(gen, disc, fake_img, opt, ps, hparams)
 end
 ```

-Now that we have defined every function we need, we integrate everything into a single `train` function where we first set up all the models and optimizers and then train the GAN for a specified number of epochs.
+Now that we have defined every function we need, we integrate everything into a single `train` function where we first set up all the models and optimisers and then train the GAN for a specified number of epochs.

 ```julia
 function train(hparams)
@@ -278,7 +278,7 @@ function train(hparams)
 disc_ps = params(disc)
 gen_ps = params(gen)

-# Initialize the ADAM optimizers for both the sub-models
+# Initialize the ADAM optimisers for both the sub-models
 # with respective learning rates
 disc_opt = ADAM(hparams.disc_lr)
 gen_opt = ADAM(hparams.gen_lr)
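
Pulling the quoted pieces together, a stripped-down sketch of the GAN setup: logit binary cross-entropy losses and one ADAM optimiser per sub-model. The `hparams` values and the tiny `gen`/`disc` networks are placeholders, not the tutorial's DCGAN architectures, and `discriminator_loss` is the usual counterpart rather than a line from this diff.

```julia
using Flux
using Flux.Losses: logitbinarycrossentropy

# Hypothetical hyperparameters and toy sub-models
hparams = (latent_dim = 100, disc_lr = 0.0002, gen_lr = 0.0002)
gen  = Chain(Dense(hparams.latent_dim => 784, tanh))
disc = Chain(Dense(784 => 1))

# The generator succeeds when the discriminator labels its fakes as real (1)
generator_loss(fake_output) = logitbinarycrossentropy(fake_output, 1)
# The discriminator wants 1 for real images and 0 for fakes
discriminator_loss(real_output, fake_output) =
    logitbinarycrossentropy(real_output, 1) + logitbinarycrossentropy(fake_output, 0)

# Implicit-style parameters and one ADAM optimiser per sub-model
disc_ps  = Flux.params(disc)
gen_ps   = Flux.params(gen)
disc_opt = ADAM(hparams.disc_lr)
gen_opt  = ADAM(hparams.gen_lr)
```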

docs/src/tutorials/2021-10-14-vanilla-gan.md

Lines changed: 3 additions & 3 deletions

@@ -38,7 +38,7 @@ at plots in a separate window, use fantastic for debugging.


 Next, let us define values for learning rate, batch size, epochs, and other
-hyper-parameters. While we are at it, we also define optimizers for the generator
+hyper-parameters. While we are at it, we also define optimisers for the generator
 and discriminator network. More on what these are later.

 ```julia
@@ -49,8 +49,8 @@ and discriminator network. More on what these are later.
 output_period = 100 # Period length for plots of generator samples
 n_features = 28 * 28# Number of pixels in each sample of the MNIST dataset
 latent_dim = 100 # Dimension of latent space
-opt_dscr = ADAM(lr_d)# Optimizer for the discriminator
-opt_gen = ADAM(lr_g) # Optimizer for the generator
+opt_dscr = ADAM(lr_d)# Optimiser for the discriminator
+opt_gen = ADAM(lr_g) # Optimiser for the generator
 ```


src/optimise/optimisers.jl

Lines changed: 10 additions & 10 deletions

@@ -45,7 +45,7 @@ end
 """
 Momentum(η = 0.01, ρ = 0.9)

-Gradient descent optimizer with learning rate `η` and momentum `ρ`.
+Gradient descent optimiser with learning rate `η` and momentum `ρ`.

 # Parameters
 - Learning rate (`η`): Amount by which gradients are discounted before updating
@@ -78,7 +78,7 @@ end
 """
 Nesterov(η = 0.001, ρ = 0.9)

-Gradient descent optimizer with learning rate `η` and Nesterov momentum `ρ`.
+Gradient descent optimiser with learning rate `η` and Nesterov momentum `ρ`.

 # Parameters
 - Learning rate (`η`): Amount by which gradients are discounted before updating
@@ -191,7 +191,7 @@ end
 """
 RAdam(η = 0.001, β::Tuple = (0.9, 0.999), ϵ = $EPS)

-[Rectified Adam](https://arxiv.org/abs/1908.03265) optimizer.
+[Rectified Adam](https://arxiv.org/abs/1908.03265) optimiser.

 # Parameters
 - Learning rate (`η`): Amount by which gradients are discounted before updating
@@ -328,7 +328,7 @@ end
 """
 AdaGrad(η = 0.1, ϵ = $EPS)

-[AdaGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) optimizer. It has
+[AdaGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) optimiser. It has
 parameter specific learning rates based on how frequently it is updated.
 Parameters don't need tuning.

@@ -540,7 +540,7 @@ function apply!(o::AdaBelief, x, Δ)
 #= st is a variance and can go to zero. This is in contrast to Adam, which uses the
 second moment which is usually far enough from zero. This is problematic, since st
 can be slightly negative due to numerical error, and the square root below will fail.
-Also, if we want to differentiate through the optimizer, √0 is not differentiable.
+Also, if we want to differentiate through the optimiser, √0 is not differentiable.
 To protect against this, we add a small number, st -> st + eps2.
 The original implementation (https://github.com/juntang-zhuang/Adabelief-Optimizer)
 uses the square of Adam's epsilon, which we do here.
@@ -556,7 +556,7 @@ function apply!(o::AdaBelief, x, Δ)
 end


-# Compose optimizers
+# Compose optimisers

 """
 Optimiser(a, b, c...)
@@ -598,7 +598,7 @@ for more general scheduling techniques.

 # Examples

-`InvDecay` is typically composed with other optimizers
+`InvDecay` is typically composed with other optimisers
 as the last transformation of the gradient:

 ```julia
@@ -643,13 +643,13 @@ for more general scheduling techniques.

 # Examples

-`ExpDecay` is typically composed with other optimizers
+`ExpDecay` is typically composed with other optimisers
 as the last transformation of the gradient:
 ```julia
 opt = Optimiser(Adam(), ExpDecay(1.0))
 ```
 Note: you may want to start with `η=1` in `ExpDecay` when combined with other
-optimizers (`Adam` in this case) that have their own learning rate.
+optimisers (`Adam` in this case) that have their own learning rate.
 """
 mutable struct ExpDecay <: AbstractOptimiser
 eta::Float64
@@ -677,7 +677,7 @@ end
 WeightDecay(λ = 0)

 Decay weights by ``λ``.
-Typically composed with other optimizers as the first transformation to the gradient,
+Typically composed with other optimisers as the first transformation to the gradient,
 making it equivalent to adding ``L_2`` regularization
 with coefficient ``λ`` to the loss.
