Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weight matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:
```jldoctest overview
julia> predict = Dense(1 => 1)
Dense(1 => 1)  # 2 parameters
```
`Dense(1 => 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
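To see this concretely (a quick sketch; with no activation given, `σ` defaults to `identity`, so the layer computes plain `W*x .+ b`):

```julia
x = Float32[1.0]

# Dense applies σ.(W*x .+ b); with σ == identity this is just Wx + b.
predict(x) ≈ predict.weight * x .+ predict.bias  # true
```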
This model will already make predictions, though not accurate ones yet.
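For example, with some hypothetical toy data (the shapes are what matter here; the values are made up):

```julia
# Hypothetical inputs: one feature per column, six observations.
x_train = Float32[0 1 2 3 4 5]

predict(x_train)  # a 1×6 Matrix{Float32}; the values are meaningless before training
```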
To make better predictions, you'll need to provide a *loss function* that tells Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between the actual values and the predictions; more accurate predictions yield a lower loss. You can write your own loss functions or rely on those already provided by Flux, such as [mean squared error](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-squared-error/). Flux works by iteratively reducing the loss through *training*.
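As a sketch, here is such a loss built on Flux's `Flux.Losses.mse`, closing over the model; `y_train` is hypothetical target data following the made-up rule `y = 4x + 2`:

```julia
# Hypothetical targets matching x_train above.
y_train = 4 .* x_train .+ 2

loss(x, y) = Flux.Losses.mse(predict(x), y)
loss(x_train, y_train)  # a single number; large while the model is untrained
```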
Under the hood, the Flux [`Flux.train!`](@ref) function uses a *loss function* and *training data* to improve the *parameters* of your model based on a pluggable [`optimiser`](../training/optimisers.md).
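A sketch of that setup, continuing with the hypothetical data above (`Descent` is Flux's plain gradient-descent optimiser; its default learning rate is 0.1):

```julia
using Flux: train!

opt = Descent()              # gradient descent with the default learning rate
data = [(x_train, y_train)]  # train! iterates over (input, target) tuples
```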
Now we have the optimiser and the data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters, and a dense layer has weights and biases whose dimensions depend on its inputs and outputs.
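For our `Dense(1 => 1)` layer they can be inspected directly:

```julia
predict.weight  # a 1×1 Matrix{Float32}
predict.bias    # a 1-element Vector{Float32}
```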
The dimensions of these model parameters depend on the number of inputs and outputs. Since models can have hundreds of inputs and several layers, it helps to have a function that collects all the parameters into the data structure Flux expects.
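A sketch using the implicit-style API, where `Flux.params` gathers every trainable array of a model into one `Params` collection (newer Flux versions favour passing the model itself instead):

```julia
parameters = Flux.params(predict)  # holds the weight matrix and the bias vector
```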
These are the parameters Flux will change, one step at a time, to improve predictions. At each step, the contents of this `Params` object change too, since it is just a collection of references to the mutable arrays inside the model:
```jldoctest overview
julia> predict.weight in parameters, predict.bias in parameters
(true, true)
```
The first parameter is the weight and the second is the bias. Flux will adjust predictions by iteratively changing these parameters according to the optimiser.
This optimiser implements the classic gradient descent strategy. Now improve the parameters of the model with a call to [`Flux.train!`](@ref).
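A single training step looks like this (a sketch using the implicit-parameters signature of `train!`; newer Flux versions pass the model itself in place of `parameters`):

```julia
train!(loss, parameters, data, opt)
```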
The parameters have changed. This single step is the essence of machine learning.
In the previous section, we made a single call to `train!`, which iterates over the data we passed in just once. An *epoch* refers to one pass over the dataset. Typically, we will run the training for multiple epochs to drive the loss down even further. Let's run it a few more times.
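A sketch of such a loop, reusing the pieces defined above (the epoch count is arbitrary):

```julia
for epoch in 1:200
    train!(loss, parameters, data, opt)
end

loss(x_train, y_train)  # should now be much smaller than before training
```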
The `m(x)` operation would be represented by `x1 -> A -> y1` in our diagram.
To do so, we'll need to structure the input data as a `Vector` of observations, one per time step. This `Vector` will therefore have length `seq_length`, and each of its elements will represent the input features for a given step. In our example, this translates into a `Vector` of length 3, where each element is a `Matrix` of size `(features, batch_size)`, or just a `Vector` of length `features` if dealing with a single observation.
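For concreteness, a sketch of that layout with hypothetical sizes (`features = 2`, `batch_size = 4`, `seq_length = 3`):

```julia
features, batch_size, seq_length = 2, 4, 3

# A Vector of `seq_length` elements; each element is a (features, batch_size) Matrix.
x = [rand(Float32, features, batch_size) for _ in 1:seq_length]
```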