
Commit 12bad50

Merge pull request #1916 from Saransh-cpp/add-doctests-to-md-files

Add a ton of doctests + fix outdated documentation in `.md` files

2 parents 946b815 + 2e24d96 · commit 12bad50

9 files changed: +128 additions, −103 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ jobs:
         Pkg.develop(PackageSpec(path=pwd()))
         Pkg.instantiate()'
     - run: |
-        julia --project=docs/ -e '
+        julia --color=yes --project=docs/ -e '
         using Flux
         # using Pkg; Pkg.activate("docs")
         using Documenter

docs/Project.toml

Lines changed: 2 additions & 1 deletion
@@ -1,8 +1,9 @@
 [deps]
+BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
-NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
 MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
+NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"

 [compat]
 Documenter = "0.26"

docs/make.jl

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
-using Documenter, Flux, NNlib, Functors, MLUtils
+using Documenter, Flux, NNlib, Functors, MLUtils, BSON

 DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive = true)
-makedocs(modules = [Flux, NNlib, Functors, MLUtils],
+makedocs(modules = [Flux, NNlib, Functors, MLUtils, BSON],
          doctest = false,
          sitename = "Flux",
          pages = ["Home" => "index.md",
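Because `makedocs` is called with `doctest = false`, the jldoctest blocks added in this PR are not executed during the page build itself; presumably they are run through Documenter's standalone doctest entry point. The following is a minimal sketch of that workflow, not part of this commit, assuming Documenter's standard API:

```julia
# Hypothetical local run of the doctests (illustrative, not from the diff).
# DocMeta.setdocmeta! injects `using Flux` into every doctest's setup block,
# exactly as docs/make.jl does above.
using Documenter, Flux

DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive = true)

# Executes every jldoctest block found in Flux's docstrings and manual pages
# and reports any mismatch between expected and actual output.
doctest(Flux)
```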

docs/src/gpu.md

Lines changed: 13 additions & 13 deletions
@@ -55,35 +55,35 @@ As a convenience, Flux provides the `gpu` function to convert models and data to
 julia> using Flux, CUDA

 julia> m = Dense(10, 5) |> gpu
-Dense(10 => 5)
+Dense(10 => 5)  # 55 parameters

 julia> x = rand(10) |> gpu
-10-element CuArray{Float32,1}:
- 0.800225
+10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
+ 0.066846445
 ⋮
- 0.511655
+ 0.76706964

 julia> m(x)
-5-element CuArray{Float32,1}:
- -0.30535
+5-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
+ -0.99992573
 ⋮
- -0.618002
+ -0.547261
 ```

 The analogue `cpu` is also available for moving models and data back off of the GPU.

 ```julia
 julia> x = rand(10) |> gpu
-10-element CuArray{Float32,1}:
- 0.235164
+10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
+ 0.8019236
 ⋮
- 0.192538
+ 0.7766742

 julia> x |> cpu
-10-element Array{Float32,1}:
- 0.235164
+10-element Vector{Float32}:
+ 0.8019236
 ⋮
- 0.192538
+ 0.7766742
 ```

 ## Disabling CUDA or choosing which GPUs are visible to Flux
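
The behaviour these updated outputs document is unchanged: `gpu` moves a model or array onto the GPU as a `CuArray`, and `cpu` brings it back. A small illustrative round trip, assuming a CUDA-capable device is available (not taken from the commit):

```julia
using Flux, CUDA

m = Dense(10 => 5) |> gpu        # layer parameters become CuArrays
x = rand(Float32, 10) |> gpu     # data moves to the GPU as well
y = m(x)                         # forward pass runs on the GPU, returns a CuArray

y_cpu = y |> cpu                 # back to an ordinary Vector{Float32}
```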

docs/src/models/overview.md

Lines changed: 33 additions & 33 deletions
@@ -15,7 +15,7 @@ Here's how you'd use Flux to build and train the most basic of models, step by s

 This example will predict the output of the function `4x + 2`. First, import `Flux` and define the function we want to simulate:

-```julia
+```jldoctest overview
 julia> using Flux

 julia> actual(x) = 4x + 2
@@ -28,7 +28,7 @@ This example will build a model to approximate the `actual` function.

 Use the `actual` function to build sets of data for training and verification:

-```julia
+```jldoctest overview
 julia> x_train, x_test = hcat(0:5...), hcat(6:10...)
 ([0 1 … 4 5], [6 7 … 9 10])

@@ -42,13 +42,13 @@ Normally, your training and test data come from real world observations, but thi

 Now, build a model to make predictions with `1` input and `1` output:

-```julia
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> model = Dense(1 => 1)
-Dense(1 => 1)
+Dense(1 => 1)  # 2 parameters

 julia> model.weight
 1×1 Matrix{Float32}:
- -1.4925033
+ 0.95041317

 julia> model.bias
 1-element Vector{Float32}:
@@ -57,28 +57,28 @@ julia> model.bias

 Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weights' matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:

-```julia
+```jldoctest overview
 julia> predict = Dense(1 => 1)
+Dense(1 => 1)  # 2 parameters
 ```

 `Dense(1 => 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.

 This model will already make predictions, though not accurate ones yet:

-```julia
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> predict(x_train)
 1×6 Matrix{Float32}:
- 0.0  -1.4925  -2.98501  -4.47751  -5.97001  -7.46252
+ 0.0  0.906654  1.81331  2.71996  3.62662  4.53327
 ```

 In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.

-```julia
-julia> loss(x, y) = Flux.Losses.mse(predict(x), y)
-loss (generic function with 1 method)
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+julia> loss(x, y) = Flux.Losses.mse(predict(x), y);

 julia> loss(x_train, y_train)
-282.16010605766024
+122.64734f0
 ```

 More accurate predictions will yield a lower loss. You can write your own loss functions or rely on those already provided by Flux. This loss function is called [mean squared error](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-squared-error/). Flux works by iteratively reducing the loss through *training*.
@@ -87,39 +87,39 @@ More accurate predictions will yield a lower loss. You can write your own loss f

 Under the hood, the Flux [`Flux.train!`](@ref) function uses *a loss function* and *training data* to improve the *parameters* of your model based on a pluggable [`optimiser`](../training/optimisers.md):

-```julia
+```jldoctest overview
 julia> using Flux: train!

 julia> opt = Descent()
 Descent(0.1)

 julia> data = [(x_train, y_train)]
-1-element Array{Tuple{Array{Int64,2},Array{Int64,2}},1}:
+1-element Vector{Tuple{Matrix{Int64}, Matrix{Int64}}}:
 ([0 1 … 4 5], [2 6 … 18 22])
 ```

 Now, we have the optimiser and data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. Remember, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:

-```julia
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> predict.weight
-1-element Array{Float64,1}:
- -0.99009055
+1×1 Matrix{Float32}:
+ 0.9066542

 julia> predict.bias
-1-element Array{Float64,1}:
+1-element Vector{Float32}:
 0.0
 ```

 The dimensions of these model parameters depend on the number of inputs and outputs. Since models can have hundreds of inputs and several layers, it helps to have a function to collect the parameters into the data structure Flux expects:

-```
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> parameters = Flux.params(predict)
-Params([[-0.99009055], [0.0]])
+Params([Float32[0.9066542], Float32[0.0]])
 ```

 These are the parameters Flux will change, one step at a time, to improve predictions. At each step, the contents of this `Params` object changes too, since it is just a collection of references to the mutable arrays inside the model:

-```
+```jldoctest overview
 julia> predict.weight in parameters, predict.bias in parameters
 (true, true)
 ```
@@ -129,22 +129,22 @@ The first parameter is the weight and the second is the bias. Flux will adjust p

 This optimiser implements the classic gradient descent strategy. Now improve the parameters of the model with a call to [`Flux.train!`](@ref) like this:

-```
+```jldoctest overview
 julia> train!(loss, parameters, data, opt)
 ```

 And check the loss:

-```
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> loss(x_train, y_train)
-267.8037f0
+116.38745f0
 ```

 It went down. Why?

-```
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> parameters
-Params([[9.158408791666668], [2.895045275]])
+Params([Float32[7.5777884], Float32[1.9466728]])
 ```

 The parameters have changed. This single step is the essence of machine learning.
@@ -153,16 +153,16 @@ The parameters have changed. This single step is the essence of machine learning

 In the previous section, we made a single call to `train!` which iterates over the data we passed in just once. An *epoch* refers to one pass over the dataset. Typically, we will run the training for multiple epochs to drive the loss down even further. Let's run it a few more times:

-```
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> for epoch in 1:200
          train!(loss, parameters, data, opt)
        end

 julia> loss(x_train, y_train)
-0.007433314787010791
+0.00339581f0

 julia> parameters
-Params([[3.9735880692372345], [1.9925541368157165]])
+Params([Float32[4.0178537], Float32[2.0050256]])
 ```

 After 200 training steps, the loss went down, and the parameters are getting close to those in the function the model is built to predict.
@@ -171,13 +171,13 @@ After 200 training steps, the loss went down, and the parameters are getting clo

 Now, let's verify the predictions:

-```
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> predict(x_test)
-1×5 Array{Float64,2}:
- 25.8442  29.8194  33.7946  37.7698  41.745
+1×5 Matrix{Float32}:
+ 26.1121  30.13  34.1479  38.1657  42.1836

 julia> y_test
-1×5 Array{Int64,2}:
+1×5 Matrix{Int64}:
 26  30  34  38  42
 ```
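
A note on the `filter = r"[+-]?([0-9]*[.])?[0-9]+"` option that appears on most of these fences: Documenter applies a doctest filter to both the expected and the actual output before comparing them, so stripping every numeric literal lets blocks that print randomly initialised weights still pass. An illustrative check of that idea, not part of the commit:

```julia
# The same regex used in the jldoctest blocks above: it matches any
# (optionally signed, optionally fractional) number.
number = r"[+-]?([0-9]*[.])?[0-9]+"

# Two outputs that differ only in their random numeric values...
a = "Params([Float32[0.9066542], Float32[0.0]])"
b = "Params([Float32[7.5777884], Float32[1.9466728]])"

# ...become identical once the filter removes the numbers, which is why the
# doctest comparison succeeds regardless of initialisation.
replace(a, number => "") == replace(b, number => "")   # true
```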

docs/src/models/recurrence.md

Lines changed: 18 additions & 9 deletions
@@ -64,7 +64,9 @@ The `Recur` wrapper stores the state between runs in the `m.state` field.

 If we use the `RNN(2, 5)` constructor – as opposed to `RNNCell` – you'll see that it's simply a wrapped cell.

-```julia
+```jldoctest recurrence
+julia> using Flux
+
 julia> RNN(2, 5) # or equivalently RNN(2 => 5)
 Recur(
   RNNCell(2 => 5, tanh), # 45 parameters
@@ -76,21 +78,28 @@ Equivalent to the `RNN` stateful constructor, `LSTM` and `GRU` are also availabl

 Using these tools, we can now build the model shown in the above diagram with:

-```julia
-m = Chain(RNN(2 => 5), Dense(5 => 1))
+```jldoctest recurrence
+julia> m = Chain(RNN(2 => 5), Dense(5 => 1))
+Chain(
+  Recur(
+    RNNCell(2 => 5, tanh), # 45 parameters
+  ),
+  Dense(5 => 1),           # 6 parameters
+)         # Total: 6 trainable arrays, 51 parameters,
+          # plus 1 non-trainable, 5 parameters, summarysize 580 bytes.
 ```
 In this example, each output has only one component.

 ## Working with sequences

 Using the previously defined `m` recurrent model, we can now apply it to a single step from our sequence:

-```julia
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> x = rand(Float32, 2);

 julia> m(x)
 1-element Vector{Float32}:
- 0.31759313
+ 0.45860028
 ```

 The `m(x)` operation would be represented by `x1 -> A -> y1` in our diagram.
@@ -102,14 +111,14 @@ iterating the model on a sequence of data.

 To do so, we'll need to structure the input data as a `Vector` of observations at each time step. This `Vector` will therefore be of `length = seq_length` and each of its elements will represent the input features for a given step. In our example, this translates into a `Vector` of length 3, where each element is a `Matrix` of size `(features, batch_size)`, or just a `Vector` of length `features` if dealing with a single observation.

-```julia
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> x = [rand(Float32, 2) for i = 1:3];

 julia> [m(xi) for xi in x]
 3-element Vector{Vector{Float32}}:
- [-0.033448644]
- [0.5259023]
- [-0.11183384]
+ [0.36080405]
+ [-0.13914406]
+ [0.9310162]
 ```

 !!! warning "Use of map and broadcast"
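
One practical detail these snippets rely on implicitly: `Recur` keeps its hidden state in `m.state` between calls, so independent sequences should not share it. A small companion sketch, not from the diff, using `Flux.reset!`:

```julia
using Flux

m = Chain(RNN(2 => 5), Dense(5 => 1))

seq1 = [rand(Float32, 2) for _ in 1:3]
seq2 = [rand(Float32, 2) for _ in 1:3]

y1 = [m(x) for x in seq1]   # hidden state is carried forward across these steps

Flux.reset!(m)              # drop the accumulated state before an unrelated sequence
y2 = [m(x) for x in seq2]
```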
