Now, build a model to make predictions with `1` input and `1` output:
```jldoctest overview
julia> model = Dense(1 => 1)
Dense(1 => 1)  # 2 parameters

julia> model.weight
1×1 Matrix{Float32}:
 0.95041317

julia> model.bias
1-element Vector{Float32}:
 0.0
```
Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weight matrix and `bias` a bias vector. There is another way to think about a model: in Flux, *models are conceptually predictive functions*:
```jldoctest overview
julia> predict = Dense(1 => 1)
Dense(1 => 1)  # 2 parameters
```
`Dense(1 => 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
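As a small sketch (not part of the original text), the formula `σ(Wx+b)` can be checked by hand against a layer's `weight` and `bias` fields:

```julia
using Flux

layer = Dense(1 => 1)   # σ defaults to identity
x = Float32[2.0]        # a single 1-dimensional input

# Apply the layer's formula manually: σ(Wx + b) with σ = identity.
manual = identity.(layer.weight * x .+ layer.bias)

layer(x) ≈ manual       # true: the layer is just this function
```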
This model will already make predictions, though not accurate ones yet:
```jldoctest overview
julia> predict(x_train)
1×6 Matrix{Float32}:
 0.0  0.906654  1.81331  2.71996  3.62662  4.53327
```
In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.
More accurate predictions will yield a lower loss. You can write your own loss functions or rely on those already provided by Flux. This loss function is called [mean squared error](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-squared-error/). Flux works by iteratively reducing the loss through *training*.
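The exact definition is not shown in this excerpt, but a mean-squared-error loss for the `predict` model can be sketched with `Flux.Losses.mse`; the training arrays below are illustrative stand-ins:

```julia
using Flux

predict = Dense(1 => 1)
x_train = Float32[0 1 2 3 4 5]      # illustrative inputs
y_train = Float32[2 6 10 14 18 22]  # illustrative targets

# Mean squared error: the mean of the squared differences
# between predictions and targets.
loss(x, y) = Flux.Losses.mse(predict(x), y)

loss(x_train, y_train)  # a nonnegative Float32; smaller is better
```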
Under the hood, the Flux [`Flux.train!`](@ref) function uses *a loss function* and *training data* to improve the *parameters* of your model based on a pluggable [`optimiser`](../training/optimisers.md):
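The optimiser and training data themselves are defined outside this excerpt; a minimal setup consistent with the gradient-descent strategy described below might look like this (the `x_train`/`y_train` values are assumed):

```julia
using Flux
using Flux: train!

# Plain gradient descent with the default learning rate of 0.1.
opt = Descent()

# Training data as (input, target) tuples; one tuple = one batch.
x_train = Float32[0 1 2 3 4 5]      # assumed
y_train = Float32[2 6 10 14 18 22]  # assumed
data = [(x_train, y_train)]
```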
Now we have the optimiser and the data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. As seen above, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:
```jldoctest overview
julia> predict.weight
1×1 Matrix{Float32}:
 0.9066542

julia> predict.bias
1-element Vector{Float32}:
 0.0
```
The dimensions of these model parameters depend on the number of inputs and outputs. Since models can have hundreds of inputs and several layers, it helps to have a function to collect the parameters into the data structure Flux expects:
```jldoctest overview
julia> parameters = Flux.params(predict)
Params([Float32[0.9066542], Float32[0.0]])
```
These are the parameters Flux will change, one step at a time, to improve predictions. At each step, the contents of this `Params` object change too, since it is just a collection of references to the mutable arrays inside the model:
```jldoctest overview
julia> predict.weight in parameters, predict.bias in parameters
(true, true)
```

The first parameter is the weight and the second is the bias.
This optimiser implements the classic gradient descent strategy. Now improve the parameters of the model with a call to [`Flux.train!`](@ref) like this:
```jldoctest overview
julia> train!(loss, parameters, data, opt)
```
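Roughly speaking, each such call differentiates the loss with respect to `parameters` and lets the optimiser apply its update rule. A sketch of one step using the same implicit-parameter API (the setup is repeated here so the snippet stands alone, with assumed training data):

```julia
using Flux

predict = Dense(1 => 1)
loss(x, y) = Flux.Losses.mse(predict(x), y)
x_train = Float32[0 1 2 3 4 5]      # assumed
y_train = Float32[2 6 10 14 18 22]  # assumed
parameters = Flux.params(predict)
opt = Descent()

# Differentiate the loss with respect to every tracked parameter...
grads = Flux.gradient(() -> loss(x_train, y_train), parameters)

# ...then let the optimiser nudge each parameter downhill.
Flux.Optimise.update!(opt, parameters, grads)
```

`train!` performs this gradient-and-update step once for every `(x, y)` tuple in `data`.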
And check the loss:
```jldoctest overview
julia> loss(x_train, y_train)
116.38745f0
```
It went down. Why?
```jldoctest overview
julia> parameters
Params([Float32[7.5777884], Float32[1.9466728]])
```
The parameters have changed. This single step is the essence of machine learning.
In the previous section, we made a single call to `train!` which iterates over the data we passed in just once. An *epoch* refers to one pass over the dataset. Typically, we will run the training for multiple epochs to drive the loss down even further. Let's run it a few more times:
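A sketch of such a multi-epoch loop, with the full setup repeated so it runs on its own (training data assumed, as above):

```julia
using Flux
using Flux: train!

predict = Dense(1 => 1)
loss(x, y) = Flux.Losses.mse(predict(x), y)
x_train = Float32[0 1 2 3 4 5]      # assumed
y_train = Float32[2 6 10 14 18 22]  # assumed
data = [(x_train, y_train)]
parameters = Flux.params(predict)
opt = Descent()

# One call to train! per epoch; each epoch is one pass over `data`.
for epoch in 1:200
  train!(loss, parameters, data, opt)
end

loss(x_train, y_train)  # far smaller than after a single step
```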