This package currently provides support for Linear Chain Conditional Random Fields.

Let us first load the dependencies:

```@example crf
using Flux
using Flux: onehot, LSTM, Dense, reset!
using TextModels: CRF, viterbi_decode, crf_loss
nothing # hide
```

A Conditional Random Field layer is essentially a softmax layer that operates on the top-most layer of the network.

Let us suppose the following input sequence to the CRF with `NUM_LABELS = 2`:

```@example crf
using Random
Random.seed!(42) # for reproducible documentation
NUM_LABELS = 2
SEQUENCE_LENGTH = 3 # CRFs can handle variable-length input sequences
input_seq = [rand(NUM_LABELS + 2) for i in 1:SEQUENCE_LENGTH] # NUM_LABELS + 2: the two extra features correspond to the :START and :END labels
```

We define our CRF layer as:

    CRF(NUM_LABELS::Integer)

```@example crf
c = CRF(NUM_LABELS) # The API internally appends the START and END tags to NUM_LABELS.
```

Now, for the initial variable in the Viterbi decode or forward algorithm,
we define our input as:

```@example crf
init_α = fill(-10000, (c.n + 2, 1))
init_α[c.n + 1] = 0
init_α
```

Optionally, this could be shifted to the GPU by `init_α = gpu(init_α)`,
considering the input sequence to be a CuArray in this case.
To shift a CRF `c` to the GPU, one can use `c = gpu(c)`.
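
Putting those pieces together, a minimal GPU sketch might look like the following. This is not part of the original example: it assumes a CUDA-capable device with CUDA.jl installed, and relies on Flux's `gpu` being a no-op fallback when no device is available.

```julia
using Flux: gpu, cpu  # gpu falls back to the identity when no CUDA device is present

# Hypothetical sketch: move the CRF, the inputs, and init_α to the GPU together.
c_gpu = gpu(c)
input_seq_gpu = gpu.(input_seq)   # each timestep's feature vector
init_α_gpu = gpu(init_α)

# The loss can then be computed on-device, e.g.
# crf_loss(c_gpu, input_seq_gpu, some_label_seq, init_α_gpu)
```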

To find the CRF loss, we use the following function:

    crf_loss(c::CRF, input_seq, label_sequence, init_α)

```@example crf
label_seq1 = [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq2 = [onehot(1, 1:2), onehot(1, 1:2), onehot(2, 1:2)]
label_seq3 = [onehot(2, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq4 = [onehot(2, 1:2), onehot(2, 1:2), onehot(2, 1:2)]

crf_loss(c, input_seq, label_seq1, init_α)
```

```@example crf
crf_loss(c, input_seq, label_seq2, init_α)
```

```@example crf
crf_loss(c, input_seq, label_seq3, init_α)
```

```@example crf
crf_loss(c, input_seq, label_seq4, init_α)
```

We can decode this using Viterbi decoding:

    viterbi_decode(c::CRF, input_seq, init_α)

```@example crf
viterbi_decode(c, input_seq, init_α) # gives the label sequence with the least loss
```

This algorithm decodes the label sequence with the lowest loss value in polynomial time.

Currently, Viterbi decoding only supports CPU arrays.
When working on a GPU, use `viterbi_decode` as follows:

    viterbi_decode(cpu(c), cpu.(input_seq), cpu(init_α))

### Working with Flux layers

CRFs work smoothly with Flux layers:

```@example crf
NUM_FEATURES = 20

# For working with Dense layers, we can use 1D feature vectors.
input_seq_dense = [rand(NUM_FEATURES) for i in 1:SEQUENCE_LENGTH]
```

```@example crf
m1 = Dense(NUM_FEATURES, NUM_LABELS + 2)
loss1(input_seq, label_seq) = crf_loss(c, m1.(input_seq), label_seq, init_α) # loss for model m1
loss1(input_seq_dense, [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)])
```

Here is an example of a CRF on top of recurrent neural network layers:

```@example crf
# Recurrent layers expect 2D input matrices (features × sequence position),
# so create properly formatted 2D data: 2 features, one time step each.
input_2d = [Float32.(rand(2, 1)) for i in 1:SEQUENCE_LENGTH]
input_2d
```

124 | 115 |
|
125 | | -julia> loss2(input_seq, label_seq) = crf_loss(c, m2(input_seq), label_seq, init_α) # loss for model m2 |
| 116 | +```@example crf |
| 117 | +using Flux: RNN |
| 118 | +# Create a simple RNN model that works with 2D input |
| 119 | +rnn_model = RNN(2 => 5) # 2 input features → 5 hidden units |
| 120 | +dense_layer = Dense(5, NUM_LABELS + 2) # 5 hidden → 4 output (NUM_LABELS + 2) |
126 | 121 |
|
127 | | -julia> loss2(input_seq, [onehot(1, 1:2), onehot(1, 1:2)]) |
128 | | -1.6501050910529504 |
| 122 | +# Forward pass through RNN then Dense layer |
| 123 | +rnn_outputs = rnn_model.(input_2d) |
| 124 | +final_outputs = dense_layer.(rnn_outputs) |
129 | 125 |
|
130 | | -julia> reset!(lstm) |
| 126 | +# Now we can use this with CRF |
| 127 | +loss_rnn(input_2d, label_seq) = crf_loss(c, dense_layer.(rnn_model.(input_2d)), label_seq, init_α) |
| 128 | +loss_rnn(input_2d, [onehot(1, 1:2), onehot(2, 1:2), onehot(1, 1:2)]) |
131 | 129 | ``` |
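
The loss above can also be minimized end to end. The following is a minimal training sketch, not part of the original example; it assumes the CRF's transition parameters are registered with Flux (as in TextModels) and uses Flux's implicit-parameter `params`/`gradient`/`update!` API, which newer Flux versions may replace with the explicit `setup`/`withgradient` style.

```julia
using Flux

# Hypothetical target labels for the sequence defined above.
label_seq = [onehot(1, 1:2), onehot(2, 1:2), onehot(1, 1:2)]

ps  = Flux.params(rnn_model, dense_layer, c)  # the CRF transition matrix trains too
opt = Flux.Descent(0.01)

for epoch in 1:10
    reset!(rnn_model)  # clear the recurrent state between passes over the sequence
    gs = Flux.gradient(ps) do
        crf_loss(c, dense_layer.(rnn_model.(input_2d)), label_seq, init_α)
    end
    Flux.Optimise.update!(opt, ps, gs)
end
```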