4 changes: 2 additions & 2 deletions Project.toml
@@ -2,7 +2,7 @@ name = "TextModels"
uuid = "77b9cbda-2a23-51df-82a3-24144d1cd378"
license = "MIT"
desc = "Practical Neural Network based models for Natural Language Processing"
version = "0.2.0"
version = "0.2.1"

[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
@@ -32,7 +32,7 @@ DataDeps = "0.7"
DataStructures = "0.18, 0.19, 0.20"
Flux = "0.16, 0.17"
Functors = "0.4, 0.5, 0.6"
JSON = "0.21, 0.22"
JSON = "0.21, 1"
Languages = "0.4"
NNlib = "0.7, 0.8, 0.9, 0.10"
StatsBase = "0.33, 0.34, 0.35"
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -1,4 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
TextAnalysis = "a2db99b7-8b79-58f8-94bf-bbc811eef33d"
TextModels = "77b9cbda-2a23-51df-82a3-24144d1cd378"
WordTokenizers = "796a5d58-b03d-544a-977e-18100b691f6e"
7 changes: 7 additions & 0 deletions docs/src/APIReference.md
@@ -4,3 +4,10 @@
Modules = [TextModels, TextModels.ULMFiT]
Order = [:function, :type]
```

## Constructor Functions

```@docs
NERTagger
PoSTagger
```
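
For a sense of how these constructors are used, here is a minimal usage sketch (an illustrative addition, not part of the diff; it assumes the pretrained model weights are fetched via DataDeps on first use):

```julia
using TextModels

ner = NERTagger()   # loads the pretrained named-entity recognition model
pos = PoSTagger()   # loads the pretrained part-of-speech tagging model

sentence = "Julia was created at MIT."
ner(sentence)       # per-token entity tags, e.g. "O", "ORG"
pos(sentence)       # per-token part-of-speech tags, e.g. "NNP", "VBD"
```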
157 changes: 64 additions & 93 deletions docs/src/ULMFiT.md

Large diffs are not rendered by default.

158 changes: 78 additions & 80 deletions docs/src/crf.md
@@ -2,130 +2,128 @@

This package currently provides support for Linear Chain Conditional Random Fields.

Let us first load the dependencies-
Let us first load the dependencies:

using Flux
using Flux: onehot, LSTM, Dense, reset!
using TextModels: CRF, viterbi_decode, crf_loss

Conditional Random Field layer is essentially like a softmax that operates on the top most layer.
```@example crf
using Flux
using Flux: onehot, LSTM, Dense, reset!
using TextModels: CRF, viterbi_decode, crf_loss
nothing # hide
```

Let us suppose the following input sequence to the CRF with `NUM_LABELS = 2`
Conditional Random Field layer is essentially like a softmax layer that operates on the top-most layer.

```julia
julia> NUM_LABELS = 2
julia> SEQUENCE_LENGTH = 2 # CRFs can handle variable length inputs sequences
julia> input_seq = [Float32.(rand(NUM_LABELS + 2)) for i in 1:SEQUENCE_LENGTH] # NUM_LABELS + 2, where two extra features correspond to the :START and :END label.
2-element Vector{Vector{Float32}}:
[0.5114323, 0.5355139, 0.4011792, 0.56359255]
[0.22925346, 0.21232551, 0.77616125, 0.41560093]
Let us suppose the following input sequence to the CRF with `NUM_LABELS = 2`:

```@example crf
using Random
Random.seed!(42) # For reproducible documentation
NUM_LABELS = 2
SEQUENCE_LENGTH = 3 # CRFs can handle variable-length input sequences
input_seq = [rand(NUM_LABELS + 2) for i in 1:SEQUENCE_LENGTH] # NUM_LABELS + 2, where two extra features correspond to the :START and :END labels.
```

We define our crf layer as -
We define our CRF layer as:

CRF(NUM_LABELS::Integer)

```julia
julia> c = CRF(NUM_LABELS) # The API internally append the START and END tags to NUM_LABELS.
CRF with 4 distinct tags (including START and STOP tags).
```@example crf
c = CRF(NUM_LABELS) # The API internally appends the START and END tags to NUM_LABELS.
```

Now as for the initial variable in Viterbi Decode or Forward Algorithm,
we define our input as
Now, for the initial variable in the Viterbi Decode or Forward Algorithm,
we define our input as:

```julia
julia> init_α = fill(-10000, (c.n + 2, 1))
julia> init_α[c.n + 1] = 0
```@example crf
init_α = fill(-10000, (c.n + 2, 1))
init_α[c.n + 1] = 0
init_α
```

Optionally this could be shifted to GPU by `init_α = gpu(init_α)`,
considering the input sequence to be CuArray in this case.
To shift a CRF `c` to gpu, one can use `c = gpu(c)`.
considering the input sequence to be a CuArray in this case.
To shift a CRF `c` to GPU, one can use `c = gpu(c)`.
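
Putting those pieces together, here is a sketch of the GPU workflow (an illustrative addition, not part of the diff; it assumes CUDA.jl is set up so that Flux's `gpu` helper actually moves arrays to the device):

```julia
using Flux: gpu, cpu

# Move the layer, the initial variable, and every element of the
# input sequence to the GPU, so crf_loss sees consistent array types.
c_gpu = gpu(c)
init_α_gpu = gpu(init_α)
input_seq_gpu = gpu.(input_seq)

# Viterbi decoding is CPU-only (see below), so move back before decoding:
# viterbi_decode(cpu(c_gpu), cpu.(input_seq_gpu), cpu(init_α_gpu))
```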

To find out the crf loss, we use the following function -
To find the CRF loss, we use the following function:

crf_loss(c::CRF, input_seq, label_sequence, init_α)

```
julia> label_seq1 = [onehot(1, 1:2), onehot(1, 1:2)]

julia> label_seq2 = [onehot(1, 1:2), onehot(2, 1:2)]

julia> label_seq3 = [onehot(2, 1:2), onehot(1, 1:2)]
```@example crf
using Flux: onehot
label_seq1 = [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq2 = [onehot(1, 1:2), onehot(1, 1:2), onehot(2, 1:2)]
label_seq3 = [onehot(2, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq4 = [onehot(2, 1:2), onehot(2, 1:2), onehot(2, 1:2)]

julia> label_seq4 = [onehot(2, 1:2), onehot(2, 1:2)]

julia> crf_loss(c, input_seq, label_seq1, init_α)
1.33554f0

julia> crf_loss(c, input_seq, label_seq2, init_α)
1.2327178f0
crf_loss(c, input_seq, label_seq1, init_α)
```

julia> crf_loss(c, input_seq, label_seq3, init_α)
1.3454239f0
```@example crf
crf_loss(c, input_seq, label_seq2, init_α)
```

julia> crf_loss(c, input_seq, label_seq4, init_α)
1.6871009f0
```@example crf
crf_loss(c, input_seq, label_seq3, init_α)
```

```@example crf
crf_loss(c, input_seq, label_seq4, init_α)
```

We can decode this using Viterbi Decode.
We can decode this using Viterbi Decode:

viterbi_decode(c::CRF, input_seq, init_α)

```julia
julia> viterbi_decode(c, input_seq, init_α) # Gives the label_sequence with least loss
2-element Vector{Flux.OneHotArray{UInt32, 2, 0, 1, UInt32}}:
[1, 0]
[0, 1]

```@example crf
viterbi_decode(c, input_seq, init_α) # Returns the label sequence with the least loss
```

This algorithm decodes for the label sequence with lowest loss value in polynomial time.
This algorithm decodes the label sequence with the lowest loss value in polynomial time (O(n·k²) for a sequence of length n over k labels, via dynamic programming).

Currently the Viterbi Decode only support cpu arrays.
When working with GPU, use viterbi_decode as follows
Currently the Viterbi Decode only supports CPU arrays.
When working on the GPU, use `viterbi_decode` as follows:

viterbi_decode(cpu(c), cpu.(input_seq), cpu(init_α))

### Working with Flux layers

CRFs smoothly work over Flux layers-

```julia
julia> NUM_FEATURES = 20

julia> input_seq = [rand(NUM_FEATURES) for i in 1:SEQUENCE_LENGTH]
2-element Vector{Vector{Float32}}:
[0.948219, 0.719964, 0.352734, 0.0677656, 0.570564, 0.187673, 0.525125, 0.787807, 0.262452, 0.472472, 0.573259, 0.643369, 0.00592054, 0.945258, 0.951466, 0.323156, 0.679573, 0.663285, 0.218595, 0.152846]
[0.433295, 0.11998, 0.99615, 0.530107, 0.188887, 0.897213, 0.993726, 0.0799431, 0.953333, 0.941808, 0.982638, 0.0919345, 0.27504, 0.894169, 0.66818, 0.449537, 0.93063, 0.384957, 0.415114, 0.212203]

julia> m1 = Dense(NUM_FEATURES, NUM_LABELS + 2)

julia> loss1(input_seq, label_seq) = crf_loss(c, m1.(input_seq), label_seq, init_α) # loss for model m1
CRFs work smoothly with Flux layers:

julia> loss1(input_seq, [onehot(1, 1:2), onehot(1, 1:2)])
4.6620379898687485
```@example crf
using Flux: Dense
NUM_FEATURES = 20

# For working with Dense layers, we can use 1D vectors
input_seq_dense = [rand(NUM_FEATURES) for i in 1:SEQUENCE_LENGTH]
```

```@example crf
m1 = Dense(NUM_FEATURES, NUM_LABELS + 2)
loss1(input_seq, label_seq) = crf_loss(c, m1.(input_seq), label_seq, init_α) # loss for model m1
loss1(input_seq_dense, [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)])
```

Here is an example of CRF with LSTM and Dense layer -

```julia
julia> LSTM_SIZE = 10

julia> lstm = LSTM(NUM_FEATURES, LSTM_SIZE)

julia> dense_out = Dense(LSTM_SIZE, NUM_LABELS + 2)
Here is an example of CRF with recurrent neural network layers:

julia> m2(x) = dense_out.(lstm.(x))
```@example crf
# For recurrent layers, we need 2D input matrices (features × sequence_position)
# Let's create properly formatted 2D data
input_2d = [Float32.(rand(2, 1)) for i in 1:SEQUENCE_LENGTH] # 2 features, 1 time step each
input_2d
```

julia> loss2(input_seq, label_seq) = crf_loss(c, m2(input_seq), label_seq, init_α) # loss for model m2
```@example crf
using Flux: RNN
# Create a simple RNN model that works with 2D input
rnn_model = RNN(2 => 5) # 2 input features → 5 hidden units
dense_layer = Dense(5, NUM_LABELS + 2) # 5 hidden → 4 output (NUM_LABELS + 2)

julia> loss2(input_seq, [onehot(1, 1:2), onehot(1, 1:2)])
1.6501050910529504
# Forward pass through RNN then Dense layer
rnn_outputs = rnn_model.(input_2d)
final_outputs = dense_layer.(rnn_outputs)

julia> reset!(lstm)
# Now we can use this with CRF
loss_rnn(input_2d, label_seq) = crf_loss(c, dense_layer.(rnn_model.(input_2d)), label_seq, init_α)
loss_rnn(input_2d, [onehot(1, 1:2), onehot(2, 1:2), onehot(1, 1:2)])
```
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -2,7 +2,7 @@

The TextModels package enhances the TextAnalysis package with end-user focused, practical natural language models, typically based on neural networks (in this case, [Flux](https://fluxml.ai/)).

This package depends on the [TextAnalysis](https://github.com/JuliaText/TextAnalysis.jl) package, which contains basic algorithms to deal with textual documetns.
This package depends on the [TextAnalysis](https://github.com/JuliaText/TextAnalysis.jl) package, which contains basic algorithms to deal with textual documents.

## Installation
