4 changes: 2 additions & 2 deletions Project.toml
@@ -2,7 +2,7 @@ name = "TextModels"
uuid = "77b9cbda-2a23-51df-82a3-24144d1cd378"
license = "MIT"
desc = "Practical Neural Network based models for Natural Language Processing"
version = "0.2.0"
version = "0.2.1"

[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
@@ -32,7 +32,7 @@ DataDeps = "0.7"
DataStructures = "0.18, 0.19, 0.20"
Flux = "0.16, 0.17"
Functors = "0.4, 0.5, 0.6"
JSON = "0.21, 0.22"
JSON = "0.21, 1"
Languages = "0.4"
NNlib = "0.7, 0.8, 0.9, 0.10"
StatsBase = "0.33, 0.34, 0.35"
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -1,4 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
TextAnalysis = "a2db99b7-8b79-58f8-94bf-bbc811eef33d"
TextModels = "77b9cbda-2a23-51df-82a3-24144d1cd378"
WordTokenizers = "796a5d58-b03d-544a-977e-18100b691f6e"
7 changes: 7 additions & 0 deletions docs/src/APIReference.md
@@ -4,3 +4,10 @@
Modules = [TextModels, TextModels.ULMFiT]
Order = [:function, :type]
```

## Constructor Functions

```@docs
NERTagger
PoSTagger
```
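
For a sense of how these constructors are used, here is a minimal usage sketch (an illustrative addition, not part of the diff; it assumes the pretrained model weights are fetched via DataDeps on first use):

```julia
using TextModels

ner = NERTagger()   # loads the pretrained named-entity recognition model
pos = PoSTagger()   # loads the pretrained part-of-speech tagging model

sentence = "Julia was created at MIT."
ner(sentence)       # per-token entity tags, e.g. "O", "ORG"
pos(sentence)       # per-token part-of-speech tags, e.g. "NNP", "VBD"
```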
157 changes: 64 additions & 93 deletions docs/src/ULMFiT.md

Large diffs are not rendered by default.

158 changes: 78 additions & 80 deletions docs/src/crf.md
@@ -2,130 +2,128 @@

This package currently provides support for Linear Chain Conditional Random Fields.

Let us first load the dependencies-
Let us first load the dependencies:

using Flux
using Flux: onehot, LSTM, Dense, reset!
using TextModels: CRF, viterbi_decode, crf_loss

Conditional Random Field layer is essentially like a softmax that operates on the top most layer.
```@example crf
using Flux
using Flux: onehot, LSTM, Dense, reset!
using TextModels: CRF, viterbi_decode, crf_loss
nothing # hide
```

Let us suppose the following input sequence to the CRF with `NUM_LABELS = 2`
Conditional Random Field layer is essentially like a softmax layer that operates on the top-most layer.

```julia
julia> NUM_LABELS = 2
julia> SEQUENCE_LENGTH = 2 # CRFs can handle variable length inputs sequences
julia> input_seq = [Float32.(rand(NUM_LABELS + 2)) for i in 1:SEQUENCE_LENGTH] # NUM_LABELS + 2, where two extra features correspond to the :START and :END label.
2-element Vector{Vector{Float32}}:
[0.5114323, 0.5355139, 0.4011792, 0.56359255]
[0.22925346, 0.21232551, 0.77616125, 0.41560093]
Let us suppose the following input sequence to the CRF with `NUM_LABELS = 2`:

```@example crf
using Random
Random.seed!(42) # For reproducible documentation
NUM_LABELS = 2
SEQUENCE_LENGTH = 3 # CRFs can handle variable-length input sequences
input_seq = [rand(NUM_LABELS + 2) for i in 1:SEQUENCE_LENGTH] # NUM_LABELS + 2, where two extra features correspond to the :START and :END labels.
```

We define our crf layer as -
We define our CRF layer as:

CRF(NUM_LABELS::Integer)

```julia
julia> c = CRF(NUM_LABELS) # The API internally append the START and END tags to NUM_LABELS.
CRF with 4 distinct tags (including START and STOP tags).
```@example crf
c = CRF(NUM_LABELS) # The API internally appends the START and END tags to NUM_LABELS.
```

Now as for the initial variable in Viterbi Decode or Forward Algorithm,
we define our input as
Now, for the initial variable in the Viterbi Decode or Forward Algorithm,
we define our input as:

```julia
julia> init_α = fill(-10000, (c.n + 2, 1))
julia> init_α[c.n + 1] = 0
```@example crf
init_α = fill(-10000, (c.n + 2, 1))
init_α[c.n + 1] = 0
init_α
```

Optionally this could be shifted to GPU by `init_α = gpu(init_α)`,
considering the input sequence to be CuArray in this case.
To shift a CRF `c` to gpu, one can use `c = gpu(c)`.
considering the input sequence to be a CuArray in this case.
To shift a CRF `c` to GPU, one can use `c = gpu(c)`.
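
Putting those pieces together, here is a sketch of the GPU workflow (an illustrative addition, not part of the diff; it assumes CUDA.jl is set up so that Flux's `gpu` helper actually moves arrays to the device):

```julia
using Flux: gpu, cpu

# Move the layer, the initial variable, and every element of the
# input sequence to the GPU, so crf_loss sees consistent array types.
c_gpu = gpu(c)
init_α_gpu = gpu(init_α)
input_seq_gpu = gpu.(input_seq)

# Viterbi decoding is CPU-only (see below), so move back before decoding:
# viterbi_decode(cpu(c_gpu), cpu.(input_seq_gpu), cpu(init_α_gpu))
```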

To find out the crf loss, we use the following function -
To find the CRF loss, we use the following function:

crf_loss(c::CRF, input_seq, label_sequence, init_α)

```
julia> label_seq1 = [onehot(1, 1:2), onehot(1, 1:2)]

julia> label_seq2 = [onehot(1, 1:2), onehot(2, 1:2)]

julia> label_seq3 = [onehot(2, 1:2), onehot(1, 1:2)]
```@example crf
using Flux: onehot
label_seq1 = [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq2 = [onehot(1, 1:2), onehot(1, 1:2), onehot(2, 1:2)]
label_seq3 = [onehot(2, 1:2), onehot(1, 1:2), onehot(1, 1:2)]
label_seq4 = [onehot(2, 1:2), onehot(2, 1:2), onehot(2, 1:2)]

julia> label_seq4 = [onehot(2, 1:2), onehot(2, 1:2)]

julia> crf_loss(c, input_seq, label_seq1, init_α)
1.33554f0

julia> crf_loss(c, input_seq, label_seq2, init_α)
1.2327178f0
crf_loss(c, input_seq, label_seq1, init_α)
```

julia> crf_loss(c, input_seq, label_seq3, init_α)
1.3454239f0
```@example crf
crf_loss(c, input_seq, label_seq2, init_α)
```

julia> crf_loss(c, input_seq, label_seq4, init_α)
1.6871009f0
```@example crf
crf_loss(c, input_seq, label_seq3, init_α)
```

```@example crf
crf_loss(c, input_seq, label_seq4, init_α)
```

We can decode this using Viterbi Decode.
We can decode this using Viterbi Decode:

viterbi_decode(c::CRF, input_seq, init_α)

```julia
julia> viterbi_decode(c, input_seq, init_α) # Gives the label_sequence with least loss
2-element Vector{Flux.OneHotArray{UInt32, 2, 0, 1, UInt32}}:
[1, 0]
[0, 1]

```@example crf
viterbi_decode(c, input_seq, init_α) # Returns the label sequence with the least loss
```

This algorithm decodes for the label sequence with lowest loss value in polynomial time.
This algorithm decodes the label sequence with the lowest loss value in polynomial time (O(n·k²) for a sequence of length n over k labels, via dynamic programming).

Currently the Viterbi Decode only support cpu arrays.
When working with GPU, use viterbi_decode as follows
Currently the Viterbi Decode only supports CPU arrays.
When working on the GPU, use `viterbi_decode` as follows:

viterbi_decode(cpu(c), cpu.(input_seq), cpu(init_α))

### Working with Flux layers

CRFs smoothly work over Flux layers-

```julia
julia> NUM_FEATURES = 20

julia> input_seq = [rand(NUM_FEATURES) for i in 1:SEQUENCE_LENGTH]
2-element Vector{Vector{Float32}}:
[0.948219, 0.719964, 0.352734, 0.0677656, 0.570564, 0.187673, 0.525125, 0.787807, 0.262452, 0.472472, 0.573259, 0.643369, 0.00592054, 0.945258, 0.951466, 0.323156, 0.679573, 0.663285, 0.218595, 0.152846]
[0.433295, 0.11998, 0.99615, 0.530107, 0.188887, 0.897213, 0.993726, 0.0799431, 0.953333, 0.941808, 0.982638, 0.0919345, 0.27504, 0.894169, 0.66818, 0.449537, 0.93063, 0.384957, 0.415114, 0.212203]

julia> m1 = Dense(NUM_FEATURES, NUM_LABELS + 2)

julia> loss1(input_seq, label_seq) = crf_loss(c, m1.(input_seq), label_seq, init_α) # loss for model m1
CRFs work smoothly with Flux layers:

julia> loss1(input_seq, [onehot(1, 1:2), onehot(1, 1:2)])
4.6620379898687485
```@example crf
using Flux: Dense
NUM_FEATURES = 20

# For working with Dense layers, we can use 1D vectors
input_seq_dense = [rand(NUM_FEATURES) for i in 1:SEQUENCE_LENGTH]
```

```@example crf
m1 = Dense(NUM_FEATURES, NUM_LABELS + 2)
loss1(input_seq, label_seq) = crf_loss(c, m1.(input_seq), label_seq, init_α) # loss for model m1
loss1(input_seq_dense, [onehot(1, 1:2), onehot(1, 1:2), onehot(1, 1:2)])
```

Here is an example of CRF with LSTM and Dense layer -

```julia
julia> LSTM_SIZE = 10

julia> lstm = LSTM(NUM_FEATURES, LSTM_SIZE)

julia> dense_out = Dense(LSTM_SIZE, NUM_LABELS + 2)
Here is an example of CRF with recurrent neural network layers:

julia> m2(x) = dense_out.(lstm.(x))
```@example crf
# For recurrent layers, we need 2D input matrices (features × sequence_position)
# Let's create properly formatted 2D data
input_2d = [Float32.(rand(2, 1)) for i in 1:SEQUENCE_LENGTH] # 2 features, 1 time step each
input_2d
```

julia> loss2(input_seq, label_seq) = crf_loss(c, m2(input_seq), label_seq, init_α) # loss for model m2
```@example crf
using Flux: RNN
# Create a simple RNN model that works with 2D input
rnn_model = RNN(2 => 5) # 2 input features → 5 hidden units
dense_layer = Dense(5, NUM_LABELS + 2) # 5 hidden → 4 output (NUM_LABELS + 2)

julia> loss2(input_seq, [onehot(1, 1:2), onehot(1, 1:2)])
1.6501050910529504
# Forward pass through RNN then Dense layer
rnn_outputs = rnn_model.(input_2d)
final_outputs = dense_layer.(rnn_outputs)

julia> reset!(lstm)
# Now we can use this with CRF
loss_rnn(input_2d, label_seq) = crf_loss(c, dense_layer.(rnn_model.(input_2d)), label_seq, init_α)
loss_rnn(input_2d, [onehot(1, 1:2), onehot(2, 1:2), onehot(1, 1:2)])
```
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -2,7 +2,7 @@

The TextModels package enhances the TextAnalysis package with end-user focused, practical natural language models, typically based on neural networks (in this case, [Flux](https://fluxml.ai/)).

This package depends on the [TextAnalysis](https://github.com/JuliaText/TextAnalysis.jl) package, which contains basic algorithms to deal with textual documetns.
This package depends on the [TextAnalysis](https://github.com/JuliaText/TextAnalysis.jl) package, which contains basic algorithms to deal with textual documents.

## Installation
