update

alstat · alstat · commit 3ffe066077c3 · 2025-11-20T17:28:41.000+08:00
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -25,15 +25,4 @@ julia> Pkg.add("Yunir")
   doi          = {10.5281/zenodo.6629868},
   url          = {https://doi.org/10.5281/zenodo.6629868}
 }
-```
-## Outline
-```@contents
-Pages = [
-    "man/basic_utilities.md",
-    "man/orthography.md",
-    "man/qurantree.md",
-    "man/api.md",
-    "man/references.md",
-]
-Depth = 2
 ```
diff --git a/docs/src/man/basic_utilities.md b/docs/src/man/basic_utilities.md
@@ -2,7 +2,7 @@ Basic Utilities
 =====
 In this section, we are going to discuss how to use the APIs for dediacritization, normalization, and transliteration.
 ## Dediacritization
-The function to use is `dediac` which works on either Arabic, Buckwalter or custom transliterated characters.
+Dediacritization is the process of removing diacritics from an Arabic word. These diacritics are mostly vowels but also include _sukuun_ سُكُون and _shaddah_ شَدّة. The function to use for dediacritization is `dediac`, which works on Arabic, Buckwalter, or custom transliterated characters.
 ```@repl abc
 using Yunir
 @transliterator :default
@@ -15,20 +15,23 @@ Or using Buckwalter as follows:
 bw_basmala = "bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi";
 dediac(bw_basmala; isarabic=false)
 ```
+Setting the `isarabic` parameter to `false` tells the `dediac` function that its input, `bw_basmala`, is Buckwalter-encoded, and that the output should be returned in Buckwalter form as well rather than in Arabic (as in the previous example).
+
 With Julia's broadcasting feature, the above dediacritization can be applied to arrays by simply adding `.` to the name of the function.
 ```@repl abc
 sentence0 = ["بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ",
     "إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ"
 ]
 dediac.(sentence0)
 ```
+As seen above, broadcasting applies the `dediac` function to each element of the vector `sentence0`. Because `sentence0` has two entries, `dediac` is applied to each of them and returns two outputs as well.
 ## Normalization
-The function to use is `normalize`, which works on either Arabic, Buckwalter or custom transliterated characters. For example, using the `ar_basmala` and `bw_basmala` defined above, the normalized version would be
+Arabic letters are calligraphic by design. Their free-flowing design makes it easy to form unique ligatures, which may require normalization for consistency when doing natural language processing. To normalize text, use the `normalize` function, which works on Arabic, Buckwalter, or custom transliterated characters. For example, using the `ar_basmala` and `bw_basmala` defined above, the normalized version would be
 ```@repl abc
 normalize(ar_basmala)
 normalize(bw_basmala; isarabic=false)
 ```
-You can also normalize specific characters, for example:
+Again, the `isarabic=false` argument simply disables Arabic output and instead encodes the result in Buckwalter. You can also normalize specific characters, for example:
 ```@repl abc
 normalize(ar_basmala, :alif_khanjareeya)
 normalize(ar_basmala, :hamzat_wasl)
diff --git a/docs/src/man/orthography.md b/docs/src/man/orthography.md
@@ -14,7 +14,7 @@ If we want to take the numerals, we need to tokenize it first.
 ```@repl abc2
 arb_token = tokenize(ar_basmala)
 ```
-Next we then parse each of these words as   `Orthography`.
+Next, we parse each of these words as `Orthography`.
 ```@repl abc2
 arb_parsed1 = parse(Orthography, arb_token[1])
 arb_parsed2 = parse.(Orthography, arb_token)
@@ -41,7 +41,7 @@ vocals(arb_parsed2[3])
 ```
 
 ## Simple Encoding
-Simple encoding is a worded or spelled out transliteration of the arabic text.
+Simple encoding is a worded, or spelled-out, transliteration of Arabic text.
 ```@repl abc2
 parse(SimpleEncoding, ar_basmala)
 ```
diff --git a/docs/src/man/rhythmic_analysis.md b/docs/src/man/rhythmic_analysis.md
@@ -1,6 +1,6 @@
 Rhythmic Analysis
 =============
-The prevalence of poetry in Arabic literature necessitates scientific tool to study the rhythmic signatures. Unfortunately, there are no resources for such methodology until the recent work of [asaadthesis](@citet). This section will demonstrate the APIs for doing rhythmic analysis based on the methodologies proposed by [asaadthesis](@citet). To do this, there are two types of text that will be studied, and these are pre-Islamic poetry and the Holy Qur'an.
+The prevalence of poetry in Arabic literature necessitates a scientific tool to study its rhythmic signatures. Unfortunately, to the best of the author's knowledge, no tools exist for this yet, and until recently there were no resources on statistical methodologies for studying rhythm either. The recent work of [asaadthesis](@citet) provided initial statistical tools that are now available in Yunir.jl. This section will demonstrate the APIs for doing rhythmic analysis based on the methodologies proposed by [asaadthesis](@citet). To do this, two types of text will be studied: Arabic poetry and the Holy Qur'an. For the Holy Qur'an, a comprehensive analysis was done by [asaadthesis](@citet), and readers are encouraged to read it. This section covers how to apply the analysis using Yunir.jl.
 
 ## Arabic Poetry
 The first data is from a well known author, [Al-Mutanabbi المتنبّي](https://en.wikipedia.org/wiki/Al-Mutanabbi), who authored several poetry including the titled [*'Indeed, every woman with a swaying walk'*](https://www.youtube.com/watch?v=9c1IrQwfYFM), which will be the basis for this section.
diff --git a/docs/src/man/text_alignment.md b/docs/src/man/text_alignment.md
@@ -41,7 +41,7 @@ shamela0012129_cln = clean(shamela0012129)
 shamela0023790_cln = clean(shamela0023790)
 ```
 !!! tips "Tips"
-    The `clean` function removes the non-Arabic characters through RegEx or [Regular Expression](https://en.wikipedia.org/wiki/Regular_expression), which is set at the third argument of the function. That is, `clean(shamela0012129)` is actually equivalent to:
+    The `clean` function removes non-Arabic characters using a RegEx, or [Regular Expression](https://en.wikipedia.org/wiki/Regular_expression), which is set by the third parameter of the function. That is, `clean(shamela0012129)` is actually equivalent to:
     ```julia
     clean(shamela0012129; replace_non_ar="", target_regex=r"[A-Za-z0-9\(:×\|\–\[\«\»\]~\)_@./#&+\—-]*")
     ```
@@ -92,7 +92,7 @@ We can actually extract the encoded version, which is in extended Buckwalter tra
 ```@repl abc
 res1.alignment
 ```
-This is the same with the result above, but this one is the Buckwalter encoded Arabic input.
+This is the same as the previous result, but this one is the Buckwalter-encoded Arabic input.
 
 The number in the left side is the index of the first character in the row, whereas the number in the right side is the index of the last character in the row.
 ### Alignment statistics
@@ -192,7 +192,7 @@ f
 ```
 The figure above is divided into three subplots arranged in rows. You can think of the figure as two input text displayed in horizontal (i.e, sideways) orientation. In this orientation, the x-axis becomes the rows of the texts, that is, you can think of the x-axis as the rows of the texts in the book. In this case, we have two books, the reference and the target books. Each dot in reference and target corresponds to the characters that have matched. The lines and curves in the middle (colored in red) represent the connections of the rows of the texts where the matched happened. Further, the y-axis correspond to the length of the rows, in this case 60 characters per row. As you can see, the top tick label of the y-axis is 0 and the bottom tick label of the y-axis is 60, this is because the writing of Arabic is right-to-left, and so we can think of the 0th-tick at the top as the starting index of the first character in both texts, and the row ends at the 60th-tick at the bottom.
 
-We added further customization to the plot, readers are encouraged to explore the API.
+We added further customization to the plot; readers are encouraged to explore the [API](http://127.0.0.1:5501/docs/build/man/api/).
 
 As for the plot of insertions of characters, we have:
 ```@example abc
@@ -228,7 +228,7 @@ a[3].xticks = 0:2:unique(xys[2][1])[end]
 f
 ```
 ## Cost Model
-The pairwise alignment above works by minimizing a cost function, which is define by a cost model. It is important that we understand how the cost model is setup so that we can give proper scoring for the mismatches, matches, deletions and insertions. To define a cost model, we use [BioAligments.jl](https://github.com/BioJulia/BioAlignments.jl)'s `CostModel` struct.
+The pairwise alignment above works by minimizing a cost function, which is defined by a cost model. It is important to understand how the cost model is set up so that we can give proper scoring for mismatches, matches, deletions, and insertions. To define a cost model, we use [BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl)'s `CostModel` struct.
 
 The default cost model is given by
 ```@setup def