Commit b11ce2b

Tweak documentation and add section on Languages
1 parent a31d7d1 commit b11ce2b

1 file changed: +21 -3 lines changed

docs/index.md

Lines changed: 21 additions & 3 deletions
@@ -4,7 +4,7 @@ Textmogrify is a pre-alpha text manipulation library that hopefully works well w
 
 ## Usage
 
-This library is currently available for Scala binary versions 2.13 and 3.1.
+This library is currently available for Scala binary versions 2.13 and 3.2.
 
 To use the latest version, include the following in your `build.sbt`:
 
@@ -26,9 +26,10 @@ libraryDependencies ++= Seq(
 
 The Lucene module lets you use a Lucene [Analyzer][analyzer] to modify text, additionally it provides helpers to use `Analyzer`s with an fs2 [Stream][stream].
 
+
 ### Basics
 
-Typical usage is to use the `AnalyzerBuilder` to configure an `Analyzer` and call `.tokenizer` to get a `Resource[F, String => F[Vector[String]]]`:
+Typical usage is to use the `AnalyzerBuilder` to configure an `Analyzer` and call `.tokenizer[F]` to get a `Resource[F, String => F[Vector[String]]]`:
 
 ```scala mdoc:silent
 import textmogrify.lucene.AnalyzerBuilder
@@ -52,9 +53,26 @@ tokens.unsafeRunSync()
 We can see that our text was lowercased and the unicode `ñ` replaced with an ASCII `n`.
 
 
+### Languages
+
+Textmogrify comes with support for multiple languages.
+When setting up an `AnalyzerBuilder` you'll have access to language specific options once you call one of the helper language methods like `english` or `french`.
+Specifying a language preserves the configuration set beforehand.
+
+```scala mdoc:silent
+val base = AnalyzerBuilder.default.withLowerCasing.withASCIIFolding
+
+val en = base.english.withPorterStemmer.tokenizer[IO]
+val fr = base.french.withFrenchLightStemmer.tokenizer[IO]
+val es = base.spanish.withSpanishLightStemmer.tokenizer[IO]
+```
+
+All of `en`, `fr`, and `es` will both lowercase and asciifold their inputs in addition to using their language specific stemmers.
+
+
 ### Pipelines
 
-Another common use is to construct a `Pipe`, or `Stream` to `Stream` function.
+Another common use is to construct a `Pipe`, or `Stream` to `Stream` function using an `Analyzer`.
 Let's say we have some messages we want to analyze and index as part of some search component.
 Given a raw `Msg` type and an analyzed `Doc` type, we want to transform a `Stream[F, Msg]` into a `Stream[F, Doc]`.
 
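The new Languages section builds tokenizers but does not show one being run. Below is a minimal sketch of how `en` might be exercised. It uses only the builder calls that appear in the diff, plus the standard cats-effect `Resource#use` and `unsafeRunSync`. The input string is made up for illustration, and the snippet assumes a worksheet, REPL, or mdoc block with the `textmogrify-lucene` and cats-effect dependencies from the Usage section on the classpath.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import textmogrify.lucene.AnalyzerBuilder

// Shared configuration, copied from the snippet added in this commit.
val base = AnalyzerBuilder.default.withLowerCasing.withASCIIFolding

// Language-specific tokenizer layered on top of the shared configuration.
val en = base.english.withPorterStemmer.tokenizer[IO]

// `en` is a Resource[IO, String => IO[Vector[String]]]: `use` acquires the
// tokenizer function, applies it, and releases the Analyzer afterwards.
// The sample sentence is an assumption chosen for illustration.
val tokens: IO[Vector[String]] =
  en.use(tokenize => tokenize("I Like Jalapeños"))

// Run the effect to obtain the tokens.
val result: Vector[String] = tokens.unsafeRunSync()
```

Since `base` enables lowercasing and ASCII folding, the tokens in `result` should be lowercase ASCII. What the stemmer does to each word depends on Lucene's Porter stemmer, so no exact output is claimed here.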