Commit 83141bb

Add info about the Spanish Plural Stemmer (#3040)
1 parent 0937bb5 commit 83141bb

solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc

Lines changed: 39 additions & 2 deletions
@@ -3223,14 +3223,15 @@ With class name (legacy)::
 
 === Spanish
 
-Solr includes two stemmers for Spanish: one in the `solr.SnowballPorterFilterFactory language="Spanish"`, and a lighter stemmer called `solr.SpanishLightStemFilterFactory`.
+Solr includes three stemmers for Spanish: the `solr.SnowballPorterFilterFactory language="Spanish"`, a lighter stemmer called `solr.SpanishLightStemFilterFactory` and a plural stemmer called `solr.SpanishPluralStemFilter` (https://mices.co/mices2021/slides/Xavier-Sanchez_Spanish-Stemmers-Solr.pdf[slides], https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373[article]) that implements the rules described in http://www.wikilengua.org/index.php/Plural_(formación) and can be useful in conjunction with synonyms as it produces meaningful tokens in the singular form (e.g. `amigo`, not `amig`).
+
 Lucene includes an example stopword list.
 
 *Factory class:* `solr.SpanishStemFilterFactory`
 
 *Arguments:* None
 
-*Example:*
+*Example 1:*
 
 [tabs#lang-spanish]
 ======
@@ -3267,6 +3268,42 @@ With class name (legacy)::
 
 *Out:* "tor", "tor", "tor"
 
+*Example 2:*
+
+[tabs#lang-spanish]
+======
+With name::
++
+====
+[source,xml]
+----
+<analyzer>
+  <tokenizer name="standard"/>
+  <filter name="lowercase"/>
+  <filter name="spanishPluralStem"/>
+</analyzer>
+----
+====
+
+With class name (legacy)::
++
+====
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.StandardTokenizerFactory"/>
+  <filter class="solr.LowerCaseFilterFactory"/>
+  <filter class="solr.SpanishPluralStemFilterFactory"/>
+</analyzer>
+----
+====
+======
+
+*In:* "ases esprais paces bits amigos cantar caries"
+
+*Tokenizer to Filter:* "ases", "esprais", "paces", "bits", "amigos", "cantar", "caries"
+
+*Out:* "as", "espray", "paz", "bit", "amigo", "cantar", "caries"
 
 === Swedish
 
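
Reviewer note: the *In:*/*Out:* sample that this commit adds for Example 2 can be mimicked with a small Python sketch. This is only a toy. The suffix rules and the `INVARIANT` exception set below are assumptions chosen to reproduce the seven sample tokens; they are not the rules implemented by Lucene's plural stemmer, and the names `toy_plural_stem` and `analyze` are hypothetical. The sketch is meant to show why the filter returns whole singular words (`amigo`, `paz`) rather than truncated stems.

```python
# Toy illustration only: the rules below are assumptions picked to match the
# sample tokens in the diff. They are NOT the Lucene SpanishPluralStemFilter rules.

# Hypothetical exception list: words whose singular already ends in "s".
INVARIANT = {"caries"}


def toy_plural_stem(token: str) -> str:
    """Return a singular-looking form for a lowercase Spanish token."""
    if token in INVARIANT or not token.endswith("s"):
        return token                 # caries -> caries, cantar -> cantar
    if token.endswith("ces"):
        return token[:-3] + "z"      # paces -> paz
    if token.endswith("ais"):
        return token[:-2] + "y"      # esprais -> espray
    if token.endswith("ses"):
        return token[:-2]            # ases -> as
    return token[:-1]                # amigos -> amigo, bits -> bit


def analyze(text: str) -> list[str]:
    """Whitespace-tokenize, lowercase, then apply the toy stemmer."""
    return [toy_plural_stem(t) for t in text.lower().split()]


if __name__ == "__main__":
    print(analyze("ases esprais paces bits amigos cantar caries"))
```

Because the output tokens are real singular words, a synonym list written in the singular could match them directly, which is the use case the commit text mentions.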