Commit 91f21ae

Merge pull request #266293 from PatrickFarley/cogserv
replace links w archive
2 parents d4731fb + 84aff5a commit 91f21ae

1 file changed, +1 -1 lines changed

articles/search/index-add-custom-analyzers.md

Lines changed: 1 addition & 1 deletion
@@ -294,7 +294,7 @@ In the table below, the token filters that are implemented using Apache Lucene a
|[shingle](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html)|ShingleTokenFilter|Creates combinations of tokens as a single token.<br><br> **Options**<br><br> maxShingleSize (type: int) - Defaults to 2.<br><br> minShingleSize (type: int) - Defaults to 2.<br><br> outputUnigrams (type: bool) - if true, the output stream contains the input tokens (unigrams) as well as shingles. The default is true.<br><br> outputUnigramsIfNoShingles (type: bool) - If true, override the behavior of outputUnigrams==false for those times when no shingles are available. The default is false.<br><br> tokenSeparator (type: string) - The string to use when joining adjacent tokens to form a shingle. The default is a single empty space ` `. <br><br> filterToken (type: string) - The string to insert for each position for which there is no token. The default is `_`.|
|[snowball](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html)|SnowballTokenFilter|Snowball Token Filter.<br><br> **Options**<br><br> language (type: string) - Allowed values include: `armenian`, `basque`, `catalan`, `danish`, `dutch`, `english`, `finnish`, `french`, `german`, `german2`, `hungarian`, `italian`, `kp`, `lovins`, `norwegian`, `porter`, `portuguese`, `romanian`, `russian`, `spanish`, `swedish`, `turkish`|
|[sorani_normalization](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html)|SoraniNormalizationTokenFilter|Normalizes the Unicode representation of `Sorani` text.<br><br> **Options**<br><br> None.|
297-
|stemmer|StemmerTokenFilter|Language-specific stemming filter.<br><br> **Options**<br><br> language (type: string) - Allowed values include: <br> - [`arabic`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ar/ArabicStemmer.html)<br>- [`armenian`](https://snowballstem.org/algorithms/armenian/stemmer.html)<br>- [`basque`](https://snowballstem.org/algorithms/basque/stemmer.html)<br>- [`brazilian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/br/BrazilianStemmer.html)<br>- `bulgarian`<br>- [`catalan`](https://snowballstem.org/algorithms/catalan/stemmer.html)<br>- [`czech`](https://portal.acm.org/citation.cfm?id=1598600)<br>- [`danish`](https://snowballstem.org/algorithms/danish/stemmer.html)<br>- [`dutch`](https://snowballstem.org/algorithms/dutch/stemmer.html)<br>- [`dutchKp`](https://snowballstem.org/algorithms/kraaij_pohlmann/stemmer.html)<br>- [`english`](https://snowballstem.org/algorithms/porter/stemmer.html)<br>- [`lightEnglish`](https://ciir.cs.umass.edu/pubfiles/ir-35.pdf)<br>- [`minimalEnglish`](https://www.researchgate.net/publication/220433848_How_effective_is_suffixing)<br>- [`possessiveEnglish`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilter.html)<br>- [`porter2`](https://snowballstem.org/algorithms/english/stemmer.html)<br>- [`lovins`](https://snowballstem.org/algorithms/lovins/stemmer.html)<br>- [`finnish`](https://snowballstem.org/algorithms/finnish/stemmer.html)<br>- `lightFinnish`<br>- [`french`](https://snowballstem.org/algorithms/french/stemmer.html)<br>- [`lightFrench`](https://dl.acm.org/citation.cfm?id=1141523)<br>- [`minimalFrench`](https://dl.acm.org/citation.cfm?id=318984)<br>- `galician`<br>- `minimalGalician`<br>- [`german`](https://snowballstem.org/algorithms/german/stemmer.html)<br>- [`german2`](https://snowballstem.org/algorithms/german2/stemmer.html)<br>- 
[`lightGerman`](https://dl.acm.org/citation.cfm?id=1141523)<br>- `minimalGerman`<br>- [`greek`](https://sais.se/mthprize/2007/ntais2007.pdf)<br>- `hindi`<br>- [`hungarian`](https://snowballstem.org/algorithms/hungarian/stemmer.html)<br>- [`lightHungarian`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)<br>- [`indonesian`](https://eprints.illc.uva.nl/741/2/MoL-2003-03.text.pdf)<br>- [`irish`](https://snowballstem.org/algorithms/irish/stemmer.html)<br>- [`italian`](https://snowballstem.org/algorithms/italian/stemmer.html)<br>- [`lightItalian`](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)<br>- [`sorani`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ckb/SoraniStemmer.html)<br>- [`latvian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemmer.html)<br>- [`norwegian`](https://snowballstem.org/algorithms/norwegian/stemmer.html)<br>- [`lightNorwegian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html)<br>- [`minimalNorwegian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html)<br>- [`lightNynorsk`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html)<br>- [`minimalNynorsk`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html)<br>- [`portuguese`](https://snowballstem.org/algorithms/portuguese/stemmer.html)<br>- [`lightPortuguese`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)<br>- [`minimalPortuguese`](https://www.inf.ufrgs.br/~buriol/papers/Orengo_CLEF07.pdf)<br>- [`portugueseRslp`](https://www.inf.ufrgs.br/~viviane/rslp/index.htm)<br>- [`romanian`](https://snowballstem.org/otherapps/romanian/)<br>- 
[`russian`](https://snowballstem.org/algorithms/russian/stemmer.html)<br>- [`lightRussian`](https://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf)<br>- [`spanish`](https://snowballstem.org/algorithms/spanish/stemmer.html)<br>- [`lightSpanish`](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)<br>- [`swedish`](https://snowballstem.org/algorithms/swedish/stemmer.html)<br>- `lightSwedish`<br>- [`turkish`](https://snowballstem.org/algorithms/turkish/stemmer.html)|
297+
|stemmer|StemmerTokenFilter|Language-specific stemming filter.<br><br> **Options**<br><br> language (type: string) - Allowed values include: <br> - [`arabic`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ar/ArabicStemmer.html)<br>- [`armenian`](https://snowballstem.org/algorithms/armenian/stemmer.html)<br>- [`basque`](https://snowballstem.org/algorithms/basque/stemmer.html)<br>- [`brazilian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/br/BrazilianStemmer.html)<br>- `bulgarian`<br>- [`catalan`](https://snowballstem.org/algorithms/catalan/stemmer.html)<br>- [`czech`](https://portal.acm.org/citation.cfm?id=1598600)<br>- [`danish`](https://snowballstem.org/algorithms/danish/stemmer.html)<br>- [`dutch`](https://snowballstem.org/algorithms/dutch/stemmer.html)<br>- [`dutchKp`](https://snowballstem.org/algorithms/kraaij_pohlmann/stemmer.html)<br>- [`english`](https://snowballstem.org/algorithms/porter/stemmer.html)<br>- [`lightEnglish`](https://ciir.cs.umass.edu/pubfiles/ir-35.pdf)<br>- [`minimalEnglish`](https://www.researchgate.net/publication/220433848_How_effective_is_suffixing)<br>- [`possessiveEnglish`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilter.html)<br>- [`porter2`](https://snowballstem.org/algorithms/english/stemmer.html)<br>- [`lovins`](https://snowballstem.org/algorithms/lovins/stemmer.html)<br>- [`finnish`](https://snowballstem.org/algorithms/finnish/stemmer.html)<br>- `lightFinnish`<br>- [`french`](https://snowballstem.org/algorithms/french/stemmer.html)<br>- [`lightFrench`](https://dl.acm.org/citation.cfm?id=1141523)<br>- [`minimalFrench`](https://dl.acm.org/citation.cfm?id=318984)<br>- `galician`<br>- `minimalGalician`<br>- [`german`](https://snowballstem.org/algorithms/german/stemmer.html)<br>- [`german2`](https://snowballstem.org/algorithms/german2/stemmer.html)<br>- 
[`lightGerman`](https://dl.acm.org/citation.cfm?id=1141523)<br>- `minimalGerman`<br>- [`greek`](https://sais.se/mthprize/2007/ntais2007.pdf)<br>- `hindi`<br>- [`hungarian`](https://snowballstem.org/algorithms/hungarian/stemmer.html)<br>- [`lightHungarian`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)<br>- [`indonesian`](https://eprints.illc.uva.nl/741/2/MoL-2003-03.text.pdf)<br>- [`irish`](https://snowballstem.org/algorithms/irish/stemmer.html)<br>- [`italian`](https://snowballstem.org/algorithms/italian/stemmer.html)<br>- [`lightItalian`](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)<br>- [`sorani`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ckb/SoraniStemmer.html)<br>- [`latvian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemmer.html)<br>- [`norwegian`](https://snowballstem.org/algorithms/norwegian/stemmer.html)<br>- [`lightNorwegian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html)<br>- [`minimalNorwegian`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html)<br>- [`lightNynorsk`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html)<br>- [`minimalNynorsk`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html)<br>- [`portuguese`](https://snowballstem.org/algorithms/portuguese/stemmer.html)<br>- [`lightPortuguese`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)<br>- [`minimalPortuguese`](https://web.archive.org/web/20230425141918/https://www.inf.ufrgs.br/~buriol/papers/Orengo_CLEF07.pdf)<br>- [`portugueseRslp`](https://web.archive.org/web/20230422082818/https://www.inf.ufrgs.br/~viviane/rslp/index.htm)<br>- 
[`romanian`](https://snowballstem.org/otherapps/romanian/)<br>- [`russian`](https://snowballstem.org/algorithms/russian/stemmer.html)<br>- [`lightRussian`](https://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf)<br>- [`spanish`](https://snowballstem.org/algorithms/spanish/stemmer.html)<br>- [`lightSpanish`](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)<br>- [`swedish`](https://snowballstem.org/algorithms/swedish/stemmer.html)<br>- `lightSwedish`<br>- [`turkish`](https://snowballstem.org/algorithms/turkish/stemmer.html)|
|[stemmer_override](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html)|StemmerOverrideTokenFilter|Any dictionary-stemmed terms are marked as keywords, which prevents stemming down the chain. Must be placed before any stemming filters.<br><br> **Options**<br><br> rules (type: string array) - Required. Stemming rules in the format `word => stem`, for example `ran => run`. The default is an empty list.|
|[stopwords](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html)|StopwordsTokenFilter|Removes stop words from a token stream. By default, the filter uses a predefined stop word list for English.<br><br> **Options**<br><br> stopwords (type: string array) - A list of stopwords. Can't be specified if `stopwordsList` is specified.<br><br> stopwordsList (type: string) - A predefined list of stopwords. Can't be specified if `stopwords` is specified. Allowed values include: `arabic`, `armenian`, `basque`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `thai`, `turkish`. The default is `english`.<br><br> ignoreCase (type: bool) - If true, all words are lowercased first. The default is false.<br><br> removeTrailing (type: bool) - If true, ignore the last search term if it's a stop word. The default is true.|
|[synonym](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html)|SynonymTokenFilter|Matches single-word or multi-word synonyms in a token stream.<br><br> **Options**<br><br> synonyms (type: string array) - Required. List of synonyms in one of the following two formats:<br><br> - `incredible, unbelievable, fabulous => amazing` - All terms on the left side of the `=>` symbol are replaced with all terms on its right side.<br><br> - `incredible, unbelievable, fabulous, amazing` - A comma-separated list of equivalent words. Set the expand option to change how this list is interpreted.<br><br> ignoreCase (type: bool) - Case-folds input for matching. The default is false.<br><br> expand (type: bool) - If true, all words in the list of synonyms (if `=>` notation is not used) map to one another. <br>The following list: `incredible, unbelievable, fabulous, amazing` is equivalent to: `incredible, unbelievable, fabulous, amazing => incredible, unbelievable, fabulous, amazing`<br><br>- If false, the following list: `incredible, unbelievable, fabulous, amazing` is equivalent to: `incredible, unbelievable, fabulous, amazing => incredible`.|
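
The two `synonyms` rule formats and the `expand` option in the row above can be illustrated with a small Python sketch. This is only a model of the behavior the table describes, not how Lucene or the search service implements the filter. The helper name `expand_synonym_rule` is hypothetical, and `ignoreCase` and multi-word token matching are deliberately left out.

```python
def expand_synonym_rule(rule: str, expand: bool) -> dict[str, list[str]]:
    """Return a mapping from each input term to the terms that replace it.

    Hypothetical helper that mirrors the table's description only:
    - "a, b => c": every term on the left is replaced by the terms on the right.
    - "a, b, c" with expand=True: every term maps to all terms in the list.
    - "a, b, c" with expand=False: every term maps to the first term only.
    """
    if "=>" in rule:
        # Explicit mapping: the expand option does not apply here.
        left, right = rule.split("=>", 1)
        sources = [t.strip() for t in left.split(",") if t.strip()]
        targets = [t.strip() for t in right.split(",") if t.strip()]
    else:
        # Equivalence list: interpretation depends on expand.
        terms = [t.strip() for t in rule.split(",") if t.strip()]
        sources = terms
        targets = terms if expand else terms[:1]
    return {source: list(targets) for source in sources}


if __name__ == "__main__":
    equivalents = "incredible, unbelievable, fabulous, amazing"
    print(expand_synonym_rule(equivalents, expand=True))
    print(expand_synonym_rule(equivalents, expand=False))
    print(expand_synonym_rule("incredible, unbelievable, fabulous => amazing", expand=True))
```

With `expand=True`, each of the four terms maps to all four. With `expand=False`, each maps only to `incredible`. The explicit `=>` rule maps the three left-hand terms to `amazing` regardless of `expand`.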
