
Commit fa0503f

fix(lucene): add info about diacritics analyzer
close #83
1 parent 0a4689c commit fa0503f


3 files changed (+26 -1 lines changed)

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+<collection xmlns="http://exist-db.org/collection-config/1.0">
+    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
+        <lucene>
+            <analyzer class="org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer" id="nodiacritics"/>
+            <text qname="letter" analyzer="nodiacritics">
+                <field name="place" expression="place" analyzer="nodiacritics"/>
+                <field name="from" expression="from" store="no"/>
+                <field name="to" expression="to"/>
+            </text>
+        </lucene>
+    </index>
+</collection>
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+<collection xmlns="http://exist-db.org/collection-config/1.0">
+    <index>
+        <!-- Lucene indexes -->
+        <lucene diacritics='no'>
+            <analyzer class='org.apache.lucene.analysis.standard.StandardAnalyzer'/>
+            <text match="//title[@xml:lang='Sa-Ltn']"/>
+            <text match="/TEI/text"><ignore qname="text"/></text>
+        </lucene>
+    </index>
+</collection>

src/main/xar-resources/data/lucene/lucene.xml

Lines changed: 4 additions & 1 deletion
@@ -288,7 +288,10 @@
 the text at white space characters, but treats all other characters - including
 punctuation - as part of the token. The tokens are not converted to lower case and
 there's no stopword filter applied.</para>
-
+<para>eXist-db provides a special analyzer for characters with diacritics, based on the <literal>StandardAnalyzer</literal>. The <literal>NoDiacriticsStandardAnalyzer</literal> can be switched on and off by setting the <literal>diacritics</literal> attribute on the <tag>lucene</tag> element of your index configuration file.</para>
+<programlisting language="xml" xlink:href="listings/listing-24.xml"/>
+<para>With diacritics switched off, <literal>ä</literal>, <literal>å</literal>, <literal>ā</literal>, etc. will all be indexed as <literal>a</literal>. Alternatively, the analyzer can be referenced by its full class name.</para>
+<programlisting language="xml" xlink:href="listings/listing-23.xml"/>
 <sect3 xml:id="conf">
 <title>Configuring the Analyzer</title>
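The documentation added in this commit states that, with diacritics switched off, `ä`, `å`, `ā` and similar characters are all indexed as `a`. The Python sketch below illustrates that kind of diacritic folding in general terms only; it is not eXist-db's or Lucene's implementation. It assumes folding by Unicode canonical decomposition (NFD) followed by dropping combining marks, and uses a crude `\w+` regex plus lower-casing as a stand-in for `StandardAnalyzer` tokenization. The names `fold_diacritics` and `analyze` are hypothetical.

```python
import re
import unicodedata


def fold_diacritics(text: str) -> str:
    """Remove diacritics by decomposing to NFD and dropping combining marks.

    Hypothetical helper for illustration only. For example, 'ä' decomposes
    to 'a' + U+0308 COMBINING DIAERESIS; dropping U+0308 leaves 'a'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list[str]:
    """Crude stand-in for a StandardAnalyzer-style chain (hypothetical).

    Folds diacritics first, so already-decomposed input is not split at
    combining marks, then extracts word tokens and lower-cases them.
    """
    folded = fold_diacritics(text)
    return [token.lower() for token in re.findall(r"\w+", folded)]


if __name__ == "__main__":
    # Each variant ends up as the same index term "a".
    print(analyze("ä å ā"))
    print(analyze("Māhābhārata"))
```

NFD-based folding only strips marks that decompose canonically: a letter such as `ø` has no canonical decomposition and passes through unchanged, so an analyzer that must fold such letters needs an explicit mapping table instead.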
