Commit 148bf8a

(feature) multi-value hierarchical facets documentation; required for eXist-db/exist#3182
1 parent d563cd7 commit 148bf8a

4 files changed: +48 -3 lines changed

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<collection xmlns="http://exist-db.org/collection-config/1.0">
+    <index xmlns:db="http://docbook.org/ns/docbook" xmlns:xs="http://www.w3.org/2001/XMLSchema">
+        <lucene>
+            <module uri="http://exist-db.org/lucene/test/" prefix="idx" at="module.xql"/>
+            <text qname="db:article">
+                <facet dimension="keyword" expression="db:info/db:keywordset/db:keyword"/>
+                <facet dimension="date" expression="tokenize(db:info/db:pubdate, '-')" hierarchical="yes"/>
+                <facet dimension="subject" expression="idx:subject-hierarchy(db:info/db:subjectset/db:subject/db:subjectterm)" hierarchical="yes"/>
+            </text>
+        </lucene>
+    </index>
+</collection>
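
As a rough illustration of the `date` facet in the configuration above: its expression splits the publication date on `-`, so each component becomes one level of the hierarchy. The sketch below uses only the standard `tokenize` function; the sample date is a hypothetical value standing in for the content of `db:pubdate`.

```xquery
xquery version "3.1";

(: Hypothetical sample value standing in for the content of db:pubdate. :)
let $pubdate := "2018-06-21"
(: Split on '-' to obtain the year, month and day components, in that order. :)
return tokenize($pubdate, '-')
```

This evaluates to the sequence `("2018", "06", "21")`, which would correspond to the year, month and day levels of the hierarchical facet.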
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+declare function idx:subject-hierarchy($key as xs:string*) {
+    array:for-each (array {$key}, function($k) {
+        doc('/db/subjects/subjects.xml')//subject[@name=$k]/ancestor-or-self::subject/@name
+    })
+};
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+<subject>
+    <subject name="science">
+        <subject name="math"/>
+        <subject name="physics"/>
+    </subject>
+    <subject name="humanities">
+        <subject name="art"/>
+        <subject name="sociology"/>
+        <subject name="history"/>
+    </subject>
+</subject>
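
To see what the `idx:subject-hierarchy` lookup produces, the following self-contained sketch may help. It is not the module from this commit: `local:subject-hierarchy` and `$local:subjects` are hypothetical stand-ins that run the same lookup against an inline copy of the subject tree (so no stored document is needed) and convert the matched `@name` attributes to strings.

```xquery
xquery version "3.1";

(: Inline copy of the subject tree, used here instead of a stored document
   so the sketch runs without any database setup. :)
declare variable $local:subjects :=
    <subject>
        <subject name="science">
            <subject name="math"/>
            <subject name="physics"/>
        </subject>
        <subject name="humanities">
            <subject name="art"/>
            <subject name="sociology"/>
            <subject name="history"/>
        </subject>
    </subject>;

(: Hypothetical stand-in for idx:subject-hierarchy: same lookup, but against
   the inline tree and returning strings rather than attribute nodes. :)
declare function local:subject-hierarchy($key as xs:string*) as array(*) {
    (: One array member per key; each member is the path from the
       top-level subject down to the matching subject, in document order. :)
    array:for-each(array { $key }, function($k) {
        $local:subjects//subject[@name = $k]/ancestor-or-self::subject/@name ! string(.)
    })
};

local:subject-hierarchy(('math', 'history'))
```

For the keys `math` and `history` this evaluates to `[("science", "math"), ("humanities", "history")]`. The outermost `subject` element carries no `name` attribute, so it contributes nothing to the path. A key with no match yields an empty array member.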

src/main/xar-resources/data/lucene/lucene.xml

Lines changed: 19 additions & 3 deletions
@@ -394,14 +394,30 @@
 expression rooted in the parent node being indexed. In the example the parent will be a <tag>db:article</tag> element, so the context item
 for the expression is set to this element.</para>
 <para>The expression is evaluated and for each result item, a facet value is added
-to the dimension using the string value of the item. If the expression returns
-the empty sequence for the current parent node, the corresponding facet will be
-empty as well.</para>
+to the dimension using the string value of the item. Therefore, if the expression
+returns multiple items, a facet for that dimension will also hold multiple values.
+If the expression returns the empty sequence for the current parent node, the corresponding
+facet will be empty as well.</para>
 <para>A facet can also be defined to be hierarchical. A typical example would be a date, which consists of a year, month and day component. By
 indexing the single components as separate parts of a hierarchical facet, we enable the user to drill down by year first, then by month and
 finally by day. Let's assume each of our docbook articles has a <tag>pubdate</tag> containing a date in <code>xs:date</code>
 format:</para>
 <programlisting language="xml" xlink:href="listings/listing-52.txt"></programlisting>
+<para>Hierarchical facets may also hold multiple values, for example when we want to associate
+our documents with a subject classification on various levels of granularity (say: <emphasis>science</emphasis> with
+<emphasis>math</emphasis> and <emphasis>physics</emphasis> as subcategories or <emphasis>humanities</emphasis> with
+<emphasis>art</emphasis>, <emphasis>sociology</emphasis> and <emphasis>history</emphasis>).
+This way we enable the user to drill down into a broad subject such as <emphasis>humanities</emphasis>
+or <emphasis>science</emphasis> first and choose particular topics afterwards.
+If the hierarchical facet <code>expression</code>
+evaluates to an array, each of the array members will be treated as a hierarchical value for that facet.
+In XQuery, such an array could look like <code>[('science', 'math'), ('humanities', 'history')]</code> and be
+the result of evaluating a function like <code>idx:subject-hierarchy</code> below, stored in an imported module (see <link linkend="external-module">below</link>).
+</para>
+<programlisting language="xml" xlink:href="listings/listing-520.xml"/>
+<programlisting language="xquery" xlink:href="listings/listing-521.txt"/>
+<para>which assumes a hierarchical subject structure stored in <emphasis>/db/lucenetest/subjects.xml</emphasis>:</para>
+<programlisting language="xml" xlink:href="listings/listing-522.xml"/>
 <para>Next, we may want to define fields for the authors and title of the article. In docbook, <tag>author</tag> can be a complex element,
 consisting e.g. of a <tag>personname</tag> with nested
 <tag>surname</tag> and <tag>firstname</tag>. For display to the user and sorting we want to pre-compute a normalized string out of those
