Lucene Languagefallback
This feature is available from version 1.17 onwards.
In some portals you might want the search to show results in only one language, preferably the language the user is currently browsing the portal in. This can be tricky when your pages have several language versions and not every page exists in every language. For example, page A exists in English and German, while page B only exists in Hungarian. If the user searches in German, you probably want to show the German version of page A and the Hungarian version of page B, because there is no other version of page B.
The search uses the LanguageFallbackSortingTopDocsCollector to collect the search results. If several language versions of a page are in the result, the collector picks the best-fitting one. The language grouping is done on a configurable attribute (e.g. languagesetid), and the language is also read from a configurable attribute (e.g. language).
This means that in the index every language version of a page should have the same languagesetid. To take the example from above: the English version of page A has 253 as its languagesetid, and so does the German version. The two versions have different ids and different languages (en and de).
The collector uses field caches to access the fields it needs and keeps a map of the found objects containing the best-fitting language version. This requires a little more memory than the default collector.
To determine the best-fitting language, the collector uses a configurable language priority array. The smaller the index of a found language in this array, the higher its priority. This priority array can also be passed in the request, using the key "languagefallbackpriority".
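The selection rule above can be illustrated with a small, self-contained Java sketch. It is not the source of LanguageFallbackSortingTopDocsCollector: the Hit record and the method names are hypothetical, and hits are plain objects rather than Lucene field-cache values. It also assumes that a language missing from the priority array ranks below every listed language.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of language-fallback selection. This is NOT the
 * LanguageFallbackSortingTopDocsCollector implementation, only the idea it is described to follow.
 */
public class LanguageFallbackSketch {

    /** A search hit carrying the two fields the fallback relies on (hypothetical type). */
    record Hit(String id, String languageSetId, String language) {}

    /** Priority = index in the priority array; assumption: unknown languages rank last. */
    static int priority(List<String> languagePriority, String language) {
        int idx = languagePriority.indexOf(language);
        return idx >= 0 ? idx : Integer.MAX_VALUE;
    }

    /** Keeps one hit per language set: the one whose language has the smallest priority index. */
    static List<Hit> pickBestLanguageVersions(List<Hit> hits, List<String> languagePriority) {
        // LinkedHashMap keeps groups in the order their first hit was seen.
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hits) {
            best.merge(hit.languageSetId(), hit, (current, candidate) ->
                priority(languagePriority, candidate.language())
                        < priority(languagePriority, current.language())
                    ? candidate : current);
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        // Page A (set 253) exists in en and de; page B (set 254) only in hu.
        List<Hit> hits = List.of(
            new Hit("a-en", "253", "en"),
            new Hit("a-de", "253", "de"),
            new Hit("b-hu", "254", "hu"));
        for (Hit h : pickBestLanguageVersions(hits, List.of("de", "en"))) {
            System.out.println(h.id()); // prints a-de, then b-hu
        }
    }
}
```

With the priority de,en, page A resolves to its German version and page B keeps its only (Hungarian) version. The sketch leaves each group at the position of its first hit and does not re-sort by score; the real collector may order results differently.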
As described above, the index needs two additional required fields: one containing the language grouping attribute (e.g. languagesetid) and one containing the language.
To enable the language fallback search, you simply have to configure it in your search.properties file:
rp.1.collectorClass=com.gentics.cr.lucene.search.collector.LanguageFallbackSortingTopDocsCollector
rp.1.collector.languagefallbackpriority=de,en
rp.1.collector.languagefield=languagecode
rp.1.collector.languagesetfield=languagesetid
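As a rough illustration of what these keys mean, the following Java sketch loads them with java.util.Properties and turns the comma-separated priority into a list. Two parts are assumptions not stated above: that the request value "languagefallbackpriority" is also comma-separated, and that it replaces the configured priority when present. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of reading the rp.1.collector.* keys.
 * Not Gentics Content Connector code.
 */
public class FallbackConfigSketch {

    static final String PREFIX = "rp.1.collector.";

    /** Splits a comma-separated list like "de,en" into [de, en], ignoring blanks. */
    static List<String> parsePriority(String value) {
        return Arrays.stream(value.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .toList();
    }

    /** Assumption: a request value, if present, replaces the configured priority. */
    static List<String> effectivePriority(Properties config, Map<String, String> request) {
        String fromRequest = request.get("languagefallbackpriority");
        if (fromRequest != null && !fromRequest.isBlank()) {
            return parsePriority(fromRequest);
        }
        return parsePriority(config.getProperty(PREFIX + "languagefallbackpriority", ""));
    }

    public static void main(String[] args) throws IOException {
        String searchProperties = String.join("\n",
            "rp.1.collectorClass=com.gentics.cr.lucene.search.collector.LanguageFallbackSortingTopDocsCollector",
            "rp.1.collector.languagefallbackpriority=de,en",
            "rp.1.collector.languagefield=languagecode",
            "rp.1.collector.languagesetfield=languagesetid");
        Properties config = new Properties();
        config.load(new StringReader(searchProperties));

        System.out.println(config.getProperty(PREFIX + "languagesetfield")); // languagesetid
        System.out.println(effectivePriority(config, Map.of()));             // [de, en]
        System.out.println(effectivePriority(config,
            Map.of("languagefallbackpriority", "hu,de")));                  // [hu, de]
    }
}
```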
If the LanguageFallbackSortingTopDocsCollector is used to search an index, every document in the index must have the attribute configured under the "languagesetfield" key set, so that results can be grouped properly. If you want to mix language-aware content (e.g. CMS pages) with content without languages (e.g. CMS files) in one index/searcher, you have to make sure that a unique languagesetid is set for the content without a language.
The easiest way to ensure that CMS files have a unique languagesetid in the index is to use the CopyValue transformer during indexing to copy the contentid attribute to the languagesetid attribute:
index.DEFAULT.CR.FILES.transformer.8.transformerclass=com.gentics.cr.lucene.indexer.transformer.other.CopyValue
index.DEFAULT.CR.FILES.transformer.8.sourceattribute=contentid
index.DEFAULT.CR.FILES.transformer.8.targetattribute=languagesetid
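To see why the unique languagesetid matters, the following Java sketch models documents as plain attribute maps. It is a hypothetical illustration, not the CopyValue transformer or the collector: it assumes that documents without a languagesetid all end up in one shared group (so only one of them survives), and the contentid values are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: documents are attribute maps; this is not the real
 * CopyValue transformer or collector code.
 */
public class CopyValueSketch {

    /** Copies sourceAttribute to targetAttribute on each document that has a source value. */
    static void copyValue(List<Map<String, Object>> docs, String sourceAttribute, String targetAttribute) {
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(sourceAttribute);
            if (value != null) {
                doc.put(targetAttribute, value);
            }
        }
    }

    /** Counts results left after keeping one document per languagesetid. */
    static int groupsAfterFallback(List<Map<String, Object>> docs) {
        Map<Object, Map<String, Object>> groups = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            // Assumption: a missing languagesetid groups under the shared null key.
            groups.putIfAbsent(doc.get("languagesetid"), doc);
        }
        return groups.size();
    }

    /** Two files with illustrative contentids and no languagesetid. */
    static List<Map<String, Object>> twoFiles() {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("contentid", "10008.1")));
        docs.add(new HashMap<>(Map.of("contentid", "10008.2")));
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> files = twoFiles();
        System.out.println(groupsAfterFallback(files)); // 1: both files collapse into one group
        copyValue(files, "contentid", "languagesetid");
        System.out.println(groupsAfterFallback(files)); // 2: each file is its own language set
    }
}
```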