
Setup Search Module

brad-wechter edited this page Sep 24, 2014 · 14 revisions

Installing the Lucene Search Module

Install the module with the package manager command: install-package BetterCms.Module.LuceneSearch.

The module installs two workers that run as asynchronous background processes and do not block the web application:

  • Index source watcher. This worker scans the Better CMS "pages" table and adds new pages to the indexing queue.
  • Indexing robot. This worker takes pages from the indexing queue and crawls their URLs. New pages are crawled first, followed by pages that previously failed, and then pages that have already been crawled.

Use the following parameters to configure the Lucene search module:

  • LuceneWebSiteUrl: web site URL (the prefix added to the URLs being crawled).
  • LuceneFileSystemDirectory: directory where the Lucene index files are stored.
  • LucenePagesWatcherFrequency: timespan specifying how often the worker looks for newly created pages. Set to 00:00:00 to disable the new pages watcher.
  • LuceneIndexerPageFetchTimeout: page fetch timeout (how long the system waits for a page to respond). Default value: 00:01:00 (1 minute).
  • LuceneIndexerFrequency: timespan specifying how often the content indexer re-indexes page content. Set to 00:00:00 to disable the indexer.
  • LuceneMaxPagesPerQuery: maximum number of pages re-indexed per query. Default value: 1000.
  • LucenePageExpireTimeout: timeout after which an indexed page expires.
  • LuceneDisableStopWords: if set to true, disables the use of stop words, such as ["a", "the", "of", ...], when indexing content.
  • LuceneSearchForPartOfWords: if set to true, searches within words (similar to LIKE '%query%' in SQL).
  • LuceneIndexPrivatePages: if set to true, private pages are also searched (authorization is required).
  • LuceneAuthorizationUrl: authorization URL (where user credentials are sent using the POST method). This may be the same URL as the login form (for example, /login/).
  • LuceneAuthorizationForm: POST parameters of the authorization form and their values, each set as a separate key, e.g. LuceneAuthorizationForm.UserName, LuceneAuthorizationForm.Password, LuceneAuthorizationForm.CustomField.

Example:

  <search>
    <add key="LuceneWebSiteUrl" value="http://bettercms.sandbox.mvc4.local/" />
    <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
    <add key="LuceneIndexerFrequency" value="00:05:00" />
    <add key="LuceneIndexerPageFetchTimeout" value="00:01:00" />
    <add key="LucenePagesWatcherFrequency" value="00:05:00" />
    <add key="LuceneMaxPagesPerQuery" value="1000" />
    <add key="LucenePageExpireTimeout" value="00:00:00" />
    <add key="LuceneDisableStopWords" value="true" />
    <add key="LuceneSearchForPartOfWords" value="true" />
    <add key="LuceneIndexPrivatePages" value="true" />
    <add key="LuceneAuthorizationUrl" value="http://bettercms.sandbox.mvc4.local/login" />
    <add key="LuceneAuthorizationForm.UserName" value="admin" />
    <add key="LuceneAuthorizationForm.Password" value="admin" />
    <add key="LuceneAuthorizationForm.RememberMe" value="true" />
  </search>
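For a site without private pages, a smaller configuration should be enough. The fragment below is a minimal sketch, assuming that when LuceneIndexPrivatePages is false the LuceneAuthorization* keys are not needed and that omitted keys fall back to the defaults listed above. The host www.example.local is a hypothetical placeholder.

```xml
<search>
  <!-- Hypothetical host: crawled page URLs are prefixed with this value -->
  <add key="LuceneWebSiteUrl" value="http://www.example.local/" />
  <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
  <!-- Look for new pages and re-index content every 10 minutes (00:00:00 would disable each worker) -->
  <add key="LucenePagesWatcherFrequency" value="00:10:00" />
  <add key="LuceneIndexerFrequency" value="00:10:00" />
  <!-- Public pages only; assumed to make the LuceneAuthorization* keys unnecessary -->
  <add key="LuceneIndexPrivatePages" value="false" />
</search>
```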

Lucene module logging

Lucene worker messages can be written to a separate log file. To do this, reference the Lucene search module's logger name, LuceneSearchModule, in the logging configuration.

The following example writes all messages to the file bettercms.log and the Lucene search module's messages to the file bettercms.search.log:

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <targets>
    [...]
    <target name="log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.log" archiveFileName="${basedir}/logs/error_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />    
    [...]
    <target name="search_log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.search.log" archiveFileName="${basedir}/logs/search_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />    
    [...]
  </targets>
  <rules>
    <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Trace" final="true" />    
    [...]
    <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
  </rules>
</nlog>
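In the example above, final="true" stops NLog from evaluating later rules, so Lucene messages appear only in bettercms.search.log. If you also want them in bettercms.log, a sketch of the rules section without final="true" is shown below. The minlevel of Info on the search rule is an illustrative choice to reduce trace output, not a requirement of the module.

```xml
<rules>
  <!-- No final="true": LuceneSearchModule messages also reach the catch-all rule below -->
  <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Info" />
  <!-- Catch-all rule: every logger, including LuceneSearchModule, writes to bettercms.log -->
  <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
```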

Installing the Google Search Module

Install the module with the package manager command: install-package BetterCms.Module.GoogleSiteSearch.

To enable Google Site Search, you need a Google Site Search account (you can register here). It is a paid service; prices are available here.

Google searches are performed with a URL query of the form https://www.googleapis.com/customsearch/v1?key={0}&cx={1} (read more here). These parameters are set in the search section of the cms.config file:

  • GoogleSiteSearchApiKey: your Google API key (key in the URL).
  • GoogleSiteSearchEngineKey: Search engine's ID (cx in the URL).

Example:

  <search>
    <add key="GoogleSiteSearchApiKey" value="[BETTERCMS_GOOGLE_SEARCH_API_KEY]" />
    <add key="GoogleSiteSearchEngineKey" value="[BETTERCMS_GOOGLE_SEARCH_ENGINE_KEY]" />
  </search>

Using search module widgets

When the BetterCms.Module.GoogleSiteSearch or BetterCms.Module.LuceneSearch module is installed, the main search module, BetterCms.Module.Search, is also installed as a referenced module. It creates two widgets in the Search category: the Search input form widget and the Search results widget.

How to set up these widgets is discussed here.

Installing search module API

To use the search module API methods, install the BetterCms.Module.Search.Api module with the package manager command install-package BetterCms.Module.Search.Api.
