Tutorial: Nutch
The following information was contributed by Praful Bagai.
This tutorial assumes that you are customizing the Reuters tutorial.
In reuters.js, update the Solr parameters in var params to reflect the structure of your Solr documents:
- Update facet.field with the fields on which you want to facet.
- Remove f.topics.facet.limit and f.countryCodes.facet.limit unless your Solr documents have topics or countryCodes fields.
- Remove all facet.date parameters unless your Solr documents have a date field on which you want to facet.
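As an illustration, here is a minimal sketch of what var params might look like after these changes. The facet fields host and type are hypothetical placeholders, so substitute indexed fields from your own Solr schema. The remaining values are assumed to carry over from the Reuters demo and can be adjusted.

```javascript
// Sketch of var params in reuters.js after adapting it to Nutch documents.
// "host" and "type" are HYPOTHETICAL facet fields: replace them with indexed
// fields that actually exist in your Solr schema.
var params = {
  facet: true,
  'facet.field': ['host', 'type'],
  'facet.limit': 20,
  'facet.mincount': 1,
  // f.topics.facet.limit, f.countryCodes.facet.limit and every facet.date*
  // entry have been dropped: these documents have no such fields.
  'json.nl': 'map'
};

// In the demo, each entry is added to the Manager's parameter store.
// Here we only print the name=value pairs that would be sent to Solr.
for (var name in params) {
  console.log(name + '=' + params[name]);
}
```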
Either update or remove the tag cloud, autocomplete, country code and calendar widgets. For the tag cloud, you can set the associated Solr fields by changing the value of var fields.
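Here is a sketch of the tag cloud change, assuming the Reuters demo loops over var fields and uses each name as the widget id, as the target selector and as the Solr field. The field names host and type are hypothetical. Each target also needs a matching element (for example, a div with that id) in the page.

```javascript
// HYPOTHETICAL Nutch fields to show as tag clouds; each needs a matching
// element such as <div id="host"></div> in index.html.
var fields = ['host', 'type'];

// Build one configuration object per field. In the demo, an object like this
// is passed to new AjaxSolr.TagcloudWidget(...) and registered with
// Manager.addWidget(...); here we only build the objects.
var tagcloudConfigs = fields.map(function (field) {
  return { id: field, target: '#' + field, field: field };
});

console.log(tagcloudConfigs);
```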
Nutch stores page text in a content field rather than in the text field used by the Reuters demo. In reuters.theme.js, in the AjaxSolr.theme.prototype.snippet function, replace doc.text with doc.content. Nutch documents have no dateline field, so also remove doc.dateline + ' ' + from that function.
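The following is a sketch of the adapted snippet function, assuming the Reuters version shows the first 300 characters and hides the rest behind a "more" link. It is written as a plain function so it runs on its own. In the demo it would be assigned to AjaxSolr.theme.prototype.snippet instead. The empty-string guard is an addition for documents indexed without stored content.

```javascript
// Sketch of the snippet theme function adapted for Nutch documents.
// In reuters.theme.js this would be AjaxSolr.theme.prototype.snippet.
function snippet(doc) {
  // Nutch puts the page text in "content" (the Reuters demo uses "text").
  // Fall back to '' if the field was not stored in Solr.
  var text = doc.content || '';
  var output = '';
  if (text.length > 300) {
    // Show the first 300 characters; hide the remainder behind "more".
    output += text.substring(0, 300);
    output += '<span style="display:none;">' + text.substring(300);
    output += '</span> <a href="#" class="more">more</a>';
  } else {
    output += text;
  }
  // No "doc.dateline + ' ' +" prefix: Nutch documents have no dateline field.
  return output;
}

console.log(snippet({ content: 'Hello from Nutch' })); // prints "Hello from Nutch"
```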
Check the following properties in your nutch-default.xml:
<property>
  <name>fetcher.store.content</name>
  <value>true</value>
  <description>If true, fetcher will store content.</description>
</property>
<property>
  <name>parser.caching.forbidden.policy</name>
  <value>content</value>
  <description>If a site (or a page) requests through its robot metatags
  that it should not be shown as cached content, apply this policy.
  Currently three keywords are recognized: "none" ignores any "noarchive" directives.
  "content" doesn't show the content, but shows summaries (snippets).
  "all" doesn't show either content or summaries.</description>
</property>

You may also need to copy field definitions from your Nutch schema into your Solr schema.
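If snippets or tag clouds come back empty, one possible cause is that a field the UI expects is not stored in Solr. A minimal sketch of a check follows. The helper missingFields is hypothetical, and the sample document is made up. In an AJAX Solr widget, the documents would typically come from this.manager.response.response.docs, as in the Reuters result widget (assumed here).

```javascript
// HYPOTHETICAL helper: list the expected fields that are absent from every
// document in a Solr response, e.g. because they are not stored in Solr.
function missingFields(docs, expected) {
  return expected.filter(function (field) {
    // A field counts as missing only if no returned document contains it.
    return !docs.some(function (doc) {
      return Object.prototype.hasOwnProperty.call(doc, field);
    });
  });
}

// Made-up example document with no "content" field.
var docs = [{ id: 'http://example.com/', url: 'http://example.com/', title: 'Example' }];
console.log(missingFields(docs, ['url', 'title', 'content']));
```

Here the check would report content as missing. That is a hint to look at fetcher.store.content and at whether content is defined as a stored field in the Solr schema.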