ckanext-similar-datasets is a CKAN extension that adds a list of similar datasets to the dataset detail page.
It was originally part of the larger ckanext-discovery plugin and can now be used as a standalone plugin.
German version: see README.DE.md.
Tested with CKAN 2.9 & 2.10.
Other versions have not been tested. Feedback about compatibility with additional versions is welcome.
Activate the default CKAN virtual environment:
. /usr/lib/ckan/default/bin/activate
Install the plugin:
pip install -e git+https://github.com/ondics/ckanext-similar-datasets#egg=ckanext-similar-datasets
Or install a specific version:
pip install -e git+https://github.com/ondics/ckanext-similar-datasets@v0.1.1#egg=ckanext-similar-datasets
The plugin uses Solr's More Like This feature. Solr needs a small configuration change for this to work.
Add the MoreLikeThisHandler to /etc/solr/conf/solrconfig.xml.
Insert the block below right before the closing </config> tag near the end of the file:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <int name="mlt.mintf">3</int>
    <int name="mlt.mindf">1</int>
    <int name="mlt.minwl">3</int>
  </lst>
</requestHandler>
See the Solr documentation for more configuration details about the MoreLikeThis handler.
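Once the handler is registered and Solr has been restarted, it can be queried directly to check that it responds. The following Python sketch only builds such a query URL. The helper name build_mlt_url, the core URL http://localhost:8983/solr/ckan and the dataset ID are assumptions made for illustration, and this is not necessarily how the plugin itself talks to Solr.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, dataset_id, max_results=5):
    """Build a query URL for the /mlt handler (hypothetical helper)."""
    params = {
        "q": f'id:"{dataset_id}"',  # document to find similar items for
        "mlt.fl": "text",           # field whose terms drive the similarity
        "rows": max_results,        # number of similar documents to return
        "fl": "id,title",           # fields to include in each result
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed Solr core URL and dataset ID; adjust both to your installation.
    print(build_mlt_url("http://localhost:8983/solr/ckan", "my-dataset-id"))
```

Opening the printed URL in a browser or with curl should return a JSON response if the handler is active. Results are only meaningful once term vectors are enabled and the index has been rebuilt.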
Next, enable term vector storage for the text field in /var/solr/data/ckan/conf/schema.xml.
Find this line:
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
Add termVectors="true" to it:
<field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true" />
Note: enabling term vectors significantly increases the size of the Solr index.
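For scripted deployments, the same schema change could be applied programmatically. The sketch below is a minimal illustration in Python, not part of the plugin: the function name add_term_vectors is hypothetical, and it assumes the field definition sits on a single self-closing tag as shown above. It works on the file content as a string, so comments and formatting elsewhere in the schema stay untouched.

```python
import re

# Matches the tag of the field named "text" (attribute order may vary).
TEXT_FIELD = re.compile(r'<field\b[^>]*\bname="text"[^>]*?/?>')


def add_term_vectors(schema_xml):
    """Return schema_xml with termVectors="true" set on the "text" field."""
    def patch(match):
        tag = match.group(0)
        if "termVectors=" in tag:
            return tag  # already configured, leave the tag untouched
        # Insert the attribute right before the closing "/>" or ">".
        return re.sub(r'\s*(/?>)$', r' termVectors="true" \1', tag)

    return TEXT_FIELD.sub(patch, schema_xml, count=1)


if __name__ == "__main__":
    line = ('<field name="text" type="text" indexed="true" '
            'stored="false" multiValued="true" />')
    print(add_term_vectors(line))
```

Running the function twice leaves the result unchanged, so it is safe to call from a provisioning script. Back up schema.xml before writing the patched content to disk.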
Then restart Solr:
sudo systemctl restart solr.service
Rebuild the search index so that term vectors for existing datasets are generated (new datasets will be added automatically):
. /usr/lib/ckan/default/bin/activate
ckan --config /etc/ckan/default/ckan.ini search-index rebuild
Finally, add similar_datasets to the list of active CKAN plugins in your configuration INI:
plugins = ... similar_datasets csa
and restart CKAN:
sudo service apache2 restart
The plugin exposes one configuration option that can be added to your CKAN configuration file:
# The maximum number of similar datasets to show. Default is 5.
ckanext.similar_datasets.max_num = 4
Thanks to the original projects and authors:
Distributed under the GNU Affero General Public License.
See LICENSE for details.
Copyright (C) 2021 Ondics GmbH
https://ondics.de
