ckanext-similar-datasets

ckanext-similar-datasets is a CKAN extension that adds a list of similar datasets to the dataset detail page.

Screenshot of the similar_datasets plugin

It was originally part of the larger ckanext-discovery plugin and can now be used as a standalone plugin.

German version: see README.DE.md.

System requirements

Tested with CKAN 2.9 & 2.10.

Other versions have not been tested. Feedback about compatibility with additional versions is welcome.

Installation

Activate the default CKAN virtual environment:

. /usr/lib/ckan/default/bin/activate

Install the plugin:

pip install -e git+https://github.com/ondics/ckanext-similar-datasets#egg=ckanext-similar-datasets

Or install a specific version:

pip install -e git+https://github.com/ondics/ckanext-similar-datasets@v0.1.1#egg=ckanext-similar-datasets

How it works

The plugin uses Solr's More Like This feature. Solr needs a small configuration change for this to work.

Add the MoreLikeThisHandler to Solr's solrconfig.xml, for example /etc/solr/conf/solrconfig.xml (the exact location depends on your Solr installation).

Insert the block below right before the closing </config> tag near the end of the file:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
        <int name="mlt.mintf">3</int>
        <int name="mlt.mindf">1</int>
        <int name="mlt.minwl">3</int>
    </lst>
</requestHandler>

See the Solr documentation for more configuration details about the MoreLikeThis handler.
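The three defaults set lower bounds for the terms that More Like This considers: mlt.mintf is the minimum number of times a term must occur in the source document, mlt.mindf is the minimum number of documents that must contain it, and mlt.minwl is the minimum word length. After editing solrconfig.xml by hand, a short script can confirm that the file is still well-formed and that the handler is registered. The following is a minimal sketch, not part of the plugin; the function name `mlt_defaults` and the `SAMPLE` string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def mlt_defaults(solrconfig_xml: str) -> dict:
    """Return the defaults of the /mlt request handler, or {} if it is absent.

    Hypothetical helper for illustration; it only parses the XML text it is given.
    """
    root = ET.fromstring(solrconfig_xml)  # raises ParseError if the XML is malformed
    for handler in root.iter("requestHandler"):
        if handler.get("name") == "/mlt":
            defaults = handler.find("lst[@name='defaults']")
            if defaults is None:
                return {}
            return {child.get("name"): child.text for child in defaults}
    return {}


# Sample input mirroring the block shown above.
SAMPLE = """<config>
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <int name="mlt.mintf">3</int>
      <int name="mlt.mindf">1</int>
      <int name="mlt.minwl">3</int>
    </lst>
  </requestHandler>
</config>"""

print(mlt_defaults(SAMPLE))
```

To check a real installation, read the contents of your own solrconfig.xml and pass them to the function instead of `SAMPLE`.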

Next, enable term vector storage for the text field in /var/solr/data/ckan/conf/schema.xml.

Find this line:

<field name="text" type="text" indexed="true" stored="false" multiValued="true" />

Add termVectors="true" to it:

<field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true" />
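The schema change can be verified the same way. The sketch below checks whether a field definition carries termVectors="true"; the helper name `has_term_vectors` is an assumption made for illustration and is not provided by the plugin.

```python
import xml.etree.ElementTree as ET


def has_term_vectors(schema_xml: str, field_name: str = "text") -> bool:
    """Return True if the named <field> in the schema has termVectors="true".

    Hypothetical helper; searches all <field> elements regardless of nesting.
    """
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):
        if field.get("name") == field_name:
            return field.get("termVectors") == "true"
    return False


BEFORE = (
    '<schema><field name="text" type="text" indexed="true" '
    'stored="false" multiValued="true" /></schema>'
)
AFTER = (
    '<schema><field name="text" type="text" indexed="true" '
    'stored="false" multiValued="true" termVectors="true" /></schema>'
)

print(has_term_vectors(BEFORE), has_term_vectors(AFTER))
```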

Note: enabling term vectors significantly increases the size of the Solr index.

Then restart Solr:

sudo systemctl restart solr.service

Rebuild the search index so that term vectors are generated for existing datasets (datasets added later are indexed with term vectors automatically):

. /usr/lib/ckan/default/bin/activate
ckan --config /etc/ckan/default/ckan.ini search-index rebuild
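Once the index has been rebuilt, the /mlt handler can be queried directly to check that Solr returns similar documents. The sketch below is an assumption-laden illustration, not the plugin's own code: the Solr core URL, the helper names, and the choice of request parameters are hypothetical. It assumes the CKAN core exposes an `id` field and that similarity is computed on the `text` field configured above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_mlt_url(solr_core_url: str, dataset_id: str, rows: int = 5) -> str:
    """Build a More Like This request URL for one dataset (hypothetical helper)."""
    params = {
        "q": 'id:"{}"'.format(dataset_id),  # the source document to compare against
        "mlt.fl": "text",                   # field whose term vectors are compared
        "fl": "id,score",                   # fields to return for each similar document
        "rows": rows,
        "wt": "json",
    }
    return solr_core_url.rstrip("/") + "/mlt?" + urlencode(params)


def similar_ids(mlt_response: dict) -> list:
    """Extract document ids from a parsed MoreLikeThisHandler JSON response."""
    return [doc["id"] for doc in mlt_response.get("response", {}).get("docs", [])]


def fetch_similar(solr_core_url: str, dataset_id: str, rows: int = 5) -> list:
    """Query Solr over HTTP; requires a running Solr instance."""
    with urlopen(build_mlt_url(solr_core_url, dataset_id, rows), timeout=10) as resp:
        return similar_ids(json.load(resp))


# Assumed core URL; adjust to your installation.
print(build_mlt_url("http://localhost:8983/solr/ckan", "my-dataset-id"))
```

Request parameters such as mlt.mintf can be appended to the query to override the handler defaults for a single request, which is handy when tuning the thresholds.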

Finally, add similar_datasets to the ckan.plugins setting in your CKAN configuration INI file:

ckan.plugins = ... similar_datasets csa

and restart CKAN:

sudo service apache2 restart

Configuration

The plugin exposes one configuration option that can be added to your CKAN configuration file:

# The maximum number of similar datasets to show. Default is 5.
ckanext.similar_datasets.max_num = 4
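As an illustration of how such an option behaves, the sketch below reads the value from a plain mapping and falls back to the documented default of 5. It is a simplified stand-in, not the plugin's actual code: `get_max_num` is a hypothetical name, and a real CKAN plugin would read from CKAN's configuration object rather than a dict.

```python
DEFAULT_MAX_NUM = 5  # documented default


def get_max_num(config: dict) -> int:
    """Return the configured maximum number of similar datasets.

    INI values arrive as strings, so the value is converted to int.
    """
    raw = config.get("ckanext.similar_datasets.max_num", DEFAULT_MAX_NUM)
    return int(raw)


print(get_max_num({}))
print(get_max_num({"ckanext.similar_datasets.max_num": "4"}))
```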

Credits

Thanks to the original projects and authors:

License

Distributed under the GNU Affero General Public License.

See LICENSE for details.

Author

Copyright (C) 2021 Ondics GmbH
https://ondics.de
