Commit 522fd2f

Merge branch 'develop' into 11562-api-get-templates
2 parents 970e954 + 7e16ed3 commit 522fd2f

51 files changed (+1378, -227 lines changed)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Development of Dataverse on Windows has been confirmed to work as long as you use WSL rather than cmd.exe. See [the guides](https://dataverse-guide--11583.org.readthedocs.build/en/11583/developers/dev-environment.html#quickstart), #10606, and #11583.
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
### Configurable Search Services
2+
3+
Dataverse now has an experimental capability to dynamically add and configure new search engines.
4+
The current Dataverse user interface can be configured to use a specified search engine instead of the built-in Solr search.
5+
The search API now supports an optional `search_service` query parameter that allows using any configured search engine.
6+
An additional /api/search/services endpoint allows discovery of the services installed.
7+
8+
In addition to two trivial example services designed for testing, Dataverse ships with two search engine classes that support calling an externally-hosted search service (via HTTP GET or POST).
9+
These classes rely on the internal Solr search to perform access control and to format the final results, which simplifies development of such an external engine.
10+
11+
Details about the new functionality are described in https://dataverse-guide--11281.org.readthedocs.build/en/11281/developers/search-services.html
12+
13+
See also #11281.
14+
15+
## Settings
16+
17+
### Database Settings:
18+
19+
***New:***
20+
21+
- :GetExternalSearchUrl
22+
- :GetExternalSearchName
23+
- :PostExternalSearchUrl
24+
- :PostExternalSearchName
25+
26+
### New Configuration Options
27+
28+
- `dataverse.search.services.directory`
29+
- `dataverse.search.default-service`

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 3 additions & 3 deletions
@@ -2751,9 +2751,9 @@ The limit can be set via JVM setting :ref:`dataverse.files.default-dataset-file-
27512751

27522752
For an installation-wide limit, set the limit via a JVM option: ``./asadmin $ASADMIN_OPTS create-jvm-options "-Ddataverse.files.default-dataset-file-count-limit=<limit>"``
27532753

2754-
For Collections, the attribute can be controlled by calling the Create or Update Dataverse API and adding ``datasetFileCountLimit=500`` to the Json body.
2754+
For Collections, the attribute can be controlled by calling the Create or Update Dataverse API and adding ``datasetFileCountLimit=500`` to the JSON body. You must be a superuser to change this value.
27552755

2756-
For Datasets, the attribute can be set using the `Update Dataset Files Limit <#setting-the-files-count-limit-on-a-dataset>`_ API and passing the qp `fileCountLimit=500`.
2756+
For Datasets, the attribute can be set using the `Update Dataset Files Limit <#setting-the-files-count-limit-on-a-dataset>`_ API and passing the query parameter ``fileCountLimit=500``. You must be a superuser to change this value.
27572757

27582758
Setting a value of -1 will clear the limit for that level. If no limit is found on the Dataset, the hierarchy of parent nodes will be checked until finally the JVM setting is checked.
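
The lookup order described above can be sketched as follows. This is only an illustrative model, not Dataverse's implementation: the function name ``effective_file_count_limit`` and the representation of the hierarchy as a list are hypothetical, and the sketch assumes that an unset level is represented as ``None`` and that -1 (which clears a limit) is treated the same way.

```python
from typing import Optional, Sequence


def effective_file_count_limit(
    dataset_limit: Optional[int],
    parent_collection_limits: Sequence[Optional[int]],
    jvm_limit: Optional[int],
) -> Optional[int]:
    """Return the first limit found, checking the dataset, then each parent
    collection (nearest first), then the installation-wide JVM setting.

    None or -1 at a level means "no limit set here" (assumption of this sketch).
    """
    for limit in (dataset_limit, *parent_collection_limits, jvm_limit):
        if limit is not None and limit != -1:
            return limit
    return None  # no limit configured at any level


# A dataset without its own limit inherits the nearest collection limit.
print(effective_file_count_limit(None, [None, 500], 1000))
```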
27592759

@@ -2769,7 +2769,7 @@ Setting the files count limit on a Dataset
27692769
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
27702770
In order to update the number of files allowed for a Dataset without causing a Draft version of the Dataset to be created, use the following API.
27712771

2772-
.. note:: To clear the limit simply set the limit to -1 or call the DELETE API.
2772+
.. note:: To clear the limit, set it to -1 or call the DELETE API. You must be a superuser to change this value.
27732773

27742774
.. code-block:: bash
27752775

doc/sphinx-guides/source/api/search.rst

Lines changed: 62 additions & 4 deletions
@@ -29,16 +29,19 @@ type string Can be either "dataverse", "dataset", or "file". Multi
2929
subtree string The identifier of the Dataverse collection to which the search should be narrowed. The subtree of this Dataverse collection and all its children will be searched. Multiple "subtree" parameters can be used to include multiple Dataverse collections. For example, https://demo.dataverse.org/api/search?q=data&subtree=birds&subtree=cats .
3030
sort string The sort field. Supported values include "name" and "date". See example under "order".
3131
order string The order in which to sort. Can either be "asc" or "desc". For example, https://demo.dataverse.org/api/search?q=data&sort=name&order=asc
32-
per_page int The number of results to return per request. The default is 10. The max is 1000. See :ref:`iteration example <iteration-example>`.
32+
per_page int The number of results to return per request. The default is 10. The max is 1000. See :ref:`iteration example <iteration-example>`.
3333
start int A cursor for paging through search results. See :ref:`iteration example <iteration-example>`.
3434
show_relevance boolean Whether or not to show details of which fields were matched by the query. False by default. See :ref:`advanced search example <advancedsearch-example>`.
3535
show_facets boolean Whether or not to show facets that can be operated on by the "fq" parameter. False by default. See :ref:`advanced search example <advancedsearch-example>`.
3636
fq string A filter query on the search term. Multiple "fq" parameters can be used. See :ref:`advanced search example <advancedsearch-example>`.
3737
show_entity_ids boolean Whether or not to show the database IDs of the search results (for developer use).
38-
geo_point string Latitude and longitude in the form ``geo_point=42.3,-71.1``. You must supply ``geo_radius`` as well. See also :ref:`geospatial-search`.
39-
geo_radius string Radial distance in kilometers from ``geo_point`` (which must be supplied as well) such as ``geo_radius=1.5``.
40-
metadata_fields string Includes the requested fields for each dataset in the response. Multiple "metadata_fields" parameters can be used to include several fields. The value must be in the form "{metadata_block_name}:{field_name}" to include a specific field from a metadata block (see :ref:`example <dynamic-citation-some>`) or "{metadata_field_set_name}:\*" to include all the fields for a metadata block (see :ref:`example <dynamic-citation-all>`). "{field_name}" cannot be a subfield of a compound field. If "{field_name}" is a compound field, all subfields are included.
38+
show_api_urls boolean Whether or not to show API URLs for the search results.
39+
query_entities boolean Whether to query entities for extra metadata (slower). Default is true.
40+
metadata_fields string Includes the requested fields for each dataset in the response. Multiple "metadata_fields" parameters can be used to include several fields. The value must be in the form "{metadata_block_name}:{field_name}" to include a specific field from a metadata block (see :ref:`example <dynamic-citation-some>`) or "{metadata_field_set_name}:\*" to include all the fields for a metadata block (see :ref:`example <dynamic-citation-all>`). "{field_name}" cannot be a subfield of a compound field. If "{field_name}" is a compound field, all subfields are included.
41+
geo_point string Latitude and longitude in the form ``geo_point=42.3,-71.1``. You must supply ``geo_radius`` as well. See also :ref:`geospatial-search`.
42+
geo_radius string Radial distance in kilometers from ``geo_point`` (which must be supplied as well) such as ``geo_radius=1.5``.
4143
show_type_counts boolean Whether or not to include total_count_per_object_type for types: Dataverse, Dataset, and Files.
44+
search_service string The name of the search service to use for this query. If omitted, the default search service will be used. For available search services, see :ref:`discovering-available-search-services`.
4245
================ ======= ===========
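
As an illustration of how the parameters in the table combine, the sketch below assembles a search URL using only the Python standard library. The helper name ``build_search_url`` is hypothetical, and the check that ``geo_point`` and ``geo_radius`` are supplied together simply mirrors the rule stated in the table; it is not part of any Dataverse client library.

```python
from urllib.parse import urlencode


def build_search_url(base_url, q, subtrees=(), fqs=(), geo_point=None,
                     geo_radius=None, **extra):
    """Build a Search API URL; repeatable parameters are passed as sequences."""
    # geo_point and geo_radius must be supplied together.
    if (geo_point is None) != (geo_radius is None):
        raise ValueError("geo_point and geo_radius must be supplied together")
    params = [("q", q)]
    params += [("subtree", s) for s in subtrees]  # may be repeated
    params += [("fq", f) for f in fqs]            # may be repeated
    if geo_point is not None:
        params += [("geo_point", geo_point), ("geo_radius", geo_radius)]
    params += sorted(extra.items())  # e.g. sort, order, per_page, start
    return f"{base_url}/api/search?{urlencode(params)}"


url = build_search_url("https://demo.dataverse.org", "data",
                       subtrees=["birds", "cats"], sort="name", order="asc")
print(url)
```

Passing the repeatable parameters as sequences keeps the one-key-many-values pattern explicit, which a plain dictionary cannot express.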
4346

4447
Basic Search Example
@@ -775,3 +778,58 @@ Output from iteration example
775778
<span class="label label-success pull-right">
776779
CORS
777780
</span>
781+
782+
.. _search-services:
783+
784+
Search Services
785+
---------------
786+
787+
.. _discovering-available-search-services:
788+
789+
Discovering Available Search Services
790+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
791+
792+
To discover available search services and their capabilities, you can use the Search Services API endpoint.
793+
Note: Configurable Search Services are an optional, experimental feature that may evolve faster than other parts of Dataverse.
794+
795+
Example API endpoint: https://demo.dataverse.org/api/search/services
796+
797+
This endpoint returns a list of available search services, including their names and display names. It also indicates the default search service.
798+
799+
Example response:
800+
801+
.. code-block:: json
802+
803+
{
804+
"status": "OK",
805+
"data": {
806+
"services": [
807+
{
808+
"name": "solr",
809+
"displayName": "Solr Search"
810+
},
811+
{
812+
"name": "externalSearch",
813+
"displayName": "External Search for Datasets"
814+
}
815+
],
816+
"defaultService": "solr"
817+
}
}
818+
819+
You can use the ``name`` values returned by this endpoint in the ``search_service`` parameter of the main search API to specify which search service to use for a particular query.
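
For illustration, the following sketch extracts the service names and the default service from a response shaped like the example above. The helper name ``list_search_services`` is hypothetical, and the sample payload is the example response from this page rather than output captured from a live server.

```python
import json


def list_search_services(response_text):
    """Return (service names, default service name) from the body of a
    /api/search/services response."""
    data = json.loads(response_text)["data"]
    names = [service["name"] for service in data["services"]]
    return names, data["defaultService"]


sample = """
{
  "status": "OK",
  "data": {
    "services": [
      {"name": "solr", "displayName": "Solr Search"},
      {"name": "externalSearch", "displayName": "External Search for Datasets"}
    ],
    "defaultService": "solr"
  }
}
"""
names, default = list_search_services(sample)
print(names, default)
```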
820+
821+
Using Different Search Services
822+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
823+
824+
To use a specific search service, include the ``search_service`` parameter in your search query and set it to the service's ``name``. For example:
825+
826+
https://demo.dataverse.org/api/search?q=trees&search_service=externalSearch
827+
828+
This query will use the ``externalSearch`` service (assuming it exists) instead of the default search service (``solr``).
829+
830+
.. note:: Other search services may not be complete replacements for the included ``solr`` service. For example, they may not support searching for collections or files (just datasets).
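
The request above can also be assembled programmatically. The sketch below is a minimal illustration: ``search_url_for_service`` is a hypothetical helper, and omitting the parameter when the requested name is not installed (so that the default service is used) is a design choice made for this example, not documented server behavior.

```python
from urllib.parse import urlencode


def search_url_for_service(base_url, q, service, available, default):
    """Build a search URL, adding search_service only when the requested
    service is installed and differs from the default."""
    params = {"q": q}
    if service in available and service != default:
        params["search_service"] = service
    return f"{base_url}/api/search?{urlencode(params)}"


url = search_url_for_service("https://demo.dataverse.org", "trees",
                             "externalSearch", ["solr", "externalSearch"], "solr")
print(url)
```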
831+
832+
Developing Search Services
833+
~~~~~~~~~~~~~~~~~~~~~~~~~~
834+
835+
See :doc:`/developers/search-services` in the Developer Guide.

doc/sphinx-guides/source/developers/dev-environment.rst

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ After cloning the `dataverse repo <https://github.com/IQSS/dataverse>`_, run thi
1818

1919
``mvn -Pct clean package docker:run``
2020

21+
(Note that if you are on Windows, you must run the command above in `WSL <https://learn.microsoft.com/windows/wsl>`_ rather than cmd.exe. See :doc:`windows`.)
22+
2123
After some time you should be able to log in:
2224

2325
- url: http://localhost:8080

doc/sphinx-guides/source/developers/index.rst

Lines changed: 1 addition & 0 deletions
@@ -46,4 +46,5 @@ Developer Guide
4646
workflows
4747
fontcustom
4848
classic-dev-env
49+
search-services
4950

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
1+
Search Services
2+
===============
3+
4+
Dataverse supports configurable search services, allowing developers to integrate additional search engines dynamically. This guide outlines the design and provides details on how to use the interfaces and classes involved.
5+
6+
Design Overview
7+
---------------
8+
The configurable search services feature is designed to allow:
9+
10+
1. Dynamic addition of new search engines
11+
2. Configuration of the Dataverse UI to use a specified search engine
12+
3. Use of different search engines via the API
13+
4. Discovery of installed search engines
14+
15+
Key Components
16+
--------------
17+
18+
1. SearchService Interface
19+
^^^^^^^^^^^^^^^^^^^^^^^^^^
20+
The ``SearchService`` interface is the core of the configurable search services. It defines the methods that any search engine implementation must provide. (The methods below are accurate as of this writing.)
21+
22+
.. code-block:: java
23+
24+
public interface SearchService {
25+
String getServiceName();
26+
String getDisplayName();
27+
28+
SolrQueryResponse search(DataverseRequest dataverseRequest, List<Dataverse> dataverses, String query,
29+
List<String> filterQueries, String sortField, String sortOrder, int paginationStart,
30+
boolean onlyDatatRelatedToMe, int numResultsPerPage, boolean retrieveEntities, String geoPoint,
31+
String geoRadius, boolean addFacets, boolean addHighlights) throws SearchException;
32+
33+
default void setSolrSearchService(SearchService solrSearchService) { /* default body elided; see the Dataverse source */ }
34+
}
35+
36+
The interface allows you to provide a service name and display name, and to respond to the same search parameters that are normally sent to the Solr search engine.
37+
38+
The ``setSolrSearchService`` method is used by Dataverse to give your class a reference to the ``SolrSearchService``, allowing your class to perform Solr queries as needed. (See the ``ExternalSearchServices`` for an example.)
39+
40+
2. ConfigurableSearchService Interface
41+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
42+
43+
The ``ConfigurableSearchService`` interface extends the ``SearchService`` interface and adds a method for Dataverse to set the ``SettingsServiceBean``. This allows search services to be configurable through Dataverse settings.
44+
45+
.. code-block:: java
46+
47+
public interface ConfigurableSearchService extends SearchService {
48+
void setSettingsService(SettingsServiceBean settingsService);
49+
}
50+
51+
The ``GetExternalSearchServiceBean`` and ``PostExternalSearchServiceBean`` classes provide a use case for this.
52+
53+
3. JVM Options for Search Configuration
54+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
55+
Dataverse uses two JVM options to configure the search functionality:
56+
57+
- ``dataverse.search.services.directory``: Specifies the local directory where jar files with search engines (classes implementing the ``SearchService`` interface) can be found. Dataverse will dynamically load engines from this directory.
58+
59+
- ``dataverse.search.default-service``: The ``serviceName`` of the service that should be used in the Dataverse UI.
60+
61+
Example configuration:
62+
63+
.. code-block:: bash
64+
65+
./asadmin create-jvm-options "-Ddataverse.search.services.directory=/var/lib/dataverse/searchServices"
66+
./asadmin create-jvm-options "-Ddataverse.search.default-service=solr"
67+
68+
Remember to restart your Payara server after modifying these JVM options for the changes to take effect.
69+
70+
4. Using Different Search Engines via API
71+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
72+
73+
The loaded search services can be discovered using the ``/api/search/services`` endpoint.
74+
75+
Queries can be made to different engines by including the optional ``search_service=<serviceName>`` query parameter.
76+
77+
Use of these endpoints is described for end users in the API Guide under :ref:`search-services`.
78+
79+
Available Search Services
80+
-------------------------
81+
82+
The class definitions for four example search services are included in the Dataverse repository.
83+
They are not included in the Dataverse .war file but can be built as three separate .jar files using
84+
85+
.. code-block:: bash
86+
87+
mvn clean package -DskipTests=true -Pexternal-search-get -Pexternal-search-post
88+
89+
or
90+
91+
.. code-block:: bash
92+
93+
mvn clean package -DskipTests=true -Ptrivial-search-examples
94+
95+
1. GetExternalSearchServiceBean
96+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
97+
98+
2. PostExternalSearchServiceBean
99+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
100+
101+
These classes implement the ``ConfigurableSearchService`` interface.
102+
They make a GET or POST call (respectively) to an external search engine that must return a JSON array of objects with "PID" (preferred) or "DOI" and "Distance" keys.
103+
The queries sent to the external engine use the same query parameters as the Dataverse search API (GET) or carry a JSON payload with those keys (POST).
104+
The results they return are then looked up using the Solr search engine, which enforces access control and provides the standard formatting expected by the Dataverse UI and API.
105+
The distance values are used to order the results, smallest distances first.
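
To make the expected response format concrete, the sketch below orders a sample external-engine response by distance, smallest first, and extracts the identifiers that would then be looked up in Solr. It is illustrative only: the function name ``ordered_pids``, the sample identifiers, and the choice to prefer "PID" over "DOI" when both could be present are assumptions of this example, not a description of the shipped Java classes.

```python
import json


def ordered_pids(response_text):
    """Return identifiers from an external search response, ordered by
    ascending "Distance" (smallest distance first)."""
    hits = json.loads(response_text)
    hits.sort(key=lambda hit: float(hit["Distance"]))
    # Prefer "PID"; fall back to "DOI" when "PID" is absent.
    return [hit.get("PID") or hit["DOI"] for hit in hits]


# Made-up sample response: a JSON array of objects with PID/DOI and Distance.
sample = json.dumps([
    {"PID": "doi:10.5072/FK2/AAAAAA", "Distance": 0.42},
    {"DOI": "doi:10.5072/FK2/BBBBBB", "Distance": 0.07},
    {"PID": "doi:10.5072/FK2/CCCCCC", "Distance": 0.19},
])
print(ordered_pids(sample))
```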
106+
107+
They can be configured via two settings each:
108+
109+
- GET
110+
111+
- :GetExternalSearchUrl - the URL to send GET search queries to
112+
- :GetExternalSearchName - the display name to use for this configuration
113+
114+
- POST
115+
116+
- :PostExternalSearchUrl - the URL to send POST search queries to
117+
- :PostExternalSearchName - the display name to use for this configuration
118+
119+
As these classes use PIDs as identifiers, they cannot reference collections or, unless file PIDs are enabled, files.
120+
Similar classes, or extensions of these classes, could search by database IDs instead to support the additional types.
121+
122+
3. GoldenOldiesSearchServiceBean
123+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
124+
125+
4. OddlyEnoughSearchServiceBean
126+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
127+
128+
These classes implement the ``SearchService`` interface.
129+
They are intended only as code examples and simple tests of the design and are not intended for production use.
130+
The former simply replaces the user query with a query for entities with a database ID below 1000. It demonstrates how a class can leverage the Solr engine and achieve results solely by modifying or replacing the user query.
131+
The latter only returns hits from the user's query that also have an odd database id. Since the filtering in the class changes the number of total hits available and pagination, this class demonstrates one way a developer can adjust those aspects of the Solr response.
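
The kind of post-filtering described for ``OddlyEnoughSearchServiceBean`` can be modeled in a few lines. This is a simplified sketch, not the Java class itself: hits are represented as plain dictionaries with a hypothetical ``id`` key, and the total count and the returned page are recomputed after filtering.

```python
def filter_odd_ids(hits, start, per_page):
    """Keep hits with an odd database id, then recompute the total and
    return the requested page of the filtered list."""
    odd_hits = [hit for hit in hits if hit["id"] % 2 == 1]
    total = len(odd_hits)                    # the total changes after filtering
    page = odd_hits[start:start + per_page]  # paginate the filtered list
    return total, page


hits = [{"id": n} for n in range(1, 11)]  # ids 1..10
total, page = filter_odd_ids(hits, start=2, per_page=2)
print(total, [hit["id"] for hit in page])
```

Filtering before paginating is what keeps the reported total and the page boundaries consistent with what the caller actually sees.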
132+
133+
Notes
134+
-----
135+
136+
1. Unless you use the Solr engine to provide access control, you must implement proper access control in your search engine.
137+
2. The design currently limits search results to be in the format returned by Solr and the hits are expected to be collections, datasets, or files - other classes are not supported.
138+
3. Search services could be designed to completely replace Solr or to just support certain use cases (e.g. the external search classes only handling datasets).
139+
4. While search services can be deployed as independent jar files, they currently import multiple Dataverse classes and, unlike exporters, cannot be built using just the Dataverse SPI.
140+
5. As with other experimental features, we expect the ``SearchService`` interface may change over time as we learn about how people use it. Please keep in touch if you are developing search services.
141+
