Commit ce72678

Additional filter options in GN 4.x harvesters (geonetwork#9100)
Add configuration filter options for categories, metadata standard name, and group owners in GeoNetwork 4.x harvester.
1 parent 657fbf7 commit ce72678

8 files changed, 186 additions and 54 deletions

core/src/main/java/org/fao/geonet/kernel/setting/HarvesterSettingsManager.java

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ public String add(String path, Object name, Object value) {

     public String add(String path, Object name, Object value, boolean encrypted) {

-        if (name == null)
+        if (name == null)
             throw new IllegalArgumentException("Name cannot be null");

         String sName = makeString(name);
Lines changed: 58 additions & 40 deletions
@@ -1,40 +1,58 @@
-# GeoNetwork 4.X Harvester {#gn4_harvester}
-
-GeoNetwork 4.x changed the search engine to Elasticsearch, which is not compatible with previous versions. To harvest a catalogue based on GeoNetwork 4.x, this harvesting type is required.
-
-## Adding a GeoNetwork 4.x Harvester
-
-To create a GeoNetwork 4.X harvester, go to `Admin console` > `Harvesting` and select `Harvest from` > `GeoNetwork (from 4.x)`:
-
-![](img/add-geonetwork-3-harvester.png)
-
-Provide the following information:
-- **Identification** - Options describing the remote site.
-    - *Name* - A short description of the remote site. It will be shown on the harvesting main page as the name for this instance of the harvester.
-    - *Group* - Group that owns the harvested metadata.
-    - *User* - User that owns the harvested metadata.
-- **Schedule** - Schedule configuration to execute the harvester.
-- **Configure connection to GeoNetwork**:
-    - *Catalog URL* - The URL of the GeoNetwork server from which metadata will be harvested.
-    - *Node name* - GeoNetwork node name to harvest, by default `srv`.
-    - *Search filter* - This allows you to select metadata records for harvest based on certain criteria:
-        - *Full text*
-        - *Title*
-        - *Abstract*
-        - *Keyword*
-    - *Catalog* - Allows you to select a source to filter the metadata to harvest.
-
-- **Configure response processing**
-    - *Action on UUID collision* - Allows you to configure the action when a harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor, etc.).
-        - skipped (default)
-        - overridden
-        - generate a new UUID
-    - *Remote authentication*
-    - *Use full MEF format*
-    - *Use change date for comparison*
-    - *Set category if it exists locally*
-    - *Category for harvested records*
-    - *XSL filter name to apply*
-    - *Validate records before import*
-
-- **Privileges** - Assign privileges to harvested metadata.
+# GeoNetwork 4.x Harvester {#gn4_harvester}
+
+GeoNetwork 4.x uses **Elasticsearch** as its search engine, which is not compatible with the search
+protocols of older GeoNetwork releases. To harvest metadata from a GeoNetwork 4.x catalogue, you must use the
+dedicated GeoNetwork 4.x harvester type.
+
+---
+
+## How to Add a GeoNetwork 4.x Harvester
+
+1. **Go to**: `Admin console` > `Harvesting`
+2. **Select**: `Harvest from` > `GeoNetwork (from 4.x)`
+3. **Complete the configuration panels as described below**
+
+![](img/add-geonetwork-4-harvester.png)
+
+### Identification
+
+- **Name**: Provide a short, descriptive name for your remote GeoNetwork instance. This name appears on the harvesting main page as the name of this harvester instance.
+- **Group**: Assign the group that will own the harvested metadata records.
+- **User**: Select the user (owner) for the harvested records.
+
+### Schedule
+
+- **Schedule**: Set how often the harvester should run (e.g., daily, weekly), using cron syntax or the built-in scheduler.
+You may also run the harvester manually from the interface.
+
+### Connection to GeoNetwork
+
+- **Catalog URL**: Enter the full URL of the remote GeoNetwork 4.x server.
+- **Node name**: Specify the catalogue node to harvest (typically `srv`).
+- **Search filter**: Define filters to limit harvested metadata using criteria like:
+    - **Full text**: Matches all text fields.
+    - **Title**: Filter by title.
+    - **Abstract**: Filter by abstract.
+    - **Keyword**: Filter by keyword.
+    - **Categories**: Combine multiple categories with `AND` or `OR` (e.g., `cat1 AND cat2`, `cat1 OR cat2`).
+    - **Metadata standard**: Specify the metadata standards (e.g., `iso19139 OR iso19115-3.2018`).
+    - **Groups**: List the numeric IDs of one or more groups (the owners of the metadata), comma-separated.
+- **Catalog**: Identify a source sub-catalogue, if needed.
+
+### Response Processing
+
+- **Action on UUID collision**: Choose the action to take when the harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor, etc.):
+    - **Skipped** (default): Leave duplicates unchanged.
+    - **Overridden**: Replace with new record.
+    - **Generate a new UUID**: Assign a new unique identifier and keep both the new and old records.
+- **Remote Authentication**: Configure if credentials are required for the source catalogue.
+- **Use Full MEF Format**: Enable to transfer all metadata fields and resources.
+- **Use Change Date for Comparison**: Update a record only if its change date differs.
+- **Set Category If Exists Locally**: Assign categories if they match ones in the local catalogue.
+- **Category for Harvested Records**: Set the default category for imported entries.
+- **XSL Filter Name to Apply**: Specify custom XSL transformations if needed.
+- **Validate Records Before Import**: Toggle to validate metadata against the expected schema before import.
+
+### Privileges
+
+- Assign viewing, editing, or publishing privileges for harvested records.
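
Note on the **Groups** filter: the documentation asks for comma-separated numeric IDs, and `Search.java` in this commit turns that string into a JSON array for an Elasticsearch `terms` clause on `groupOwner`. The sketch below illustrates that conversion using only the standard library (the commit itself uses Jackson's `ObjectMapper`). The class and method names are hypothetical, and dropping empty entries is an addition of this sketch, not something the commit does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the commit: shows how a "Groups" value
// such as "2, 5" can become a terms clause.
class GroupOwnersSketch {

    // Split on commas, trim each entry, and drop entries that are empty
    // (the empty-entry filter is an assumption of this sketch).
    static List<String> parseGroupIds(String raw) {
        return Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(id -> !id.isEmpty())
            .collect(Collectors.toList());
    }

    // Render the IDs as a JSON array of strings inside a terms clause.
    // Only backslash and double quote are escaped here.
    static String toTermsClause(String raw) {
        String ids = parseGroupIds(raw).stream()
            .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(","));
        return "{\"terms\": {\"groupOwner\": [" + ids + "]}}";
    }

    public static void main(String[] args) {
        System.out.println(toTermsClause("2, 5"));
    }
}
```

With the input `2, 5` the sketch produces `{"terms": {"groupOwner": ["2","5"]}}`, so spaces after the commas in the form field are harmless.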

harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/v4/Geonet40Harvester.java

Lines changed: 3 additions & 0 deletions
@@ -66,6 +66,9 @@ protected void storeNodeExtra(GeonetParams params, String path,
             harvesterSettingsManager.add("id:" + searchID, "abstract", s.abstractText);
             harvesterSettingsManager.add("id:" + searchID, "keywords", s.keywords);
             harvesterSettingsManager.add("id:" + searchID, "sourceUuid", s.sourceUuid);
+            harvesterSettingsManager.add("id:" + searchID, "categories", s.categories);
+            harvesterSettingsManager.add("id:" + searchID, "schemes", s.schemes);
+            harvesterSettingsManager.add("id:" + searchID, "groupOwners", s.groupOwners);
         }

         //--- store group mapping

harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/v4/Search.java

Lines changed: 57 additions & 12 deletions
@@ -23,8 +23,15 @@

 package org.fao.geonet.kernel.harvest.harvester.geonet.v4;

+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
 import org.apache.commons.lang3.StringUtils;
 import org.apache.commons.lang3.builder.ToStringBuilder;
+import org.fao.geonet.Util;
 import org.fao.geonet.constants.Geonet;
 import org.fao.geonet.exceptions.BadParameterEx;
 import org.fao.geonet.kernel.harvest.harvester.geonet.BaseSearch;
@@ -39,12 +46,19 @@
  */
 class Search extends BaseSearch {

+    public String categories;
+    public String schemes;
+    public String groupOwners;
+
     public Search() {
         super();
     }

     public Search(Element search) throws BadParameterEx {
         super(search);
+        categories = Util.getParam(search, "categories", "");
+        schemes = Util.getParam(search, "schemes", "");
+        groupOwners = Util.getParam(search, "groupOwners", "");
     }

     public static Search createEmptySearch(int from, int to) throws BadParameterEx {
@@ -63,6 +77,9 @@ public Search copy() {
         s.sourceUuid = sourceUuid;
         s.from = from;
         s.to = to;
+        s.categories = categories;
+        s.groupOwners = groupOwners;
+        s.schemes = schemes;

         return s;
     }
@@ -76,36 +93,60 @@ public Search copy() {
      * @return A string representation of the Elasticsearch query formatted as a JSON object.
      */
     public String createElasticsearchQuery() {
-        String sourceFilter = "";
+
+        List<String> filters = new ArrayList<>();
+
         if (StringUtils.isNotBlank(sourceUuid)) {
-            sourceFilter = String.format(",{\"term\": {\"sourceCatalogue\": \"%s\"}}", sourceUuid);
+            filters.add(String.format("{\"term\": {\"sourceCatalogue\": \"%s\"}}", sourceUuid));
         }

-        String freeTextFilter = "";
         if (StringUtils.isNotBlank(freeText)) {
-            freeTextFilter = String.format(",{\"query_string\": {\"query\": \"(any.\\\\*:(%s) OR any.common:(%s))\", \"default_operator\": \"AND\"}}", freeText, freeText);
+            filters.add(String.format("{\"query_string\": {\"query\": \"(any.\\\\*:(%s) OR any.common:(%s))\", \"default_operator\": \"AND\"}}", freeText, freeText));
         }

-        String titleFilter = "";
         if (StringUtils.isNotBlank(title)) {
-            titleFilter = String.format(",{\"query_string\": {\"query\": \"(resourceTitleObject.\\\\*:(%s))\", \"default_operator\": \"AND\"}}", title);
+            filters.add(String.format("{\"query_string\": {\"query\": \"(resourceTitleObject.\\\\*:(%s))\", \"default_operator\": \"AND\"}}", title));
         }

-        String abstractFilter = "";
         if (StringUtils.isNotBlank(abstractText)) {
-            abstractFilter = String.format(",{\"query_string\": {\"query\": \"(resourceAbstractObject.\\\\*:(%s))\", \"default_operator\": \"AND\"}}", abstractText);
+            filters.add(String.format("{\"query_string\": {\"query\": \"(resourceAbstractObject.\\\\*:(%s))\", \"default_operator\": \"AND\"}}", abstractText));
         }

-        String keywordFilter = "";
         if (StringUtils.isNotBlank(keywords)) {
-            abstractFilter = String.format(",{\"term\": {\"tag.default\": \"%s\"}}", keywords);
+            filters.add(String.format("{\"term\": {\"tag.default\": \"%s\"}}", keywords));
+        }
+
+        if (StringUtils.isNotBlank(categories)) {
+            filters.add(String.format("{\"query_string\": {\"query\": \"cat:(%s)\"}}", categories));
+        }
+
+        if (StringUtils.isNotBlank(schemes)) {
+            filters.add(String.format("{\"query_string\": {\"query\": \"documentStandard:(%s)\", \"default_operator\": \"OR\"}}", schemes));
+        }
+
+        if (StringUtils.isNotBlank(groupOwners)) {
+            try {
+                List<String> groupOwners = Arrays.stream(this.groupOwners.split(","))
+                    .map(String::trim) // Remove extra spaces
+                    .collect(Collectors.toList());
+                String groupOwnersJson = new ObjectMapper().writeValueAsString(groupOwners); // Outputs: ["group1","group2"]
+                filters.add(String.format("{\"terms\": {\"groupOwner\": %s}}", groupOwnersJson));
+
+            } catch (JsonProcessingException e) {
+                Log.debug(Geonet.HARVEST_MAN, "Error creating criteria for ownerGroup. Ignoring this filter.");
+            }
+        }
+
+        String queryFilter = String.join(",", filters);
+        if (StringUtils.isNotBlank(queryFilter)) {
+            queryFilter = "," + queryFilter;
         }

         String queryBody = String.format("{\n" +
             " \"from\": %d,\n" +
             " \"size\": %d,\n" +
             " \"sort\": [\"_score\"],\n" +
-            " \"query\": {\"bool\": {\"must\": [{\"terms\": {\"isTemplate\": [\"n\"]}}%s%s%s%s%s]}},\n" +
+            " \"query\": {\"bool\": {\"must\": [{\"terms\": {\"isTemplate\": [\"n\"]}}%s]}},\n" +
             " \"_source\": {\"includes\": [\n" +
             " \"uuid\",\n" +
             " \"id\",\n" +
@@ -115,7 +156,8 @@ public String createElasticsearchQuery() {
             " \"documentStandard\"\n" +
             " ]},\n" +
             " \"track_total_hits\": true\n" +
-            "}", from, to, sourceFilter, freeTextFilter, titleFilter, abstractFilter, keywordFilter);
+            "}",
+            from, to, queryFilter);


         if (Log.isDebugEnabled(Geonet.HARVEST_MAN)) {
@@ -139,6 +181,9 @@ public String toString() {
             .append("title", title)
             .append("abstrac", abstractText)
             .append("keywords", keywords)
+            .append("categories", categories)
+            .append("schemes", schemes)
+            .append("groupOwners", groupOwners)
             .append("sourceUuid", sourceUuid)
             .toString();
     }
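
Note on the filter assembly in `createElasticsearchQuery()`: the commit collects optional clauses in a list, joins them with commas, and prepends a comma only when the list is non-empty so the result can follow the fixed `isTemplate` clause. A minimal sketch of a variant is shown below, assuming the fixed clause is placed in the same list so no leading-comma special case is needed. It also escapes backslashes and double quotes before formatting values into JSON, which the code in this diff does not do. `QueryFilterSketch`, `jsonEscape`, `hasText`, and `buildMustClauses` are hypothetical names, and only three of the filters are covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the commit's implementation.
class QueryFilterSketch {

    // Minimal escaping for a JSON string literal: backslash and double quote only.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Stand-in for StringUtils.isNotBlank so the sketch has no dependencies.
    static boolean hasText(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // Returns the comma-separated contents of the bool/must array.
    static String buildMustClauses(String sourceUuid, String categories, String schemes) {
        List<String> clauses = new ArrayList<>();
        // The fixed clause goes first, so joining never needs a leading comma.
        clauses.add("{\"terms\": {\"isTemplate\": [\"n\"]}}");
        if (hasText(sourceUuid)) {
            clauses.add(String.format(
                "{\"term\": {\"sourceCatalogue\": \"%s\"}}", jsonEscape(sourceUuid)));
        }
        if (hasText(categories)) {
            clauses.add(String.format(
                "{\"query_string\": {\"query\": \"cat:(%s)\"}}", jsonEscape(categories)));
        }
        if (hasText(schemes)) {
            clauses.add(String.format(
                "{\"query_string\": {\"query\": \"documentStandard:(%s)\", \"default_operator\": \"OR\"}}",
                jsonEscape(schemes)));
        }
        return String.join(",", clauses);
    }

    public static void main(String[] args) {
        System.out.println("[" + buildMustClauses("", "cat1 OR cat2", "iso19139") + "]");
    }
}
```

Escaping here protects only the JSON layer. Characters that are special in the `query_string` syntax itself (parentheses, colons, and so on) are still interpreted by Elasticsearch, which is what allows `cat1 AND cat2` to work as a filter value.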

web-ui/src/main/resources/catalog/templates/admin/harvest/type/geonetwork40.html

Lines changed: 51 additions & 0 deletions
@@ -120,6 +120,57 @@
       </div>
     </div>

+    <div id="gn-harvest-settings-gn4-basic-filter-category-row">
+      <label
+        id="gn-harvest-settings-gn4-basic-filter-category-label"
+        class="control-label col-lg-4"
+        data-translate=""
+        >categories</label
+      >
+      <div class="col-lg-8">
+        <input
+          id="gn-harvest-settings-gn4-basic-filter-category-input"
+          type="text"
+          class="form-control"
+          data-ng-model="harvesterSelected.searches[0].categories"
+        />
+      </div>
+    </div>
+
+    <div id="gn-harvest-settings-gn4-basic-filter-scheme-row">
+      <label
+        id="gn-harvest-settings-gn4-basic-filter-scheme-label"
+        class="control-label col-lg-4"
+        data-translate=""
+        >documentStandard</label
+      >
+      <div class="col-lg-8">
+        <input
+          id="gn-harvest-settings-gn4-basic-filter-scheme-input"
+          type="text"
+          class="form-control"
+          data-ng-model="harvesterSelected.searches[0].schemes"
+        />
+      </div>
+    </div>
+
+    <div id="gn-harvest-settings-gn4-basic-filter-groupOwners-row">
+      <label
+        id="gn-harvest-settings-gn4-basic-filter-groupOwners-label"
+        class="control-label col-lg-4"
+        data-translate=""
+        >groupOwners</label
+      >
+      <div class="col-lg-8">
+        <input
+          id="gn-harvest-settings-gn4-basic-filter-groupOwners-input"
+          type="text"
+          class="form-control"
+          data-ng-model="harvesterSelected.searches[0].groupOwners"
+        />
+      </div>
+    </div>
+
     <div id="gn-harvest-settings-gn4-basic-source-row">
       <label
         id="gn-harvest-settings-gn4-basic-source-label"

web-ui/src/main/resources/catalog/templates/admin/harvest/type/geonetwork40.js

Lines changed: 7 additions & 1 deletion
@@ -41,7 +41,10 @@ var gnHarvestergeonetwork40 = {
           "source": {
             "uuid": [],
             "name": []
-          }
+          },
+          "categories": "",
+          "schemes": "",
+          "groupOwners": ""
         }],
         "ifRecordExistAppendPrivileges": false,
         "privileges": [{
@@ -87,6 +90,9 @@ var gnHarvestergeonetwork40 = {
           + ' <title>' + ((h.searches[0] && h.searches[0].title) || '') + '</title>'
           + ' <abstract>' + ((h.searches[0] && h.searches[0]['abstract']) || '') + '</abstract>'
           + ' <keywords>' + ((h.searches[0] && h.searches[0].keywords) || '') + '</keywords>'
+          + ' <categories>' + ((h.searches[0] && h.searches[0].categories) || '') + '</categories>'
+          + ' <schemes>' + ((h.searches[0] && h.searches[0].schemes) || '') + '</schemes>'
+          + ' <groupOwners>' + ((h.searches[0] && h.searches[0].groupOwners) || '') + '</groupOwners>'
           + ' <source>'
           + ' <uuid>' + ((h.searches[0] && h.searches[0].source.uuid) || '') + '</uuid>'
           + ' <name/>'
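
Note on the XML serialization: the new `<categories>`, `<schemes>`, and `<groupOwners>` elements are built by plain string concatenation, like the existing fields. With concatenation alone, a value containing `&` or `<` would yield malformed XML. Whether escaping happens elsewhere in the UI code is not visible in this diff. The sketch below, written in Java to match the other examples on this page rather than the JavaScript of this file, shows the kind of escaping that would make such values safe. All names in it are hypothetical.

```java
// Hypothetical sketch, not part of the commit: builds the three new
// search elements with basic XML text escaping.
class SearchXmlSketch {

    // Escape the characters that are significant in XML text content.
    // '&' must be replaced first so the entities added later are not re-escaped.
    static String xmlEscape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render one element; a null value becomes an empty element body.
    static String element(String name, String value) {
        return "<" + name + ">" + xmlEscape(value == null ? "" : value) + "</" + name + ">";
    }

    static String searchFilters(String categories, String schemes, String groupOwners) {
        return element("categories", categories)
            + element("schemes", schemes)
            + element("groupOwners", groupOwners);
    }

    public static void main(String[] args) {
        System.out.println(searchFilters("cat1 AND cat2", "iso19139", "2, 5"));
    }
}
```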

web/src/main/webapp/xsl/xml/harvesting/geonetwork40.xsl

Lines changed: 9 additions & 0 deletions
@@ -49,6 +49,15 @@
           <keywords>
             <xsl:value-of select="children/keywords/value"/>
           </keywords>
+          <categories>
+            <xsl:value-of select="children/categories/value"/>
+          </categories>
+          <schemes>
+            <xsl:value-of select="children/schemes/value"/>
+          </schemes>
+          <groupOwners>
+            <xsl:value-of select="children/groupOwners/value"/>
+          </groupOwners>
           <source>
             <uuid>
               <xsl:value-of select="children/sourceUuid/value"/>