Commit 11fe58d

Database harvester supporting PostgreSQL and Oracle JDBC connections (#8247)
* Database harvester
* Database harvester - Define MetadataRepository as class member and use IMetadataUtils class member instead of context.getBean
* Database harvester - update documentation to document the field types supported
* Database harvester - support field filter operator
* Database harvester - update to latest GeoNetwork version
1 parent 67928de commit 11fe58d

20 files changed (+1598 -37 lines changed)

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Database Harvesting {#database_harvester}
+
+This harvesting type uses a database connection to harvest metadata stored in a database table.
+
+## Adding a Database harvester
+
+To create a Database harvester, go to `Admin console` > `Harvesting` and select `Harvest from` > `Database`:
+
+![](img/add-database-harvester.png)
+
+Provide the following information:
+
+- **Identification**
+    - *Node name and logo*: A unique name for the harvester and, optionally, a logo to assign to the harvester.
+    - *Group*: Group which owns the harvested records. Only the catalog administrator or users with the profile `UserAdmin` of this group can manage the harvester.
+    - *User*: User who owns the harvested records.
+
+- **Schedule**: Scheduling options to execute the harvester. If disabled, the harvester must be executed manually from the harvesters page. If enabled, a schedule expression using cron syntax must be configured ([See examples](https://www.quartz-scheduler.org/documentation/quartz-2.1.7/tutorials/crontrigger)).
+
+- **Configure connection to Database**
+    - *Server*: The IP address or hostname of the database server.
+    - *Port*: The database port. For example, PostgreSQL usually listens on 5432.
+    - *Database name*: The name of the database to connect to.
+    - *Table name*: Name of the table that contains the metadata. The name must begin with a letter (a-z) or underscore (_). Subsequent characters in a name can be letters, digits (0-9), or underscores.
+    - *Metadata field name*: Name of the table field that contains the metadata XML text. The field name must begin with a letter (a-z) or underscore (_). Subsequent characters in a name can be letters, digits (0-9), or underscores.
+        The supported generic SQL types for this field are: `BLOB`, `LONGVARBINARY`, `LONGNVARCHAR`, `LONGVARCHAR`, `VARCHAR`, `SQLXML`. Check your database documentation for the specific implementations of these generic types.
+    - *Database type*: Database type. PostgreSQL and Oracle are currently supported.
+    - *Remote authentication*: Credentials to connect to the database.
+
+- **Search filter**: Defines a simple field condition to filter the results.
+    - *Filter field*: Name of the table field used to filter the results. The name must begin with a letter (a-z) or underscore (_). Subsequent characters in a name can be letters, digits (0-9), or underscores.
+    - *Filter value*: Value to filter the results. It can contain wildcards (%).
+    - *Filter operator*: Supported values are `LIKE` and `NOT LIKE`.
+
+- **Configure response processing for database**
+    - *Action on UUID collision*: When a harvester finds the same UUID on a record collected by another method (another harvester, importer, dashboard editor,...), should this record be skipped (default), overridden, or assigned a new UUID?
+    - *Validate records before import*: If checked, the metadata is validated after retrieval. Records that fail validation are skipped.
+    - *XSL filter name to apply*: (Optional) The XSL filter is applied to each metadata record. The filter is a process which depends on the metadata schema (see the `process` folder of the metadata schemas).
+
+        The filter name can be followed by parameters, which are passed to the XSL transformation using the following syntax: `anonymizer?protocol=MYLOCALNETWORK:FILEPATH&email=gis@organisation.org&thesaurus=MYORGONLYTHEASURUS`
+
+    - *Batch edits*: (Optional) Updates harvested records using XPath syntax. It can be used to add, replace or delete elements.
+    - *Translate metadata content*: (Optional) Translates metadata elements. It requires a translation service provider configured in the System settings.
+
+- **Privileges** - Assign privileges to harvested metadata.
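The connection and search filter settings documented above can be read as the inputs to a JDBC URL and a single filtered `SELECT`. The following is only a sketch under stated assumptions: the class `DatabaseHarvestQuerySketch`, its method names, the `postgres`/`oracle` type keys and the exact URL forms are hypothetical, and the query that `DatabaseHarvesterAligner` really issues is not shown in this part of the commit.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; it is not part of the commit. */
final class DatabaseHarvestQuerySketch {
    // Identifier rule taken from the documentation: a letter or underscore,
    // followed by letters, digits or underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]*");

    /** Builds a JDBC URL. The type keys and URL forms are assumptions. */
    static String jdbcUrl(String databaseType, String server, int port, String database) {
        switch (databaseType.toLowerCase(Locale.ROOT)) {
            case "postgres":
                return "jdbc:postgresql://" + server + ":" + port + "/" + database;
            case "oracle":
                // Thin-driver URL that addresses the database by service name.
                return "jdbc:oracle:thin:@//" + server + ":" + port + "/" + database;
            default:
                throw new IllegalArgumentException("Unsupported database type: " + databaseType);
        }
    }

    /** Builds the SELECT text; the filter value is left as a bind parameter (?). */
    static String selectSql(String table, String metadataField,
                            String filterField, String filterOperator) {
        requireIdentifier(table);
        requireIdentifier(metadataField);
        String sql = "SELECT " + metadataField + " FROM " + table;
        if (filterField == null || filterField.isEmpty()) {
            return sql; // no search filter configured
        }
        requireIdentifier(filterField);
        if (!"LIKE".equals(filterOperator) && !"NOT LIKE".equals(filterOperator)) {
            throw new IllegalArgumentException("Unsupported operator: " + filterOperator);
        }
        return sql + " WHERE " + filterField + " " + filterOperator + " ?";
    }

    // Identifiers cannot be bound as parameters, so they are checked instead.
    private static void requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid identifier: " + name);
        }
    }
}
```

In this sketch the filter value (for example `%water%`) would be supplied with `PreparedStatement.setString(1, value)` rather than concatenated into the SQL text, which is why only the table and field names need the identifier check.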

docs/manual/docs/user-guide/harvesting/index.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ The following sources can be harvested:
 - [GeoPortal REST Harvesting](harvesting-geoportal.md)
 - [THREDDS Harvesting](harvesting-thredds.md)
 - [WFS GetFeature Harvesting](harvesting-wfs-features.md)
+- [Database Harvesting](harvesting-database.md)

 ## Mechanism overview

docs/manual/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -306,6 +306,7 @@ nav:
     - user-guide/harvesting/harvesting-thredds.md
     - user-guide/harvesting/harvesting-webdav.md
     - user-guide/harvesting/harvesting-wfs-features.md
+    - user-guide/harvesting/harvesting-database.md
     - user-guide/export/index.md
   - 'Administration':
     - administrator-guide/index.md

harvesters/src/main/java/org/fao/geonet/kernel/harvest/BaseAligner.java

Lines changed: 35 additions & 0 deletions
@@ -28,6 +28,7 @@
 import org.fao.geonet.domain.AbstractMetadata;
 import org.fao.geonet.domain.MetadataCategory;
 import org.fao.geonet.kernel.DataManager;
+import org.fao.geonet.kernel.GeonetworkDataDirectory;
 import org.fao.geonet.kernel.SchemaManager;
 import org.fao.geonet.kernel.datamanager.IMetadataManager;
 import org.fao.geonet.kernel.harvest.harvester.AbstractHarvester;
@@ -199,4 +200,38 @@ public Element translateMetadataContent(ServiceContext context,
         return md;
     }

+
+    /**
+     * Filter the metadata if process parameter is set and corresponding XSL transformation
+     * exists in xsl/conversion/import.
+     *
+     * @param context
+     * @param md
+     * @param processName
+     * @param processParams
+     * @param log
+     * @return
+     */
+    protected Element applyXSLTProcessToMetadata(ServiceContext context,
+                                                 Element md,
+                                                 String processName,
+                                                 Map<String, Object> processParams,
+                                                 org.fao.geonet.Logger log) {
+        Path filePath = context.getBean(GeonetworkDataDirectory.class).getXsltConversion(processName);
+        if (!Files.exists(filePath)) {
+            log.debug(" processing instruction " + processName + ". Metadata not filtered.");
+        } else {
+            Element processedMetadata;
+            try {
+                processedMetadata = Xml.transform(md, filePath, processParams);
+                log.debug(" metadata filtered.");
+                md = processedMetadata;
+            } catch (Exception e) {
+                log.warning(" processing error " + processName + ": " + e.getMessage());
+            }
+        }
+        return md;
+    }
+
+
 }
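`applyXSLTProcessToMetadata` receives a process name and a parameter map, while the harvester documentation describes the filter setting as one string of the form `anonymizer?protocol=...&email=...`. A minimal sketch of how such a string could be split into those two arguments follows. The class `XslFilterSpec` and its `parse` method are hypothetical names introduced for illustration, and the parsing code GeoNetwork actually uses is not part of this diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value holder for illustration; it is not part of the commit. */
final class XslFilterSpec {
    final String processName;
    final Map<String, Object> processParams;

    private XslFilterSpec(String processName, Map<String, Object> processParams) {
        this.processName = processName;
        this.processParams = processParams;
    }

    /** Splits "name?key=value&key=value" into a process name and its parameters. */
    static XslFilterSpec parse(String xslFilter) {
        String name = xslFilter;
        Map<String, Object> params = new LinkedHashMap<>();
        int question = xslFilter.indexOf('?');
        if (question >= 0) {
            name = xslFilter.substring(0, question);
            for (String pair : xslFilter.substring(question + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate "name?" and doubled "&&"
                }
                int equals = pair.indexOf('=');
                if (equals < 0) {
                    params.put(pair, ""); // a key without a value
                } else {
                    params.put(pair.substring(0, equals), pair.substring(equals + 1));
                }
            }
        }
        return new XslFilterSpec(name, params);
    }
}
```

The sketch keeps parameter values verbatim and does not URL-decode them; whether decoding is wanted depends on how the filter string is entered, which this diff does not show.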

harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/AbstractHarvester.java

Lines changed: 6 additions & 0 deletions
@@ -45,7 +45,9 @@
 import org.fao.geonet.exceptions.UnknownHostEx;
 import org.fao.geonet.kernel.DataManager;
 import org.fao.geonet.kernel.MetadataIndexerProcessor;
+import org.fao.geonet.kernel.datamanager.IMetadataIndexer;
 import org.fao.geonet.kernel.datamanager.IMetadataManager;
+import org.fao.geonet.kernel.datamanager.IMetadataSchemaUtils;
 import org.fao.geonet.kernel.datamanager.IMetadataUtils;
 import org.fao.geonet.kernel.harvest.Common.OperResult;
 import org.fao.geonet.kernel.harvest.Common.Status;
@@ -133,6 +135,8 @@ public abstract class AbstractHarvester<T extends HarvestResult, P extends Abstr
     protected DataManager dataMan;
     protected IMetadataManager metadataManager;
     protected IMetadataUtils metadataUtils;
+    protected IMetadataSchemaUtils metadataSchemaUtils;
+    protected IMetadataIndexer metadataIndexer;

     protected P params;
     protected T result;
@@ -173,6 +177,8 @@ protected void setContext(ServiceContext context) {
         this.harvesterSettingsManager = context.getBean(HarvesterSettingsManager.class);
         this.settingManager = context.getBean(SettingManager.class);
         this.metadataManager = context.getBean(IMetadataManager.class);
+        this.metadataSchemaUtils = context.getBean(IMetadataSchemaUtils.class);
+        this.metadataIndexer = context.getBean(IMetadataIndexer.class);
     }

     public void add(Element node) throws BadInputEx, SQLException {

harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/csw/Aligner.java

Lines changed: 4 additions & 32 deletions
@@ -310,7 +310,7 @@ private void addMetadata(RecordInfo ri, String uuidToAssign) throws Exception {
         // use that uuid (newMdUuid) for the new metadata to add to the catalogue.
         String newMdUuid = null;
         if (!params.xslfilter.equals("")) {
-            md = processMetadata(context, md, processName, processParams);
+            md = applyXSLTProcessToMetadata(context, md, processName, processParams, log);
             schema = dataMan.autodetectSchema(md);
             // Get new uuid if modified by XSLT process
             newMdUuid = metadataUtils.extractUUID(schema, md);
@@ -463,7 +463,7 @@ boolean updatingLocalMetadata(RecordInfo ri, String id, boolean force) throws Ex

             boolean updateSchema = false;
             if (!params.xslfilter.equals("")) {
-                md = processMetadata(context, md, processName, processParams);
+                md = applyXSLTProcessToMetadata(context, md, processName, processParams, log);
                 String newSchema = dataMan.autodetectSchema(md);
                 updateSchema = !newSchema.equals(schema);
                 schema = newSchema;
@@ -487,9 +487,11 @@ boolean updatingLocalMetadata(RecordInfo ri, String id, boolean force) throws Ex
                 metadata.getHarvestInfo().setUuid(params.getUuid());
                 metadata.getSourceInfo().setSourceId(params.getUuid());
             }
+
             if (updateSchema) {
                 metadata.getDataInfo().setSchemaId(schema);
             }
+
             metadataManager.save(metadata);
         }

@@ -624,36 +626,6 @@ private boolean foundDuplicateForResource(String uuid, Element response) {
         return false;
     }

-    /**
-     * Filter the metadata if process parameter is set and corresponding XSL transformation
-     * exists in xsl/conversion/import.
-     *
-     * @param context
-     * @param md
-     * @param processName
-     * @param processParams
-     * @return
-     */
-    private Element processMetadata(ServiceContext context,
-                                    Element md,
-                                    String processName,
-                                    Map<String, Object> processParams) {
-        Path filePath = context.getBean(GeonetworkDataDirectory.class).getXsltConversion(processName);
-        if (!Files.exists(filePath)) {
-            log.debug(" processing instruction " + processName + ". Metadata not filtered.");
-        } else {
-            Element processedMetadata;
-            try {
-                processedMetadata = Xml.transform(md, filePath, processParams);
-                log.debug(" metadata filtered.");
-                md = processedMetadata;
-            } catch (Exception e) {
-                log.warning(" processing error " + processName + ": " + e.getMessage());
-            }
-        }
-        return md;
-    }
-
     /**
      * Retrieves the list of metadata uuids that have the same dataset identifier.
      *
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+//=============================================================================
+//===    Copyright (C) 2001-2024 Food and Agriculture Organization of the
+//===    United Nations (FAO-UN), United Nations World Food Programme (WFP)
+//===    and United Nations Environment Programme (UNEP)
+//===
+//===    This program is free software; you can redistribute it and/or modify
+//===    it under the terms of the GNU General Public License as published by
+//===    the Free Software Foundation; either version 2 of the License, or (at
+//===    your option) any later version.
+//===
+//===    This program is distributed in the hope that it will be useful, but
+//===    WITHOUT ANY WARRANTY; without even the implied warranty of
+//===    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+//===    General Public License for more details.
+//===
+//===    You should have received a copy of the GNU General Public License
+//===    along with this program; if not, write to the Free Software
+//===    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
+//===
+//===    Contact: Jeroen Ticheler - FAO - Viale delle Terme di Caracalla 2,
+//===    Rome - Italy. email: geonetwork@osgeo.org
+//==============================================================================
+
+package org.fao.geonet.kernel.harvest.harvester.database;
+
+import org.fao.geonet.Logger;
+import org.fao.geonet.kernel.harvest.harvester.AbstractHarvester;
+import org.fao.geonet.kernel.harvest.harvester.HarvestError;
+import org.fao.geonet.kernel.harvest.harvester.HarvestResult;
+
+import java.sql.SQLException;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+
+public class DatabaseHarvester extends AbstractHarvester<HarvestResult, DatabaseHarvesterParams> {
+    private static final String TABLE_NAME_PATTERN = "([_a-zA-Z]+[_a-zA-Z0-9]*)";
+    private static final String FIELD_NAME_PATTERN = "([_a-zA-Z]+[_a-zA-Z0-9]*)";
+
+    @Override
+    protected DatabaseHarvesterParams createParams() {
+        return new DatabaseHarvesterParams(dataMan);
+    }
+
+    @Override
+    protected void storeNodeExtra(DatabaseHarvesterParams params, String path, String siteId, String optionsId) throws SQLException {
+        // Remove non-valid characters
+        params.setTableName(params.getTableName().replaceAll("[^" + TABLE_NAME_PATTERN + "]", ""));
+        params.setMetadataField(params.getMetadataField().replaceAll("[^" + FIELD_NAME_PATTERN + "]", ""));
+        params.setFilterField(params.getFilterField().replaceAll("[^" + FIELD_NAME_PATTERN + "]", ""));
+
+        setParams(params);
+
+        harvesterSettingsManager.add("id:" + siteId, "icon", params.getIcon());
+        harvesterSettingsManager.add("id:" + siteId, "server", params.getServer());
+        harvesterSettingsManager.add("id:" + siteId, "port", params.getPort());
+        harvesterSettingsManager.add("id:" + siteId, "username", params.getUsername());
+        harvesterSettingsManager.add("id:" + siteId, "password", params.getPassword());
+        harvesterSettingsManager.add("id:" + siteId, "database", params.getDatabase());
+        harvesterSettingsManager.add("id:" + siteId, "databaseType", params.getDatabaseType());
+        harvesterSettingsManager.add("id:" + siteId, "tableName", params.getTableName());
+        harvesterSettingsManager.add("id:" + siteId, "metadataField", params.getMetadataField());
+        harvesterSettingsManager.add("id:" + siteId, "xslfilter", params.getXslfilter());
+
+        String filtersID = harvesterSettingsManager.add(path, "filter", "");
+        harvesterSettingsManager.add("id:" + filtersID, "field", params.getFilterField());
+        harvesterSettingsManager.add("id:" + filtersID, "value", params.getFilterValue());
+        harvesterSettingsManager.add("id:" + filtersID, "operator", params.getFilterOperator());
+    }
+
+    @Override
+    protected void doHarvest(Logger l) throws Exception {
+        log.info("Database harvester start");
+        DatabaseHarvesterAligner h = new DatabaseHarvesterAligner(cancelMonitor, log, context, params, errors);
+        result = h.harvest(log);
+        log.info("Database harvester end");
+    }
+}
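`storeNodeExtra` cleans the table and field names by stripping characters with `replaceAll`, using a negated character class assembled from the name pattern. The sketch below, which is not part of the commit, shows two simpler ways to express the rule stated in the documentation (a letter or underscore first, then letters, digits or underscores): an explicit whole-name check, and a strip step with the allowed characters written out directly. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; it is not part of the commit. */
final class SqlIdentifierSketch {
    // Whole-name rule: first a letter or underscore, then letters, digits or underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]*");

    /** Returns true only if the complete name satisfies the rule. */
    static boolean isValid(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    /** Drops every character that can never appear in a valid name. */
    static String stripInvalidChars(String name) {
        return name == null ? "" : name.replaceAll("[^_a-zA-Z0-9]", "");
    }
}
```

Stripping alone cannot enforce the first-character rule (a name such as `1table` survives unchanged), so a caller that wants the documented rule applied strictly would still run `isValid` on the result and reject the setting if it fails.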
