
Discussion for merging Gleaner and Nabu

David Valentine edited this page Mar 12, 2026 · 5 revisions

We want to merge the codebases since there is overlapping functionality.

We would like to understand the overlapping functionality, address inconsistencies, create commonalities, and generate a codebase that shares the functionality. This can either be a branch in the existing repositories, or a new codebase. We will trust your recommendations.

We suggest you start by finding the possible duplicate or overlapping functionality. Then determine a possible command structure and functionality. Then suggest any changes, and finally implement the new code.

There are example JSON-LD files in a GitHub repository:

There are two example sitemaps generated from the JSON-LD examples

The existing configuration files being used are on a triple store with fixed urls:

An additional one exists to build communities. tenant: https://oss.geocodes-aws.earthcube.org/decoder/scheduler/configs/production/tenant.yaml

Overlapping functionality:

  • conversion to quads
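
As an illustration of the quad step, here is a minimal sketch, not taken from either codebase. It assumes the input is N-Triples with one statement per line, and turns it into N-Quads by appending a named-graph IRI to each statement. The function name `toQuads` and the example IRIs are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// toQuads appends a named-graph IRI to each N-Triples statement,
// producing N-Quads. Blank lines and comment lines are skipped.
// Assumption: one statement per line, each terminated by ".".
func toQuads(ntriples, graphIRI string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(ntriples))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasSuffix(line, ".") {
			return "", fmt.Errorf("statement not terminated by '.': %q", line)
		}
		// Drop the terminating dot, then re-emit with the graph term added.
		stmt := strings.TrimSpace(strings.TrimSuffix(line, "."))
		fmt.Fprintf(&out, "%s <%s> .\n", stmt, graphIRI)
	}
	return out.String(), sc.Err()
}

func main() {
	nt := `<https://example.org/ds/1> <https://schema.org/name> "Example dataset" .`
	nq, err := toQuads(nt, "urn:example:graph:1")
	if err != nil {
		panic(err)
	}
	fmt.Print(nq)
}
```

If both programs need this step, keeping one such function in a shared package is the kind of deduplication the merge is after.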

config files

We should take the two config files from the programs and rework them into two configuration files:

  • services, things with urls and passwords that can be configured. Places with secrets.
  • sources, implementation network information
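
To make the split concrete, here is a rough sketch of how the two files might map onto Go types. Every field name below is an assumption for illustration; the real keys would come from the existing gleaner and nabu configs. The point is only that secrets live in one type, so they can be masked before logging, while the sources list carries none.

```go
package main

import "fmt"

// Services holds endpoints and credentials (the file with secrets).
// All field names are hypothetical.
type Services struct {
	S3Endpoint     string
	S3AccessKey    string
	S3SecretKey    string
	SparqlEndpoint string
}

// Source describes one entry in the sources file (no secrets).
// All field names are hypothetical.
type Source struct {
	Name   string
	URL    string
	Active bool
}

// Redacted returns a copy that is safe to print or log.
// The receiver is a value, so the original is left untouched.
func (s Services) Redacted() Services {
	if s.S3AccessKey != "" {
		s.S3AccessKey = "****"
	}
	if s.S3SecretKey != "" {
		s.S3SecretKey = "****"
	}
	return s
}

func main() {
	svc := Services{
		S3Endpoint:     "s3.example.org",
		S3AccessKey:    "access",
		S3SecretKey:    "secret",
		SparqlEndpoint: "https://graph.example.org/sparql",
	}
	sources := []Source{{Name: "alpha", URL: "https://example.org/sitemap.xml", Active: true}}
	fmt.Printf("%+v\n%+v\n", svc.Redacted(), sources)
}
```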

There is a command line pattern in place: glcon, which is used to run gleaner and should be extended in the merged codebase.

With two files, gleaner and nabu, this is already the practice: glcon already uses a directory to manage them. This will remove the need to 'generate' a config, though generate also pulls from a CSV file to create a sources list.
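
The CSV-to-sources step could look roughly like the sketch below. The column layout (a header row followed by `name,url` rows) and the `Source` type are assumptions for illustration, not the format glcon reads today.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Source is a hypothetical, minimal sources-list entry.
type Source struct {
	Name string
	URL  string
}

// sourcesFromCSV parses CSV text with a header row followed by
// "name,url" rows. The two-column layout is an assumption.
func sourcesFromCSV(data string) ([]Source, error) {
	cr := csv.NewReader(strings.NewReader(data))
	cr.FieldsPerRecord = 2 // reject rows with a different column count
	cr.TrimLeadingSpace = true
	rows, err := cr.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, fmt.Errorf("empty CSV: expected a header row")
	}
	sources := make([]Source, 0, len(rows)-1)
	for _, row := range rows[1:] { // skip the header
		sources = append(sources, Source{Name: row[0], URL: row[1]})
	}
	return sources, nil
}

func main() {
	data := "name,url\nalpha,https://example.org/alpha/sitemap.xml\n"
	sources, err := sourcesFromCSV(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", sources)
}
```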

proposals

step 1:

Form a common codebase. Gleaner will need to import the config from nabu, so all config and common code moves to nabu/pkg.

step 2:

Replace the gleaner RDF conversion with the nabu conversion. Basically, remove the duplicate JSON-LD to RDF code.

step 3:

We reorganize and shorten the glcon command structure.

config

  • config init (config_directory) -- initialize a config folder
  • config update/generate (config_directory) -- update the sources

summon/fetch

  • summon (config_directory) -- run the summon gleaner process
  • mill (config_directory) -- run the mill process, using the common graph conversion code shared between gleaner and nabu

graph

  • load summon|prov (--source source) (config_directory) -- was prefix
  • bulkload summon|prov (--source source) (config_directory)
  • release summon|prov|orgs (--source source) (config_directory) -- create a release file
  • object path_to_s3 (config_directory) -- upload a single object
  • prune
  • graph clear
  • graph drop - drop a single named graph
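
As a sketch of how one of the proposed commands might be parsed, the example below handles `load summon|prov (--source source) (config_directory)` using only Go's standard `flag` package. It is not how glcon is implemented today; the `LoadArgs` type and `parseLoad` function are hypothetical, and the same shape would carry over to whatever CLI library the merged tool uses.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// LoadArgs is a hypothetical holder for the parsed "load" arguments.
type LoadArgs struct {
	Kind      string // "summon" or "prov"
	Source    string // optional; empty means all sources
	ConfigDir string
}

// parseLoad parses: load summon|prov [--source name] config_directory
// args is everything after the "load" word.
func parseLoad(args []string) (LoadArgs, error) {
	if len(args) == 0 {
		return LoadArgs{}, fmt.Errorf("usage: load summon|prov [--source name] config_directory")
	}
	kind := args[0]
	if kind != "summon" && kind != "prov" {
		return LoadArgs{}, fmt.Errorf("unknown kind %q: want summon or prov", kind)
	}
	fs := flag.NewFlagSet("load", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned, not printed
	source := fs.String("source", "", "limit the load to one source")
	if err := fs.Parse(args[1:]); err != nil {
		return LoadArgs{}, err
	}
	if fs.NArg() != 1 {
		return LoadArgs{}, fmt.Errorf("expected exactly one config_directory, got %d", fs.NArg())
	}
	return LoadArgs{Kind: kind, Source: *source, ConfigDir: fs.Arg(0)}, nil
}

func main() {
	parsed, err := parseLoad([]string{"summon", "--source", "alpha", "./configs"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", parsed)
}
```

Because every command takes the config directory as its final positional argument, one shared helper could resolve it for bulkload and release as well.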

utilities

The utilities are tools used to investigate JSON-LD summon and conversion workflows. Basically, they are manual testing tools.

  • tool s3clear - clear a bucket... can we just proxy the minioadmin commands? it's a go app.
  • tool identifier - read jsonld, return identifier
  • tool jsonld - read jsonld, output context-fixed jsonld; used to test JSON-LD context changes
  • tool rdf - convert jsonld to rdf

** OTHERS **
