Discussion for merging Gleaner and Nabu
We want to merge the two codebases because they have overlapping functionality.
- Gleaner: https://github.com/gleanerio/gleaner/tree/dev_ec
- nabu: https://github.com/gleanerio/nabu/tree/dev_eco
We would like to understand the overlapping functionality, address inconsistencies, create commonalities, and generate a codebase that shares the functionality. This can either be a branch in the existing repositories, or a new codebase. We will trust your recommendations.
We suggest you start by identifying the duplicate or overlapping functionality, then determine a possible command structure and its functionality, then suggest any changes, and finally implement the new code.
There are example JSON-LD files in a GitHub repository:
There are two example sitemaps generated from the JSON-LD examples:
- https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/json/sitemap.xml
- https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/allgood/sitemap.xml
The existing configuration files being used are on a triple store with fixed urls:
- gleaner: https://oss.geocodes-aws.earthcube.org/decoder/scheduler/configs/production/gleanerconfig.yaml
- nabu: https://oss.geocodes-aws.earthcube.org/decoder/scheduler/configs/production/nabuconfig.yaml
An additional file, tenant, exists to build communities: https://oss.geocodes-aws.earthcube.org/decoder/scheduler/configs/production/tenant.yaml
Overlapping functionality:
- conversion to quads
We should take the two config files from the programs and rework them into two configuration files:
- services: things with URLs and passwords that can be configured; places with secrets.
- sources: implementation network information
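A minimal sketch of how the services/sources split might look in Go. The file names (`services.yaml`, `sources.yaml`) and every struct field here are assumptions for illustration; none of them come from the existing gleaner or nabu configs.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ServicesConfig holds endpoints and secrets (hypothetical field names).
type ServicesConfig struct {
	S3Endpoint     string
	S3AccessKey    string
	S3SecretKey    string
	SparqlEndpoint string
}

// Source describes one harvest target (hypothetical field names).
type Source struct {
	Name       string
	SitemapURL string
	Active     bool
}

// SourcesConfig holds the list of sources and carries no secrets.
type SourcesConfig struct {
	Sources []Source
}

// ConfigPaths returns the two file paths expected inside a config
// directory. The file names are assumptions, not an agreed convention.
func ConfigPaths(dir string) (services, sources string) {
	return filepath.Join(dir, "services.yaml"), filepath.Join(dir, "sources.yaml")
}

// ActiveNames returns the names of the sources marked active.
func (c SourcesConfig) ActiveNames() []string {
	var names []string
	for _, s := range c.Sources {
		if s.Active {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	svc, src := ConfigPaths("configs/production")
	fmt.Println(svc)
	fmt.Println(src)

	cfg := SourcesConfig{Sources: []Source{
		{Name: "allgood", SitemapURL: "https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/allgood/sitemap.xml", Active: true},
		{Name: "json", SitemapURL: "https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/json/sitemap.xml", Active: false},
	}}
	fmt.Println(cfg.ActiveNames())
}
```

Keeping secrets in one file means the sources file can be shared or committed without leaking credentials.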
There is a command line pattern already in place, glcon, which is used to run gleaner and should be extended in the merged codebase.
With two files, gleaner and nabu, this is already the practice: glcon already uses a directory to manage the configuration. This will remove the need to 'generate' a config, though generate also pulls from a CSV file to create a sources list.
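The CSV step could be sketched roughly as below. The column layout (`name,url` with a header row) is an assumption; the real CSV used by generate may have more or different columns.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Source is one entry of the sources list (hypothetical fields).
type Source struct {
	Name string
	URL  string
}

// ParseSourcesCSV reads CSV text with an assumed header row followed by
// "name,url" rows and returns the corresponding sources.
func ParseSourcesCSV(data string) ([]Source, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	var out []Source
	for i, row := range rows {
		if i == 0 {
			continue // skip the header row
		}
		if len(row) < 2 {
			return nil, fmt.Errorf("row %d: expected at least 2 columns, got %d", i+1, len(row))
		}
		out = append(out, Source{
			Name: strings.TrimSpace(row[0]),
			URL:  strings.TrimSpace(row[1]),
		})
	}
	return out, nil
}

func main() {
	data := "name,url\n" +
		"json,https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/json/sitemap.xml\n" +
		"allgood,https://earthcube.github.io/GeoCODES-Metadata/metadata/Dataset/allgood/sitemap.xml\n"
	sources, err := ParseSourcesCSV(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range sources {
		fmt.Println(s.Name, s.URL)
	}
}
```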
To form a common codebase, gleaner will need to import the config from nabu, so all config and common code moves to nabu/pkg.
Replace the gleaner RDF conversion with the nabu conversion; basically, remove the duplicate JSON-LD to RDF code.
We reorganize and shorten the glcon command structure:
- config init (config_directory) -- initialize a config folder
- config update/generate (config_directory) -- update the sources
- summon (config_directory) -- run the summon gleaner process
- mill (config_directory) -- run the mill process, using the common graph conversion code shared between gleaner and nabu
- load summon|prov (--source source) (config_directory) -- formerly prefix
- bulkload summon|prov (--source source) (config_directory)
- release summon|prov|orgs (--source source) (config_directory) -- create a release file
- object path_to_s3 (config_directory) -- upload one object
- prune
- graph clear
- graph drop - drop a single named graph
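A rough sketch of how the `load`, `bulkload`, and `release` forms above could be parsed, using only the Go standard library `flag` package. The `Invocation` type and `Parse` function are hypothetical names for illustration; this is not how glcon is currently wired.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// Invocation is the parsed form of "<command> <target> [--source s] <config_directory>".
type Invocation struct {
	Command   string // "load", "bulkload" or "release"
	Target    string // "summon", "prov" or "orgs"
	Source    string // optional --source value
	ConfigDir string
}

// targets lists which targets each command accepts, following the proposal.
var targets = map[string][]string{
	"load":     {"summon", "prov"},
	"bulkload": {"summon", "prov"},
	"release":  {"summon", "prov", "orgs"},
}

// Parse validates the command and target, then reads the optional
// --source flag and the single config directory argument.
func Parse(args []string) (Invocation, error) {
	if len(args) < 2 {
		return Invocation{}, errors.New("usage: <command> <target> [--source name] <config_directory>")
	}
	inv := Invocation{Command: args[0], Target: args[1]}
	allowed, ok := targets[inv.Command]
	if !ok {
		return inv, fmt.Errorf("unknown command %q", inv.Command)
	}
	valid := false
	for _, t := range allowed {
		if t == inv.Target {
			valid = true
		}
	}
	if !valid {
		return inv, fmt.Errorf("%s does not accept target %q", inv.Command, inv.Target)
	}
	fs := flag.NewFlagSet(inv.Command, flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors through the return value only
	fs.StringVar(&inv.Source, "source", "", "limit the run to one source")
	if err := fs.Parse(args[2:]); err != nil {
		return inv, err
	}
	if fs.NArg() != 1 {
		return inv, errors.New("expected exactly one config directory")
	}
	inv.ConfigDir = fs.Arg(0)
	return inv, nil
}

func main() {
	inv, err := Parse([]string{"load", "summon", "--source", "allgood", "./configs"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", inv)
}
```

Keeping the accepted targets in one table makes it easy to see that only `release` takes `orgs`.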
The utilities are tools used to investigate JSON-LD summon and conversion workflows; basically, manual testing tools:
- tool s3clear - clear a bucket. Can we just proxy the minioadmin commands? It's a Go app.
- tool identifier - read JSON-LD, return the identifier
- tool jsonld - read JSON-LD, output context-fixed JSON-LD; used to test JSON-LD context changes
- tool rdf - convert JSON-LD to RDF
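As an illustration of what `tool identifier` might do, here is a minimal sketch. The rule used (take the top-level `@id`, otherwise fall back to a SHA-256 digest of the raw bytes) is an assumption made for this example and is not necessarily the identifier logic gleaner uses today.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Identifier returns the top-level "@id" of a JSON-LD document when it is
// a non-empty string. Otherwise it returns the SHA-256 hex digest of the
// raw bytes. This fallback rule is an assumption for illustration.
// Only documents whose top level is a JSON object are handled.
func Identifier(doc []byte) (string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(doc, &top); err != nil {
		return "", fmt.Errorf("invalid JSON object: %w", err)
	}
	if raw, ok := top["@id"]; ok {
		var id string
		if err := json.Unmarshal(raw, &id); err == nil && id != "" {
			return id, nil
		}
	}
	sum := sha256.Sum256(doc)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	doc := []byte(`{"@id": "https://example.org/dataset/1", "@type": "Dataset"}`)
	id, err := Identifier(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```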
** OTHERS **