Commit 6a4f036

Merge pull request #49 from gleanerio/df-dev
@valentinedwv Thanks for all the great feedback and code review!
2 parents 749f4fe + 9d9bcf0 commit 6a4f036

45 files changed, 85908 additions and 513 deletions

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.0.8-development
+2.0.18-df-development

config/example.yaml

Lines changed: 42 additions & 8 deletions
@@ -5,6 +5,9 @@ minio:
   accesskey: akey
   secretkey: skey
   bucket: gleaner
+  region: null
+implementation_network:
+  orgname: iow
 context:
   cache: true
   strict: true
@@ -19,11 +22,42 @@ objects:
     - summoned/providera
     - prov/providera
     - org
-sparql:
-  endpoint: http://localhost/blazegraph/namespace/earthcube/sparql
-  endpointBulk: http://coreos.lan:3030/testing/data
-  endpointMethod: POST
-  contentType: application/n-quads
-  authenticate: false
-  username: ""
-  password: ""
+endpoints:
+  - service: ec_blazegraph
+    baseurl: http://coreos.lan:9090/blazegraph/namespace/iow
+    type: blaszgraph
+    authenticate: false
+    username: admin
+    password: jfpwd
+    modes:
+      - action: sparql
+        suffix: /sparql
+        accept: application/sparql-results+json
+        method: GET
+      - action: update
+        suffix: /sparql
+        accept: application/sparql-update
+        method: POST
+      - action: bulk
+        suffix: /sparql
+        accept: text/x-nquads
+        method: POST
+  - service: iow_graphdb
+    baseurl: http://coreos.lan:7200/repositories/testing
+    type: graphed
+    authenticate: false
+    username: admin
+    password: jfpw
+    modes:
+      - action: sparql
+        suffix: # no suffix needed for GraphDB
+        accept: application/sparql-results+json
+        method: GET
+      - action: update
+        suffix: /statements
+        accept: application/sparql-update
+        method: POST
+      - action: bulk
+        suffix: /statements
+        accept: text/x-nquads
+        method: POST
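
The new `endpoints` layout replaces the single `sparql` block with one entry per service, each carrying a list of `modes` keyed by `action`. As a rough illustration of how a client might pick the URL for a given action, here is a minimal Go sketch. The `Endpoint` and `Mode` types and the `ResolveMode` helper are hypothetical names invented for this example and are not taken from the nabu code base; the sketch also assumes the request URL is simply `baseurl` followed by `suffix`.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode mirrors one entry under "modes" in config/example.yaml (hypothetical type).
type Mode struct {
	Action string
	Suffix string
	Accept string
	Method string
}

// Endpoint mirrors one entry under "endpoints" (hypothetical type).
type Endpoint struct {
	Service string
	BaseURL string
	Modes   []Mode
}

// ResolveMode returns the request URL and the mode settings for an action.
// Assumption: the URL is baseurl + suffix, and an empty suffix means the
// base URL is used unchanged.
func ResolveMode(ep Endpoint, action string) (string, Mode, error) {
	for _, m := range ep.Modes {
		if m.Action == action {
			return strings.TrimRight(ep.BaseURL, "/") + m.Suffix, m, nil
		}
	}
	return "", Mode{}, fmt.Errorf("service %q has no mode for action %q", ep.Service, action)
}

func main() {
	// Values copied from the iow_graphdb entry in the example config.
	graphdb := Endpoint{
		Service: "iow_graphdb",
		BaseURL: "http://coreos.lan:7200/repositories/testing",
		Modes: []Mode{
			{Action: "sparql", Suffix: "", Accept: "application/sparql-results+json", Method: "GET"},
			{Action: "update", Suffix: "/statements", Accept: "application/sparql-update", Method: "POST"},
			{Action: "bulk", Suffix: "/statements", Accept: "text/x-nquads", Method: "POST"},
		},
	}

	url, mode, err := ResolveMode(graphdb, "bulk")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mode.Method, url)
}
```

Keeping the per-action suffix in the configuration lets Blazegraph (one `/sparql` path for everything) and GraphDB (a separate `/statements` path for writes) share the same lookup code.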

decisions/0001-URN-decision.md

Lines changed: 27 additions & 25 deletions
@@ -8,45 +8,47 @@ Proposed
 
 ## Context
 
-URNs for the graph URI are set in the file internal/graph/mintURN.go
-
-current
-```
-urn:{bucket}:{provider}:{sha}
-```
-
-proposed
-```
-urn:gleanerio:{network}:{provider}:{sha}
-```
-
+As JSON-LD documents, representing data graphs, are collected from sources, they
+need to be processed in the graph. When doing this we generate a named graph URN
+to identify the set of triples coming from a given document.
 
 ## Decision
 
-Old URNs were varationas on
+The desired URN pattern would then look like the following.
+These would likely always be pulled from the summoned prefix, and as such be JSON-LD.
 
 ```rdf
-urn:gleaner.io:summoned:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
+urn:{engine}:{implnet}:{source}:{type}:{sha}
 ```
 
-or
+* engine: In our case always _gleaner.io_ to represent the code base used. Other
+groups may wish to use other packages like Apache systems and can denote that here.
+The value is small, but it does give some evidence of the tools used, which may have an impact.
+* implnet: The implementing network or organization doing the activity. This should be
+one word, lower case and all alphanumeric. So things like: oih, decoder, geocodes, iow, polder, etc.
+* source: The name of the source from the Gleaner configuration file. It should also be
+one word, lower case and all alphanumeric. So things like: bcodmo, aquadocs, iris, etc.
+* type: One of:
+  * data: representing the data graphs collected
+  * prov: representing the prov graphs describing the collection process
+  * org: representing the organization data graphs generated by Gleaner for a source
+* sha: The sha hash generated.
+
+Populated examples might look like:
 
 ```rdf
-urn:gleaner.io:milled:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
+urn:gleaner.io:oih:edmo:prov:0255293683036aac2a95a2479cc841189c0ac3f8
+or
+urn:gleaner.io:iow:counties0:data:00010f9f071c39fcc0ca73eccad7470b675cd8a3
 ```
 
-The milled and summoned elements were pointless and led to confusion and were not
-really important in terms of getting to the object.
-
-The new desired URN pattern is
-
-```rdf
-urn:gleaner.io:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
-```
 
 ## Consequences
 
 This impacts gleaner in the generation of prov which will need to use this same pattern
 to fill out the prov records.
 
-
+Also, this means the URN does not actually represent the location of the object. Rather the
+client must know to go looking in summoned and/or milled. As noted, the use of milled is
+not really compelling. That aside, it is a case where the URN is now just an identifier and
+does not represent a resolvable object.
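
The `urn:{engine}:{implnet}:{source}:{type}:{sha}` pattern from the decision above can be sketched in Go as follows. This is only an illustration of the rules stated in the ADR, not the contents of `internal/graph/mintURN.go`: the `MintURN` function name and its signature are hypothetical, the hash is taken as an already computed string because the ADR does not say how it is generated, and the check that it is lower-case hex is an assumption based on the populated examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// engine is always gleaner.io for this code base, per the ADR.
const engine = "gleaner.io"

var (
	// One word, lower case, all alphanumeric (rule for implnet and source).
	wordRE = regexp.MustCompile(`^[a-z0-9]+$`)
	// Assumption: the sha is lower-case hex, as in the ADR examples.
	shaRE = regexp.MustCompile(`^[0-9a-f]+$`)
)

// MintURN assembles a named graph URN from its parts (hypothetical helper).
func MintURN(implnet, source, graphType, sha string) (string, error) {
	if !wordRE.MatchString(implnet) {
		return "", fmt.Errorf("implnet %q must be one lower-case alphanumeric word", implnet)
	}
	if !wordRE.MatchString(source) {
		return "", fmt.Errorf("source %q must be one lower-case alphanumeric word", source)
	}
	switch graphType {
	case "data", "prov", "org":
		// The three graph types listed in the ADR.
	default:
		return "", fmt.Errorf("type %q must be one of data, prov, org", graphType)
	}
	if !shaRE.MatchString(sha) {
		return "", fmt.Errorf("sha %q must be a lower-case hex string", sha)
	}
	return fmt.Sprintf("urn:%s:%s:%s:%s:%s", engine, implnet, source, graphType, sha), nil
}

func main() {
	urn, err := MintURN("oih", "edmo", "prov", "0255293683036aac2a95a2479cc841189c0ac3f8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(urn) // urn:gleaner.io:oih:edmo:prov:0255293683036aac2a95a2479cc841189c0ac3f8
}
```

Because the object prefix (summoned, milled) no longer appears in the result, the function needs no knowledge of where the document is stored, which matches the consequence noted in the ADR that the URN is an identifier rather than a location.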
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# 1. Record architecture decisions
+
+Date: 08-23-2023
+
+## Status
+
+Proposed
+
+## Context
+
+There are some conventions used in the leveraging of an object store by GleanerIO.
+This ADR scopes the naming conventions used both by Gleaner and Nabu.
+
+Some of these conventions have implications for the behavior of the code. For
+example, the URN generation leverages the path structure to establish the
+URN structure (see 0001-URN-decision.md).
+
+Though the resulting URN is abstracted from the object prefix value, that prefix
+is still used in the initial formation.
+
+* graphs/
+* graphs/archive
+* graphs/latest
+* graphs/summary
+* summoned/
+* prov/
+* milled/
+* orgs/
+* reports/
+* scheduler/
+
+## Decision
+
+
+## Consequences
+

docs/httpSPARQL.md

Lines changed: 8 additions & 0 deletions
@@ -16,3 +16,11 @@ curl -X POST -H 'Content-Type:application/n-quads' --data-binary @May4Buildings
 ```bash
 curl -X POST -H 'Content-Type:text/x-nquads' --data-binary @May4Buildings.nq http://192.168.86.45:32772/blazegraph/namespace/loadtest/sparql
 ```
+
+```bash
+curl -H 'Accept: application/sparql-results+json' http://coreos.lan:9090/blazegraph/namespace/iow/sparql --data-urlencode 'query=select * where{ ?s ?p ?o } limit 10'
+```
+
+```bash
+curl -H 'Accept: application/sparql-results+json' http://coreos.lan:9090/blazegraph/namespace/iow/sparql --data-urlencode 'query=SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)(COUNT(*) AS ?triplesCount)WHERE {GRAPH ?graph {?subject ?predicate ?object}}'
+```

docs/images/workflow.d2

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+direction: right
+
+
+gi: Get Image(s) {
+  style.fill: "#e0a3ff"
+  width: 200
+  height: 150
+}
+
+g: Gleaner Harvest {
+  style.fill: honeydew
+  width: 200
+  height: 150
+}
+
+data: Data Graph {
+  dr: Build Release Graph {
+    style.fill: "#f4a261"
+    width: 300
+  }
+  udr: Load Release Graph {
+    style.fill: "#f4a261"
+    width: 300
+  }
+
+  dp: Prune {
+    style.fill: "#f4a261"
+    width: 300
+  }
+}
+
+org: Organization Graph {
+  or: Prefix Load Graph {
+    style.fill: "#f4a261"
+    width: 300
+  }
+
+  # uor: Load Release Graph {
+  # style.fill: "#f4a261"
+  # width: 300
+  # }
+
+  # op: Prune {
+  # style.fill: "#f4a261"
+  # width: 300
+  # }
+
+  # org.or -> org.uor -> org.op
+
+}
+
+prov: Provenance Graph {
+  pr: Build Release Graph {
+    style.fill: "#f4a261"
+    width: 300
+  }
+  dpg: Clear Current Graph{
+    style.fill: "#f4a261"
+    width: 300
+  }
+  upr: Load Release Graph {
+    style.fill: "#f4a261"
+    width: 300
+  }
+  dpp: Delete Generated Data Graphs {
+    style.fill: "#f4a261"
+    width: 300
+  }
+}
+
+gi -> g
+g -> org.or
+g -> data.dr
+g -> prov.pr
+
+data.dr -> data.udr -> data.dp
+prov.pr -> prov.dpg -> prov.upr -> prov.dpp

docs/images/workflow.svg

Lines changed: 102 additions & 0 deletions

docs/sparql/countAllGraphs.rq

Lines changed: 8 additions & 7 deletions
@@ -1,11 +1,12 @@
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
-PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
-
-SELECT (COUNT( DISTINCT ?g) AS ?graphs)
 
+SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)
+       (COUNT(*) AS ?triplesCount)
 WHERE {
-  graph ?g {
-    ?s ?p ?o
+  GRAPH ?graph {
+    ?subject ?predicate ?object
   }
 }
+
+
+
+SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)(COUNT(*) AS ?triplesCount)WHERE {GRAPH ?graph {?subject ?predicate ?object}}

go.mod

Lines changed: 4 additions & 31 deletions
@@ -3,47 +3,20 @@ module github.com/gleanerio/nabu
 go 1.15
 
 require (
-	github.com/bbalet/stopwords v1.0.0
-	github.com/blevesearch/bleve v1.0.14 // indirect
-	github.com/buger/jsonparser v1.1.1 // indirect
-	github.com/coreos/bbolt v1.3.2 // indirect
-	github.com/coreos/etcd v3.3.13+incompatible // indirect
-	github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e // indirect
-	github.com/coreos/pkg v0.0.0-20180928190104-399ea9e2e55f // indirect
-	github.com/coyove/jsonbuilder v0.0.0-20160414062945-90ee6d2c3c43 // indirect
-	github.com/dgrijalva/jwt-go v3.2.0+incompatible // indirect
-	github.com/gleanerio/gleaner v0.0.0-20211103190335-f9d8811ee43b // indirect
-	github.com/go-ini/ini v1.62.0 // indirect
-	github.com/gosuri/uilive v0.0.4 // indirect
-	github.com/gosuri/uiprogress v0.0.1
-	github.com/grpc-ecosystem/go-grpc-middleware v1.0.0 // indirect
-	github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
-	github.com/jonboulle/clockwork v0.1.0 // indirect
-	github.com/kisielk/godepgraph v0.0.0-20190626013829-57a7e4a651a9 // indirect
+	github.com/google/uuid v1.2.0 // indirect
 	github.com/knakk/rdf v0.0.0-20190304171630-8521bf4c5042
 	github.com/meilisearch/meilisearch-go v0.21.1
-	github.com/minio/minio-go v6.0.14+incompatible
 	github.com/minio/minio-go/v7 v7.0.15
-	github.com/neuml/txtai.go v1.0.0
 	github.com/orandin/lumberjackrus v1.0.1
-	github.com/paulmach/go.geojson v1.4.0 // indirect
-	github.com/piprate/json-gold v0.4.0
-	github.com/prometheus/client_golang v0.9.3 // indirect
-	github.com/protolambda/gocyto v0.0.1 // indirect
+	github.com/piprate/json-gold v0.5.0
+	github.com/pquerna/cachecontrol v0.1.0 // indirect
 	github.com/rs/xid v1.2.1
-	github.com/schollz/progressbar v1.0.0
 	github.com/schollz/progressbar/v3 v3.8.3
 	github.com/sirupsen/logrus v1.8.1
-	github.com/soheilhy/cmux v0.1.4 // indirect
 	github.com/spf13/cobra v1.2.1
 	github.com/spf13/viper v1.9.0
 	github.com/tidwall/gjson v1.14.2
 	github.com/tidwall/sjson v1.2.5
-	github.com/tmc/grpc-websocket-proxy v0.0.0-20190109142713-0ad062ec5ee5 // indirect
-	github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2 // indirect
-	golang.org/x/text v0.3.7
+	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
 	gopkg.in/natefinch/lumberjack.v2 v2.0.0 // indirect
-	gopkg.in/redis.v5 v5.2.9 // indirect
-	gopkg.in/resty.v1 v1.12.0 // indirect
-	honnef.co/go/tools v0.1.2 // indirect
 )
