
# Commit f9cfae4

**Improve READMEs, fix dependencies for release (#95)**

## What is the goal of this PR?

Improve install instructions in the main README, and diagram explanations in the KGCN README, ready for release.

## What are the changes implemented in this PR?

- README edits
- Fix the dependencies ready for release
- Lock the bazel and rbe install scripts to a specific build-tools commit

Parent: a5d6298

6 files changed (+133 additions, -25 deletions)

## .circleci/config.yml (2 additions, 2 deletions)

```diff
@@ -3,9 +3,9 @@ commands:
   install-bazel-linux-rbe:
     steps:
-      - run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/master/ci/install-bazel-linux.sh
+      - run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80/ci/install-bazel-linux.sh
       - run: bash ./install-bazel-linux.sh && rm ./install-bazel-linux.sh
-      - run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/master/ci/install-bazel-rbe.sh
+      - run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80/ci/install-bazel-rbe.sh
       - run: bash ./install-bazel-rbe.sh && rm ./install-bazel-rbe.sh

   run-grakn-server:
```

## README.md (29 additions, 18 deletions)
```diff
@@ -8,9 +8,9 @@

 [Grakn](https://github.com/graknlabs/grakn) lets us create Knowledge Graphs from our data. But what challenges do we encounter where querying alone won’t cut it? What library can address these challenges?

-To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn knowledge graph.
+To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn Knowledge Graph. More on this below, in [*Knowledge Graph Tasks*](https://github.com/graknlabs/kglib#knowledge-graph-tasks).

-At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn).
+At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn). Go there for more info on getting started with a working example.

 ## Quickstart
 **Requirements**
```
````diff
@@ -21,23 +21,38 @@ At present this repo contains one project: [*Knowledge Graph Convolutional Netwo

 - The [latest release of Grakn Core](https://github.com/graknlabs/grakn/releases/latest) or [Grakn KGMS](https://dev.grakn.ai/docs/cloud-deployment/kgms) running

+**Run**
+Take a look at [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) to see a walkthrough of how to use the library.
+
 **Building from source**

-To test that all targets can be built:
+Clone KGLIB:
+
+```
+git clone git@github.com:graknlabs/kglib.git
+```
+
+`cd` into the project:
+
+```
+cd kglib
+```
+
+To build all targets:

-```bash
+```
 bazel build //...
 ```

-To run all tests:
+To run all tests (requires Python 3.6+):

-```bash
-bazel test //... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
+```
+bazel test //kglib/... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
 ```

 To build the pip distribution (find the output in `bazel-bin`):

-```bash
+```
 bazel build //:assemble-pip
 ```
````

```diff
@@ -76,7 +91,7 @@ Here we term any task which creates new facts for the KG as *Knowledge Graph Com

 #### Relation Prediction (a.k.a. Link prediction)

-We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of binary Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.
+We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of **binary** Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.

 When predicting Relations, there are several scenarios we may have. When predicting binary Relations between the members of one set and the members of another set, we may need to predict them as:
```

```diff
@@ -88,21 +103,17 @@ When predicting Relations, there are several scenarios we may have. When predict

 *Examples:* The problem of predicting which disease(s) a patient has is a one-to-many problem. Whereas, predicting which drugs in the KG treat which diseases is a many-to-many problem.

-We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.
-
-***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can help us with one-to-one binary Relation prediction. This requires extra implementation, for which two approaches are apparent:
-
-- Create two KGCNs, one for each of the two Roleplayers in the binary Relation. Extend the neural network to compare the embeddings of each Roleplayer, and classify the pairing according to whether a Relation should exist or not.
+Notice also that recommender systems are one use case of one-to-many binary Relation prediction.

-- Feed Relations directly to a KGCN, and classify their existence. (KGCNs can accept Relations as the Things of interest just as well as Entities). To do this we also need to create hypothetical Relations, labelled as negative examples, and feed them to the KGCN alongside the positively labelled known Relations. Note that this extends well to ternary and N-ary Relations.
+We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

-Notice also that recommender systems are one use case of one-to-many binary Relation prediction.
+***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) performs Relation prediction using an approach based on [Graph Networks](https://github.com/deepmind/graph_nets) from DeepMind. This can be used to predict **binary**, **ternary**, or **N-ary** relations. This is well-supported for the one-to-one case and the one-to-many case.

 #### Attribute Prediction

 We would like to predict one or more Attributes of a Thing, which may include also prediction of whether that Attribute should even be present at all.

-***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. Attribute prediction is already fully supported.
+***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. This requires some minor additional functionality to be added (we intend to build this imminently).

 #### Subgraph Prediction
```
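The relation-prediction approach this hunk adopts (feeding Relations directly to the model, alongside negative candidate Relations) is easiest to see on a concrete graph. Below is a minimal sketch of that reified-relation representation using `networkx`; the type names (`person`, `disease`, `diagnosis`) and attribute keys are made up for illustration and are not kglib's actual API:

```python
# A minimal sketch of representing Relations as nodes ("reification"),
# so that binary, ternary and N-ary Relations are handled uniformly.
# All names here are illustrative only, not kglib's API.
import networkx as nx

graph = nx.MultiDiGraph()

# Entities (Things) in the subgraph.
graph.add_node("person-1", type="person")
graph.add_node("disease-1", type="disease")
graph.add_node("disease-2", type="disease")

# A known Relation, reified as a node with one role edge per roleplayer.
# An N-ary Relation is simply a relation node with N role edges.
graph.add_node("diagnosis-1", type="diagnosis", exists=True)
graph.add_edge("diagnosis-1", "person-1", type="patient")
graph.add_edge("diagnosis-1", "disease-1", type="diagnosed-disease")

# A hypothetical candidate Relation, labelled as a negative example.
graph.add_node("diagnosis-2", type="diagnosis", exists=False)
graph.add_edge("diagnosis-2", "person-1", type="patient")
graph.add_edge("diagnosis-2", "disease-2", type="diagnosed-disease")

# The learner is then trained to classify relation nodes by existence.
for node, data in graph.nodes(data=True):
    if data["type"] == "diagnosis":
        print(node, "positive" if data["exists"] else "negative")
```

Because each Relation is itself a node, adding a third role edge is all it takes to represent a ternary Relation.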

```diff
@@ -114,7 +125,7 @@ Embeddings of Things and/or Types are universally useful for performing other do
 These vectors are easy to ingest into other ML pipelines.
 The benefit of building general-purpose embeddings is therefore to make use of them in multiple other pipelines. This reduces the expense of traversing the Knowledge Graph, since this task can be performed once and the output re-used more than once.

-***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).
+***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model in an unsupervised fashion. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).

 #### Rule Mining (a.k.a. Association Rule Learning)
```
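The generic loss mentioned in this hunk could, at its simplest, supervise embedding distances with the shortest-path lengths that `compute path` returns. A sketch of that idea in plain numpy, with a hard-coded distance matrix standing in for Grakn's output; this illustrates the concept only and is not kglib functionality:

```python
# A sketch of the distance-based embedding loss described above.
# Path lengths between Things would come from Grakn's `compute path`;
# here they are hard-coded for illustration.
import numpy as np

rng = np.random.default_rng(0)
num_things, dim = 4, 8
embeddings = rng.normal(size=(num_things, dim))

# kg_distance[i, j]: shortest-path length between Thing i and Thing j.
kg_distance = np.array([[0, 1, 2, 3],
                        [1, 0, 1, 2],
                        [2, 1, 0, 1],
                        [3, 2, 1, 0]], dtype=float)

# Distance in embedding space should correlate with distance in the KG.
diffs = embeddings[:, None, :] - embeddings[None, :, :]
embedding_distance = np.linalg.norm(diffs, axis=-1)

# A simple squared-error loss between the two distance matrices;
# minimising this (e.g. by gradient descent) trains the embeddings.
loss = np.mean((embedding_distance - kg_distance) ** 2)
print(loss)
```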

## dependencies/graknlabs/dependencies.bzl (2 additions, 2 deletions)

```diff
@@ -4,7 +4,7 @@ def graknlabs_build_tools():
     git_repository(
         name = "graknlabs_build_tools",
         remote = "https://github.com/graknlabs/build-tools",
-        commit = "f50e7a618045c99862bed78f813b1cfbb25a6016", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
+        commit = "04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
     )

@@ -19,5 +19,5 @@ def graknlabs_client_python():
     git_repository(
         name = "graknlabs_client_python",
         remote = "https://github.com/graknlabs/client-python",
-        commit = "4f03fc79fba71f216a28a4bc412c084fcef099a0" # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
+        tag = "1.5.4" # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
     )
```
## Images (binary)

- image file, 154 KB
- kglib/kgcn/.images/learning.png, 62.5 KB

## kglib/kgcn/README.md (100 additions, 3 deletions)

```diff
@@ -1,3 +1,5 @@
+
+
 # Knowledge Graph Convolutional Networks

 This project introduces a novel model: the *Knowledge Graph Convolutional Network* (KGCN). This project is in its second major iteration since its inception.
```
````diff
@@ -31,10 +33,105 @@ See the [full example](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn
 Once you have installed kglib via pip (as above) you can run the example as follows:

 1. Start a Grakn server
+
 2. Load [the schema](kglib/utils/grakn/synthetic/examples/diagnosis/schema.gql) for the example into Grakn. The template for the command is `./grakn console -k diagnosis -f path/to/schema.gql`
+
 3. Run the example: `python -m kglib.kgcn.examples.diagnosis.diagnosis`
+
 4. You should observe console output to indicate that the pipeline is running and that the model is learning. Afterwards two plots should be created to visualise the training process and examples of the predictions made.

+## Output
+
+### Console
+
+During training, the console will output metrics for the performance on the training and test sets.
+
+You should see output such as this for the diagnosis example:
+```
+# (iteration number), T (elapsed seconds), Ltr (training loss), Lge (test/generalization loss), Ctr (training fraction nodes/edges labeled correctly), Str (training fraction examples solved correctly), Cge (test/generalization fraction nodes/edges labeled correctly), Sge (test/generalization fraction examples solved correctly)
+# 00000, T 8.7, Ltr 2.4677, Lge 2.3044, Ctr 0.2749, Str 0.0000, Cge 0.2444, Sge 0.0000
+# 00050, T 11.3, Ltr 0.5098, Lge 0.4571, Ctr 0.8924, Str 0.0000, Cge 0.8983, Sge 0.0000
+# 00100, T 14.0, Ltr 0.3694, Lge 0.3340, Ctr 0.8924, Str 0.0000, Cge 0.8983, Sge 0.0000
+# 00150, T 16.6, Ltr 0.3309, Lge 0.3041, Ctr 0.9010, Str 0.0000, Cge 0.8919, Sge 0.0000
+# 00200, T 19.2, Ltr 0.3125, Lge 0.2940, Ctr 0.9010, Str 0.0000, Cge 0.8919, Sge 0.0000
+# 00250, T 21.8, Ltr 0.2975, Lge 0.2790, Ctr 0.9254, Str 0.2000, Cge 0.9178, Sge 0.4333
+# 00300, T 24.4, Ltr 0.2761, Lge 0.2641, Ctr 0.9332, Str 0.6000, Cge 0.9243, Sge 0.4333
+# 00350, T 27.0, Ltr 0.2653, Lge 0.2534, Ctr 0.9340, Str 0.6000, Cge 0.9243, Sge 0.4333
+# 00400, T 29.7, Ltr 0.2866, Lge 0.2709, Ctr 0.9332, Str 0.6000, Cge 0.9178, Sge 0.0000
+# 00450, T 32.3, Ltr 0.2641, Lge 0.2609, Ctr 0.9324, Str 0.6000, Cge 0.9178, Sge 0.4333
+# 00500, T 34.9, Ltr 0.2601, Lge 0.2544, Ctr 0.9324, Str 0.6000, Cge 0.9178, Sge 0.4333
+# 00550, T 37.5, Ltr 0.2571, Lge 0.2501, Ctr 0.9332, Str 0.6000, Cge 0.9243, Sge 0.4333
+# 00600, T 40.1, Ltr 0.2530, Lge 0.2404, Ctr 0.9348, Str 0.6000, Cge 0.9373, Sge 0.4333
+# 00650, T 42.7, Ltr 0.2508, Lge 0.2363, Ctr 0.9356, Str 0.6000, Cge 0.9438, Sge 0.4333
+# 00700, T 45.3, Ltr 0.2500, Lge 0.2340, Ctr 0.9372, Str 0.7333, Cge 0.9503, Sge 0.4333
+# 00750, T 48.0, Ltr 0.2493, Lge 0.2307, Ctr 0.9372, Str 0.7333, Cge 0.9567, Sge 0.8000
+# 00800, T 50.7, Ltr 0.2488, Lge 0.2284, Ctr 0.9372, Str 0.7333, Cge 0.9567, Sge 0.8000
+```
+
+Take note of the key:
+
+- \# - iteration number
+- T - elapsed seconds
+- Ltr - training loss
+- Lge - test/generalization loss
+- Ctr - training fraction nodes/edges labeled correctly
+- Str - training fraction examples solved correctly
+- Cge - test/generalization fraction nodes/edges labeled correctly
+- Sge - test/generalization fraction examples solved correctly
+
+The element we are most interested in is `Sge`, the proportion of subgraphs where all elements of the subgraph were classified correctly. This therefore represents an entirely correctly predicted example.
+
+### Diagrams
+
+#### Training Metrics
+
+Upon running the example you will also get plots from matplotlib saved to your working directory.
+
+You will see plots of metrics for the training process (training iteration on the x-axis) for the training set (solid line), and test set (dotted line). From left to right:
+
+- The absolute loss across all of the elements in the dataset
+- The fraction of all graph elements predicted correctly across the dataset
+- The fraction of completely solved examples (subgraphs extracted from Grakn)
+
+![learning metrics](.images/learning.png)
+
+#### Visualise Predictions
+
+We also receive a plot of some of the predictions made on the test set.
+
+**Blue box:** Ground Truth
+
+- Preexisting (known) graph elements are shown in blue
+- Relations and role edges that **should be predicted to exist** are shown in green
+- Candidate relations and role edges that **should not be predicted to exist** are shown faintly in red
+
+**Black boxes:** Model Predictions at certain message-passing steps
+
+These use the same colour scheme as above, but opacity indicates a probability given by the model.
+
+The learner predicts three classes for each graph element. These are:
+
+```
+[
+Element already existed in the graph (we wish to ignore these elements),
+Element does not exist in the graph,
+Element does exist in the graph
+]
+```
+
+In this way we perform relation prediction by proposing negative candidate relations (Grakn's rules help us with this). Then we train the learner to classify these negative candidates as **does not exist** and the correct relations as **does exist**.
+
+These boxes show the score assigned to the class **does exist**.
+
+Therefore, for good predictions we want to see no blue elements, and we want the red elements to fade out as more messages are passed, with the green elements becoming more certain.
+
+![predictions made on test set](.images/graph_snippet.png)
+
+This visualisation has some flaws, and will be improved in the future.
+
 ## Methodology

 The methodology that this implementation uses for Relation prediction is as follows:
````
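To make the three-class output described in the new *Output* section concrete: each graph element receives a 3-way softmax, and the prediction plots read off the probability of the class *does exist* to set opacity. A small numpy sketch with made-up logits, purely for illustration:

```python
# Sketch of the per-element 3-class output described above:
# class 0 = preexisting element (ignored), 1 = does not exist, 2 = does exist.
import numpy as np

# Hypothetical logits for 4 graph elements (rows) over 3 classes (columns).
logits = np.array([[4.0, 0.1, 0.2],   # preexisting element
                   [0.1, 3.0, 0.4],   # candidate: likely does not exist
                   [0.2, 0.3, 2.5],   # candidate: likely does exist
                   [0.1, 1.2, 1.0]])  # uncertain candidate

# Softmax over classes for each element.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# The prediction plots show the probability of class 2, "does exist",
# which drives the opacity of each drawn element.
print(probs[:, 2])
```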
```diff
@@ -43,7 +140,7 @@ In the case of the diagnosis example, we aim to predict `diagnosis` Relations. W

 We then teach the KGCN to distinguish between the positive and negative examples.

-###Examples == Subgraphs
+### Examples == Subgraphs

 We do this by creating *examples*, where each example is a subgraph extracted from a Grakn knowledge Graph. These subgraphs contain positive and negative instances of the relation to be predicted.
```
```diff
@@ -74,11 +171,11 @@ A single subgraph is extracted from Grakn by making these queries and combining

 We can visualise such a subgraph by running these two queries in Grakn Workbase:

-![](.images/queried_subgraph.png)
+![queried subgraph](.images/queried_subgraph.png)

 You can get the relevant version of Workbase from the Assets of the [latest release](https://github.com/graknlabs/workbase/releases/latest).

-###Learning
+### Learning

 A KGCN is a learned message-passing graph algorithm. Neural network components are learned, and are used to transform signals that are passed around the graph. This approach is convolutional due to the fact that the same transformation is applied to all edges and another is applied to all nodes. It may help your understanding to analogise this to convolution over images, where the same transformation is applied over all pixel neighbourhoods.
```
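The *Learning* paragraph above can be summarised in a few lines of code: one transformation is applied along every edge, and another to every node. A toy numpy version of a single message-passing step, with random matrices standing in for the learned networks; this is a sketch of the general technique, not kglib's implementation:

```python
# Toy single message-passing step, analogous to the Learning section above.
# The same "network" (here just a weight matrix) is applied to every edge,
# and another to every node - hence "convolutional".
import numpy as np

rng = np.random.default_rng(1)
num_nodes, dim = 5, 4
node_states = rng.normal(size=(num_nodes, dim))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # (sender, receiver)

W_edge = rng.normal(size=(dim, dim))  # stands in for the learned edge network
W_node = rng.normal(size=(dim, dim))  # stands in for the learned node network

# 1. Compute a message along every edge from its sender's state.
# 2. Aggregate incoming messages at each receiver.
incoming = np.zeros_like(node_states)
for sender, receiver in edges:
    incoming[receiver] += np.tanh(node_states[sender] @ W_edge)

# 3. Update every node from its old state plus its aggregated messages.
node_states = np.tanh((node_states + incoming) @ W_node)
print(node_states.shape)  # (5, 4)
```

A full KGCN stacks several such steps, so information can propagate multiple hops across the subgraph.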
