This repository was archived by the owner on Nov 18, 2023. It is now read-only.
Improve READMEs, fix dependencies for release (#95)
## What is the goal of this PR?
Improve install instructions in main README, and diagram explanations in KGCN README, ready for release.
## What are the changes implemented in this PR?
- README edits
- Fix the dependencies ready for release
- Lock the bazel and rbe install scripts to a specific build-tools commit
**README.md** (+29 −18)
@@ -8,9 +8,9 @@
[Grakn](https://github.com/graknlabs/grakn) lets us create Knowledge Graphs from our data. But what challenges do we encounter where querying alone won’t cut it? What library can address these challenges?

-To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn knowledge graph.
+To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn Knowledge Graph. More on this below, in [*Knowledge Graph Tasks*](https://github.com/graknlabs/kglib#knowledge-graph-tasks).

-At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn).
+At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn). Go there for more info on getting started with a working example.

## Quickstart

**Requirements**
@@ -21,23 +21,38 @@ At present this repo contains one project: [*Knowledge Graph Convolutional Netwo
- The [latest release of Grakn Core](https://github.com/graknlabs/grakn/releases/latest) or [Grakn KGMS](https://dev.grakn.ai/docs/cloud-deployment/kgms) running

+**Run**
+
+Take a look at [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) to see a walkthrough of how to use the library.

-bazel test //... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
+```
+bazel test //kglib/... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
```

To build the pip distribution (find the output in `bazel-bin`):

-```bash
+```
bazel build //:assemble-pip
```
@@ -76,7 +91,7 @@ Here we term any task which creates new facts for the KG as *Knowledge Graph Com
#### Relation Prediction (a.k.a. Link prediction)

-We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of binary Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.
+We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of **binary** Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.

When predicting Relations, there are several scenarios we may have. When predicting binary Relations between the members of one set and the members of another set, we may need to predict them as:
@@ -88,21 +103,17 @@ When predicting Relations, there are several scenarios we may have. When predict
*Examples:* The problem of predicting which disease(s) a patient has is a one-to-many problem. Whereas, predicting which drugs in the KG treat which diseases is a many-to-many problem.

-We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.
-
-***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can help us with one-to-one binary Relation prediction. This requires extra implementation, for which two approaches are apparent:
-
-- Create two KGCNs, one for each of the two Roleplayers in the binary Relation. Extend the neural network to compare the embeddings of each Roleplayer, and classify the pairing according to whether a Relation should exist or not.
+Notice also that recommender systems are one use case of one-to-many binary Relation prediction.

-- Feed Relations directly to a KGCN, and classify their existence. (KGCNs can accept Relations as the Things of interest just as well as Entities). To do this we also need to create hypothetical Relations, labelled as negative examples, and feed them to the KGCN alongside the positively labelled known Relations. Note that this extends well to ternary and N-ary Relations.
+We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

-Notice also that recommender systems are one use case of one-to-many binary Relation prediction.
+***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) performs Relation prediction using an approach based on [Graph Networks](https://github.com/deepmind/graph_nets) from DeepMind. This can be used to predict **binary**, **ternary**, or **N-ary** relations. This is well-supported for the one-to-one case and the one-to-many case.
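The text above describes labelling hypothetical Relations as negative examples and feeding them to the learner alongside the known positive Relations. A minimal, framework-free sketch of that negative-sampling step (the function and variable names here are invented for illustration; this is not the kglib API):

```python
import random

def propose_negative_relations(known_relations, entities, n_negatives, seed=0):
    """Sample candidate roleplayer pairs that are NOT known Relations,
    to serve as negatively-labelled examples for a learner."""
    rng = random.Random(seed)
    known = set(known_relations)
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(entities), rng.choice(entities))
        # Skip self-pairs and pairs that are already true Relations
        if pair[0] != pair[1] and pair not in known and (pair[1], pair[0]) not in known:
            negatives.add(pair)
    return sorted(negatives)

# Toy data: label known Relations 1 ("exists"), sampled candidates 0 ("does not exist")
patients = ["p1", "p2", "p3"]
diseases = ["d1", "d2"]
known = [("p1", "d1"), ("p2", "d2")]
candidates = propose_negative_relations(known, patients + diseases, n_negatives=3)
dataset = [(rel, 1) for rel in known] + [(rel, 0) for rel in candidates]
```

A real pipeline would restrict sampling to type-compatible roleplayers and would extend the tuples beyond pairs for ternary and N-ary Relations.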
#### Attribute Prediction

We would like to predict one or more Attributes of a Thing, which may include also prediction of whether that Attribute should even be present at all.

-***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. Attribute prediction is already fully supported.
+***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. This requires some minor additional functionality to be added (we intend to build this imminently).
#### Subgraph Prediction
@@ -114,7 +125,7 @@ Embeddings of Things and/or Types are universally useful for performing other do
These vectors are easy to ingest into other ML pipelines.

The benefit of building general-purpose embeddings is therefore to make use of them in multiple other pipelines. This reduces the expense of traversing the Knowledge Graph, since this task can be performed once and the output re-used more than once.

-***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).
+***In KGLIB***[*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model in an unsupervised fashion. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).
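The shortest-distance training signal mentioned above is easy to picture. A self-contained sketch of that distance computation, with plain BFS standing in for Grakn's `compute path` (the toy graph and all names are invented for illustration):

```python
from collections import deque

def shortest_path_length(adjacency, source, target):
    """BFS shortest-path length between two Things in an undirected KG,
    standing in for Grakn's `compute path` in this sketch."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path exists

# Toy KG: person -- employment Relation -- company -- location
adjacency = {
    "alice": ["employment-1"],
    "employment-1": ["alice", "acme"],
    "acme": ["employment-1", "london"],
    "london": ["acme"],
}
distance = shortest_path_length(adjacency, "alice", "london")
# 3 hops: alice -> employment-1 -> acme -> london
```

Pairs of Things that are close in the KG would then be pushed towards nearby embeddings, making the graph distance a simple regression target for an unsupervised loss.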
#### Rule Mining (a.k.a. Association Rule Learning)
-commit="f50e7a618045c99862bed78f813b1cfbb25a6016", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
+commit="04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
-commit="4f03fc79fba71f216a28a4bc412c084fcef099a0"# sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
+tag="1.5.4"# sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
**kglib/kgcn/README.md** (+100 −3)
@@ -1,3 +1,5 @@
+
+
# Knowledge Graph Convolutional Networks
This project introduces a novel model: the *Knowledge Graph Convolutional Network* (KGCN). This project is in its second major iteration since its inception.
@@ -31,10 +33,105 @@ See the [full example](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn
Once you have installed kglib via pip (as above) you can run the example as follows:

1. Start a Grakn server
+
2. Load [the schema](kglib/utils/grakn/synthetic/examples/diagnosis/schema.gql) for the example into Grakn. The template for the command is `./grakn console -k diagnosis -f path/to/schema.gql`
+
3. Run the example: `python -m kglib.kgcn.examples.diagnosis.diagnosis`
+
4. You should observe console output to indicate that the pipeline is running and that the model is learning. Afterwards two plots should be created to visualise the training process and examples of the predictions made.
+## Output
+
+### Console
+
+During training, the console will output metrics for the performance on the training and test sets.
+
+You should see output such as this for the diagnosis example:
+
+The element we are most interested in is `Sge`, the proportion of subgraphs where all elements of the subgraph were classified correctly. This therefore represents an entirely correctly predicted example.
+
+### Diagrams
+
+#### Training Metrics
+
+Upon running the example you will also get plots from matplotlib saved to your working directory.
+
+You will see plots of metrics for the training process (training iteration on the x-axis) for the training set (solid line), and test set (dotted line). From left to right:
+
+- The absolute loss across all of the elements in the dataset
+- The fraction of all graph elements predicted correctly across the dataset
+- The fraction of completely solved examples (subgraphs extracted from Grakn)
+
+
+
+#### Visualise Predictions
+
+We also receive a plot of some of the predictions made on the test set.
+
+**Blue box:** Ground Truth
+
+- Preexisting (known) graph elements are shown in blue
+- Relations and role edges that **should be predicted to exist** are shown in green
+- Candidate relations and role edges that **should not be predicted to exist** are shown faintly in red
+
+**Black boxes**: Model Predictions at certain message-passing steps
+
+This uses the same colour scheme as above, but opacity indicates a probability given by the model.
+
+The learner predicts three classes for each graph element. These are:
+
+```
+[
+Element already existed in the graph (we wish to ignore these elements),
+Element does not exist in the graph,
+Element does exist in the graph
+]
+```
+
+In this way we perform relation prediction by proposing negative candidate relations (Grakn's rules help us with this). Then we train the learner to classify these negative candidates as **does not exist** and the correct relations as **does exist**.
+
+These boxes show the score assigned to the class **does exist**.
+
+Therefore, for good predictions we want to see no blue elements, and for the red elements to fade out as more messages are passed, the green elements becoming more certain.
+
+
+
+This visualisation has some flaws, and will be improved in the future.
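The three per-element classes described above map naturally onto a softmax, with the plotted opacity corresponding to the probability of the class **does exist**. A minimal sketch of that scoring in pure Python (not the kglib implementation; names are invented for illustration):

```python
import math

# Matches the three classes listed above, in order
CLASSES = ("preexisting", "absent", "exists")

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def exists_probability(logits):
    """Probability mass assigned to 'does exist' -- the opacity
    used when drawing a predicted relation or role edge."""
    return softmax(logits)[CLASSES.index("exists")]

# One logit triple per graph element: [preexisting, absent, exists]
confident = exists_probability([0.1, 0.2, 3.0])   # model believes the candidate
rejected = exists_probability([0.1, 3.0, 0.2])    # model rejects the candidate
```

A candidate the model believes in is drawn at high opacity, a rejected one fades out; elements classed as preexisting are ignored when scoring.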
## Methodology
The methodology that this implementation uses for Relation prediction is as follows:
@@ -43,7 +140,7 @@ In the case of the diagnosis example, we aim to predict `diagnosis` Relations. W
We then teach the KGCN to distinguish between the positive and negative examples.

-###Examples == Subgraphs
+### Examples == Subgraphs

We do this by creating *examples*, where each example is a subgraph extracted from a Grakn knowledge Graph. These subgraphs contain positive and negative instances of the relation to be predicted.
@@ -74,11 +171,11 @@ A single subgraph is extracted from Grakn by making these queries and combining
We can visualise such a subgraph by running these two queries in Grakn Workbase:

-
+

You can get the relevant version of Workbase from the Assets of the [latest release](https://github.com/graknlabs/workbase/releases/latest).

-###Learning
+### Learning
A KGCN is a learned message-passing graph algorithm. Neural network components are learned, and are used to transform signals that are passed around the graph. This approach is convolutional due to the fact that the same transformation is applied to all edges and another is applied to all nodes. It may help your understanding to analogise this to convolution over images, where the same transformation is applied over all pixel neighbourhoods.
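The message-passing idea above can be made concrete with a toy propagation step. This sketch uses one fixed scalar weight where a KGCN learns neural network transformations; the key property it illustrates is the weight-sharing across all edges and all nodes that makes the approach convolutional (all names invented for illustration):

```python
def edge_message(sender_state, weight):
    """The SAME transformation (here a fixed scale; in a KGCN, a learned
    network) is applied on every edge -- this sharing is the 'convolution'."""
    return [weight * x for x in sender_state]

def propagate(node_states, edges, weight=0.5):
    """One message-passing step: every node aggregates the transformed
    messages from its neighbours and adds them to its own state."""
    updated = {}
    for node, state in node_states.items():
        incoming = [edge_message(node_states[src], weight)
                    for src, dst in edges if dst == node]
        agg = [sum(vals) for vals in zip(*incoming)] if incoming else [0.0] * len(state)
        updated[node] = [s + a for s, a in zip(state, agg)]
    return updated

# Tiny graph with two nodes and edges in both directions
states = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
states = propagate(states, edges=[("a", "b"), ("b", "a")])
# After one step each node has mixed in half of its neighbour's state:
# states == {"a": [1.0, 0.5], "b": [0.5, 1.0]}
```

Stacking several such steps lets information travel multiple hops, which is what allows a KGCN to score an element using the structure around it rather than the element alone.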