This repository was archived by the owner on Nov 18, 2023. It is now read-only.

Commit f171590

Pre release upgrade (#33)
* Upgrading to use Grakn commit 760c46dc1f19d572c2abaa61fc5a16bf4ced4312
* Upgrades kglib to use Grakn commit 760c46dc1f19d572c2abaa61fc5a16bf4ced4312, mostly requiring syntax changes and the temporary lack of limits to query result length
* Minor changes to printing confusion matrices and properly passes attribute label values
* Improves READMEs
* Fix for label_extraction_test
1 parent 9699652 commit f171590


11 files changed, +53 -50 lines changed


README.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 # Research
 This repository is the centre of all research projects conducted at Grakn Labs. In particular, it's focus is on the integration of machine learning with the Grakn knowledge graph.

-Our first project is on [*Knowledge Graph Convolutional Networks* (KGCNs)](/kglib/kgcn).
+At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](/kglib/kgcn).
+

kglib/kgcn/README.md

Lines changed: 19 additions & 16 deletions
@@ -1,45 +1,48 @@
 # Knowledge Graph Convolutional Networks (KGCNs)

-This project introduces a novel model: the Knowledge Graph Convolutional Network. The principal idea of this work is to build a bridge between knowledge graphs and machine learning. KGCNs can be used to create vector representations or *embeddings* of any labelled set of Grakn concepts. As a result, a KGCN can be trained directly for the classification or regression of Concepts stored in Grakn. Future work will include building embeddings via unsupervised learning.![KGCN Process](readme_images/KGCN_process.png)
+This project introduces a novel model: the *Knowledge Graph Convolutional Network* (KGCN). The principal idea of this work is to forge a bridge between knowledge graphs and machine learning, using [Grakn](https://github.com/graknlabs/grakn) as the knowledge graph. A KGCN can be used to create vector representations, *embeddings*, of any labelled set of Grakn Concepts via supervised learning. As a result, a KGCN can be trained directly for the classification or regression of Concepts stored in Grakn. Future work will include building embeddings via unsupervised learning.![KGCN Process](readme_images/KGCN_process.png)



 ## Methodology

-The ideology behind this project is described [here](https://blog.grakn.ai/knowledge-graph-convolutional-networks-machine-learning-over-reasoned-knowledge-9eb5ce5e0f68). The principles of the implementation are based on [GraphSAGE](http://snap.stanford.edu/graphsage/), from the Stanford SNAP group, made to work over a **knowledge graph**. Instead of working on a typical property graph, a KGCN learns from the context of a *typed hypergraph*, Grakn. Additionally, it learns from facts deduced by Grakn's *automated logical reasoner*. From this point on some understanding of [Grakn's docs](http://dev.grakn.ai) is assumed.
+The ideology behind this project is described [here](https://blog.grakn.ai/knowledge-graph-convolutional-networks-machine-learning-over-reasoned-knowledge-9eb5ce5e0f68). The principles of the implementation are based on [GraphSAGE](http://snap.stanford.edu/graphsage/), from the Stanford SNAP group, made to work over a **knowledge graph**. Instead of working on a typical property graph, a KGCN learns from the context of a *typed hypergraph*, **Grakn**. Additionally, it learns from facts deduced by Grakn's *automated logical reasoner*. From this point onwards some understanding of [Grakn's docs](http://dev.grakn.ai) is assumed.

-#### How does a KGCN work?
+#### How do KGCNs work?

-The purpose of this method is to derive embeddings for a set of Concepts (and thereby directly learn to classify them). We start by querying Grakn to find a set of examples with labels. Following that, we gather data about the neighbourhood of each example Concept. We do this by considering their *k-hop* neighbours.
+The purpose of this method is to derive embeddings for a set of Concepts (and thereby directly learn to classify them). We start by querying Grakn to find a set of labelled examples. Following that, we gather data about the neighbourhood of each example Concept. We do this by considering their *k-hop* neighbours.

-![Screenshot 2019-01-24 at 19.00.31](readme_images/k-hop_neighbours.png)We retrieve the data concerning this neighbourhood from Grakn. This includes information on the *types*, *roles*, and *attribute* values of each neighbour encountered.
+![k-hop neighbours](readme_images/k-hop_neighbours.png)We retrieve the data concerning this neighbourhood from Grakn. This information includes the *type hierarchy*, *roles*, and *attribute* values of each neighbouring Concept encountered.

-To create embeddings, we build a network in TensorFlow that successively aggregates and combines features from the K hops until a 'summary' representation remains - an embedding. In our example these embeddings are directly optimised to perform multi-class classification via a single subsequent dense layer and softmax cross entropy.
+To create embeddings, we build a network in TensorFlow that successively aggregates and combines features from the K hops until a 'summary' representation remains - an embedding. In our example these embeddings are directly optimised to perform multi-class classification. This is achieved by passing the embeddings to a single subsequent dense layer and determining loss via softmax cross entropy with the labels retrieved.

-![Screenshot 2019-01-24 at 19.03.08](readme_images/aggregate_and_combine.png)
+![Aggregation and Combination process](readme_images/aggregate_and_combine.png)




-## Example - CITES Animal Trade Data
+## Usage by example - CITES Animal Trade Data

-#### Quickstart
+### Quickstart

 **Requirements:**

 - Python 3.6.3 or higher
-
 - kglib installed from pip: `pip install --extra-index-url https://test.pypi.org/simple/ grakn-kglib`
-- The `animaltrade` dataset from the latest release. This is a dataset that has been pre-loaded into Grakn v1.5 (so you don't have to run the data import yourself), with two keyspaces: `animaltrade_train` and `animaltrade_test`.
+- The `grakn-animaltrade.zip` dataset from the [latest release](https://github.com/graknlabs/kglib/releases/latest). This is a dataset that has been pre-loaded into Grakn v1.5 (so you don't have to run the data import yourself), with two keyspaces: `animaltrade_train` and `animaltrade_test`.

 **To use:**

 - Prepare the data:

 - If you already have an insatnce of Grakn running, make sure to stop it using `./grakn server stop`
+
+- Download the pre-loaded Grakn distribution from the [latest release](https://github.com/graknlabs/kglib/releases/latest)

-- Unzip the pre-loaded Grakn + dataset from the latest release, the location you store this in doesn't matter
+- Unzip the distribution `unzip grakn-animaltrade.zip `, where you store this doesn't matter

-- `cd` into the dataset and start Grakn: `./grakn server start`
+- cd into the distribution `cd grakn-animaltrade`
+
+- start Grakn `./grakn server start`

 - Confirm that the training keyspace is present and contains data

@@ -61,12 +64,12 @@ To create embeddings, we build a network in TensorFlow that successively aggrega

 The CITES dataset details exchanges of animal-based products between countries. In this example we aim to predict the value of `appendix` for a set of samples. This `appendix` can be thought of as the level of endangerment that a `traded-item` is subject to, where `1` represents the highest level of endangerment, and `3` the lowest.

-The `main` function will:
+The [main](examples/animal_trade/main.py) function will:

-- Search Grakn for 30 concepts (with an labels) to use as the training set, 30 for the evaluation set, and 30 for the prediction set using queries such as:
+- Search Grakn for 30 concepts (with attributes as labels) to use as the training set, 30 for the evaluation set, and 30 for the prediction set using queries such as (limiting the returned stream):

 ```
-match $e(exchanged-item: $traded-item) isa exchange, has appendix $appendix; $appendix 1; limit 30; get;
+match $e(exchanged-item: $traded-item) isa exchange, has appendix $appendix; $appendix 1; get;
 ```

 This searches for an `exchange` between countries that has an `appendix` (endangerment level) of `1`, and finds the `traded-item` that was exchanged
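The README text in this diff describes the KGCN pipeline: gather an example Concept's k-hop neighbourhood, repeatedly aggregate and combine neighbour features into an embedding, then classify that embedding with one dense layer and softmax cross entropy. A minimal plain-Python sketch of that flow follows. It assumes a hypothetical toy graph, hypothetical feature vectors and fixed (untrained) weights, and uses a GraphSAGE-style mean aggregator over all neighbours; it does not touch Grakn or TensorFlow and is not kglib's implementation.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def mean_aggregate(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of the neighbours' representations (zeros if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combine(own: Vector, aggregated: Vector, weights: Matrix) -> Vector:
    """GraphSAGE-style combine: ReLU(W . [own ; aggregated])."""
    concatenated = own + aggregated
    return [max(0.0, sum(w * x for w, x in zip(row, concatenated))) for row in weights]


def embed(node: str, depth: int, graph: Dict[str, List[str]],
          features: Dict[str, Vector], weights: List[Matrix]) -> Vector:
    """Representation of `node` after `depth` aggregate-and-combine steps over its neighbourhood."""
    if depth == 0:
        return features[node]
    own = embed(node, depth - 1, graph, features, weights)
    neighbours = [embed(n, depth - 1, graph, features, weights) for n in graph.get(node, [])]
    return combine(own, mean_aggregate(neighbours, len(own)), weights[depth - 1])


def softmax_cross_entropy(logits: Vector, label: int) -> float:
    """Cross entropy of softmax(logits) against a class index, computed stably via log-sum-exp."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical toy neighbourhood: a traded item linked to two countries through an exchange.
graph = {
    'item': ['exchange'],
    'exchange': ['item', 'country-a', 'country-b'],
    'country-a': ['exchange'],
    'country-b': ['exchange'],
}
features = {'item': [1.0, 0.0], 'exchange': [0.0, 1.0],
            'country-a': [0.5, 0.5], 'country-b': [0.5, 0.5]}
# Hypothetical weights: each row adds a node's feature to the same feature of its aggregated neighbours.
w = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
embedding = embed('item', 2, graph, features, weights=[w, w])  # K = 2 hops

# Hypothetical dense layer producing one logit per class (appendix 1, 2, 3).
dense = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
logits = [sum(d * e for d, e in zip(row, embedding)) for row in dense]
loss = softmax_cross_entropy(logits, label=0)
```

Recursing to depth K walks each Concept's K-hop neighbourhood as a tree, which is why the final vector summarises everything within K hops. GraphSAGE itself samples a fixed number of neighbours per hop and learns the weight matrices; both are left out here so the arithmetic can be checked by hand.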

kglib/kgcn/examples/animal_trade/main.py

Lines changed: 6 additions & 5 deletions
@@ -48,12 +48,12 @@
 flags.DEFINE_integer('max_training_steps', 10000, 'Max number of gradient steps to take during gradient descent')

 # Sample selection params
-EXAMPLES_QUERY = 'match $e(exchanged-item: $traded-item) isa exchange, has appendix $appendix; $appendix {}; ' \
-'limit {}; get;'
+EXAMPLES_QUERY = 'match $e(exchanged-item: $traded-item) isa exchange, has appendix $appendix; $appendix {}; get;'
 LABEL_ATTRIBUTE_TYPE = 'appendix'
+ATTRIBUTE_VALUES = [1, 2, 3]
 EXAMPLE_CONCEPT_TYPE = 'traded-item'

-NUM_PER_CLASS = 30
+NUM_PER_CLASS = 10
 POPULATION_SIZE_PER_CLASS = 1000

 # Params for persisting to files
@@ -125,8 +125,9 @@ def main(modes=(TRAIN, EVAL, PREDICT)):
 PREDICT: {'sample_size': NUM_PER_CLASS, 'population_size': POPULATION_SIZE_PER_CLASS},
 }
 concepts, labels = samp_mgmt.compile_labelled_concepts(EXAMPLES_QUERY, EXAMPLE_CONCEPT_TYPE,
-LABEL_ATTRIBUTE_TYPE, transactions[TRAIN],
-transactions[PREDICT], sampling_params)
+LABEL_ATTRIBUTE_TYPE, ATTRIBUTE_VALUES,
+transactions[TRAIN], transactions[PREDICT],
+sampling_params)
 prs.save_labelled_concepts(KEYSPACES, concepts, labels, SAVED_LABELS_PATH)

 samp_mgmt.delete_all_labels_from_keyspaces(transactions, LABEL_ATTRIBUTE_TYPE)
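After this change `EXAMPLES_QUERY` has a single `{}` placeholder, filled with one `appendix` value from `ATTRIBUTE_VALUES` at a time, and the per-class population cap is no longer written into the query as a Graql `limit`. The snippet below sketches that templating together with the per-class sizes used for the shared training-and-evaluation keyspace. It assumes `sampling_params` is keyed `'train'`, `'eval'` and `'predict'` (the keys `samples.py` reads) and that all three entries match the `PREDICT` entry visible in the hunk; the full dictionary is outside the diff.

```python
EXAMPLES_QUERY = ('match $e(exchanged-item: $traded-item) isa exchange, '
                  'has appendix $appendix; $appendix {}; get;')
ATTRIBUTE_VALUES = [1, 2, 3]
NUM_PER_CLASS = 10
POPULATION_SIZE_PER_CLASS = 1000

# Assumption: train and eval use the same per-class settings as the predict entry in main().
sampling_params = {mode: {'sample_size': NUM_PER_CLASS, 'population_size': POPULATION_SIZE_PER_CLASS}
                   for mode in ('train', 'eval', 'predict')}

# One query per label value; the template no longer has a slot for `limit`.
queries = {value: EXAMPLES_QUERY.format(value) for value in ATTRIBUTE_VALUES}

# Training and evaluation samples come from the same keyspace, so their per-class sizes are summed
# before querying, mirroring the arguments compile_labelled_concepts passes.
train_eval_sample_size = sampling_params['train']['sample_size'] + sampling_params['eval']['sample_size']
train_eval_population_size = (sampling_params['train']['population_size']
                              + sampling_params['eval']['population_size'])
```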

kglib/kgcn/examples/animal_trade/schema.gql

Lines changed: 3 additions & 3 deletions
@@ -139,11 +139,11 @@ define
 relates member-item,
 relates taxonomic-group;

-taxonomic-ranking sub rule,
+taxonomic-ranking
 when {
 (super-taxon: $a, sub-taxon: $b) isa taxonomic-hierarchy;
 (super-taxon: $b, sub-taxon: $c) isa taxonomic-hierarchy;
-}
+},
 then {
 (super-taxon: $a, sub-taxon: $c) isa taxonomic-hierarchy;
 };
@@ -152,7 +152,7 @@ define
 when {
 (member-item: $a, taxonomic-group: $taxon) isa taxon-membership;
 (sub-taxon: $taxon, super-taxon: $super) isa taxonomic-hierarchy;
-}
+},
 then {
 (member-item: $a, taxonomic-group: $super) isa taxon-membership;
 };
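The two rules in this hunk make `taxonomic-hierarchy` transitive (`taxonomic-ranking`) and propagate `taxon-membership` up the hierarchy (the second rule's name falls outside the hunk); this commit only adjusts their syntax. To show what the rules should let Grakn infer, here is a naive forward-chaining sketch in plain Python that applies both rules until no new facts appear. The taxon and item names are hypothetical, and the sketch only illustrates the rules' meaning, not how Grakn's reasoner works.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def infer(hierarchy: Set[Pair], membership: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Apply both rules until nothing new is derived.

    hierarchy holds (super_taxon, sub_taxon) pairs; membership holds (member_item, taxonomic_group) pairs.
    """
    hierarchy, membership = set(hierarchy), set(membership)  # copy so the inputs are not mutated
    while True:
        # taxonomic-ranking: (a super of b) and (b super of c) => (a super of c)
        derived_hierarchy = {(a, c) for (a, b) in hierarchy for (b2, c) in hierarchy if b == b2}
        # membership rule: item in taxon, taxon is a sub-taxon of super => item in super
        derived_membership = {(item, sup) for (item, taxon) in membership
                              for (sup, sub) in hierarchy if sub == taxon}
        if derived_hierarchy <= hierarchy and derived_membership <= membership:
            return hierarchy, membership
        hierarchy |= derived_hierarchy
        membership |= derived_membership


# Hypothetical facts: a chain of taxa and one traded item classified at genus level.
facts_hierarchy = {('Mammalia', 'Carnivora'), ('Carnivora', 'Felidae'), ('Felidae', 'Panthera')}
facts_membership = {('item-1', 'Panthera')}
inferred_hierarchy, inferred_membership = infer(facts_hierarchy, facts_membership)
```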

kglib/kgcn/management/samples.py

Lines changed: 8 additions & 8 deletions
@@ -31,14 +31,13 @@ def query_for_random_samples_with_attribute(tx, query, example_var_name, attribu
 labels = {}

 for a in attribute_vals:
-target_concept_query = query.format(a, population_size)
+target_concept_query = query.format(a)

 extractor = label_extraction.ConceptLabelExtractor(target_concept_query,
 (example_var_name, collections.OrderedDict(
 [(attribute_var_name, attribute_vals)])),
-sampling_method=random.random_sample
-)
-concepts_with_labels = extractor(tx, sample_size_per_label)
+sampling_method=random.random_sample)
+concepts_with_labels = extractor(tx, sample_size_per_label, population_size)
 if len(concepts_with_labels) == 0:
 raise RuntimeError(f'Couldn\'t find any concepts to match target query "{target_concept_query}"')

@@ -51,12 +50,13 @@ def query_for_random_samples_with_attribute(tx, query, example_var_name, attribu
 return concepts, labels


-def compile_labelled_concepts(samples_query, concept_var_name, attribute_var_name, train_and_eval_transaction,
-predict_transaction, sampling_params):
+def compile_labelled_concepts(samples_query, concept_var_name, attribute_var_name, attribute_values,
+train_and_eval_transaction, predict_transaction, sampling_params):
 """
 Assumes the case that data is partitioned into 2 keyspaces, one for training and evaluation, and another for
 prediction on unseen data (with labels). Therefore this function draws training and evaluation samples from the
 same keyspace.
+:param attribute_values:
 :param samples_query: Query to use to select possible samples
 :param concept_var_name: The variable used for the example concepts within the `samples_query`
 :param attribute_var_name: The variable used for the samples' labels (attributes) within the `samples_query`
@@ -71,7 +71,7 @@ def compile_labelled_concepts(samples_query, concept_var_name, attribute_var_nam
 print(' for training and evaluation')
 concepts_dicts, labels_dicts = \
 query_for_random_samples_with_attribute(train_and_eval_transaction, samples_query,
-concept_var_name, attribute_var_name, [1, 2, 3],
+concept_var_name, attribute_var_name, attribute_values,
 sampling_params['train']['sample_size'] +
 sampling_params['eval']['sample_size'],
 sampling_params['train']['population_size'] +
@@ -81,7 +81,7 @@ def compile_labelled_concepts(samples_query, concept_var_name, attribute_var_nam
 query_for_random_samples_with_attribute(predict_transaction,
 samples_query,
 concept_var_name,
-attribute_var_name, [1, 2, 3],
+attribute_var_name, attribute_values,
 sampling_params['predict']['sample_size'],
 sampling_params['predict']['population_size'])
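In `samples.py` the label values are now passed in as `attribute_values` instead of the hard-coded `[1, 2, 3]`, and `population_size` goes to the extractor rather than into the query string. A simplified sketch of the resulting per-label loop follows. `run_query` is a hypothetical stand-in for `tx.query`, answers are plain strings rather than Grakn Concepts, and Python's `random.sample` replaces kglib's `random.random_sample`.

```python
import itertools
import random
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for tx.query: takes a Graql query string and returns an iterable of answers.
QueryRunner = Callable[[str], Iterable[str]]


def sample_per_label(run_query: QueryRunner, query_template: str, attribute_values: List[int],
                     sample_size_per_label: int, population_size: int,
                     rng: random.Random) -> Dict[int, List[str]]:
    """Draw up to sample_size_per_label answers per label value from its first population_size answers."""
    samples = {}
    for value in attribute_values:
        # The template only takes the label value; the population cap is applied to the result stream.
        query = query_template.format(value)
        population = list(itertools.islice(run_query(query), population_size))
        if not population:
            raise RuntimeError(f'Couldn\'t find any concepts to match target query "{query}"')
        samples[value] = rng.sample(population, min(sample_size_per_label, len(population)))
    return samples


# Hypothetical answers: 50 traded items per appendix value.
fake_answers = {f'appendix {v}': [f'item-{v}-{i}' for i in range(50)] for v in (1, 2, 3)}
samples = sample_per_label(lambda q: iter(fake_answers[q]), 'appendix {}', [1, 2, 3],
                           sample_size_per_label=20, population_size=30, rng=random.Random(0))
```

Every label value gets its own capped population, so a rare class cannot be crowded out by a common one; the price is one query per value.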

kglib/kgcn/models/downstream.py

Lines changed: 0 additions & 4 deletions
@@ -142,8 +142,6 @@ def train(self, feed_dict):
 print(f'\n-----')
 print(f'Step {step}')
 print(f'Loss: {loss_value:.2f}')
-print(f'Confusion Matrix:')
-print(confusion_matrix)
 metrics.report_multiclass_metrics(labels_winners_values, predictions_class_winners_values)
 print("========= Training Complete =========\n\n")

@@ -157,8 +155,6 @@ def eval(self, feed_dict):
 self._labels_winners])

 print(f'Loss: {loss_value:.2f}')
-print(f'Confusion Matrix:')
-print(confusion_matrix)
 metrics.report_multiclass_metrics(labels_winners_values, predictions_class_winners_values)
 print("========= Evaluation Complete =========\n\n")
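These hunks stop printing the raw confusion matrix after training and evaluation, leaving `metrics.report_multiclass_metrics(labels_winners_values, predictions_class_winners_values)` to summarise results. For reference, the sketch below builds a confusion matrix and per-class recall from two lists of class indices, which is presumably what the `*_winners_values` arrays hold (the argmax class per example). It is generic Python with made-up indices, not kglib's `metrics` module.

```python
from typing import List, Sequence


def confusion_matrix(labels: Sequence[int], predictions: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count (true class, predicted class) pairs; rows are true classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(labels, predictions):
        matrix[true_class][predicted_class] += 1
    return matrix


def per_class_recall(matrix: List[List[int]]) -> List[float]:
    """Fraction of each true class predicted correctly (0.0 for classes with no examples)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical winner indices for six examples over three classes (appendix 1, 2, 3 as 0, 1, 2).
labels_winners = [0, 0, 1, 1, 2, 2]
predicted_winners = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(labels_winners, predicted_winners, num_classes=3)
recall = per_class_recall(matrix)
```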

kglib/kgcn/neighbourhood/data/executor.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ class TraversalExecutor:
 }

 ATTRIBUTE_QUERY = {
-'query': 'match $thing id {} has attribute $attribute; get $attribute;',
+'query': 'match $thing id {}, has attribute $attribute; get $attribute;',
 'variable': 'attribute'
 }

kglib/kgcn/neighbourhood/data/executor_test.py

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ def setUp(self):

 class TestTraversalExecutorFromDateAttribute(BaseTestTraversalExecutor.TestTraversalExecutor):

-query = "match $attribute isa date-started 2015-11-12T00:00; limit 1; get;"
+query = "match $attribute isa date-started; $attribute 2015-11-12T00:00; get;"
 var = 'attribute'
 roles = ['has']
 num_results = 1

kglib/kgcn/test_data/schema.gql

Lines changed: 7 additions & 5 deletions
@@ -19,17 +19,19 @@

 define

-name sub attribute datatype string;
+name sub attribute,
+datatype string;
 job-title sub name;
-date-started sub attribute datatype date;
+date-started sub attribute,
+datatype date;

 ownership sub relationship,
 relates owner,
 relates property;

 organisation sub entity,
 plays member,
-plays group,
+plays organisational-group,
 plays property,
 plays owner,
 plays party,
@@ -51,11 +53,11 @@ affiliation sub relationship,

 membership sub affiliation,
 relates member as party,
-relates group as party;
+relates organisational-group as party;

 employment sub affiliation,
 relates employee as member,
-relates employer as group,
+relates employer as organisational-group,
 has job-title;

 project sub entity,

kglib/kgcn/use_cases/attribute_prediction/label_extraction.py

Lines changed: 3 additions & 3 deletions
@@ -16,7 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-
+import itertools
 import typing as typ
 import kglib.kgcn.neighbourhood.data.sampling.random_sampling as random

@@ -29,9 +29,9 @@ def __init__(self, query: str, attribute_vars_config: typ.Tuple[str, typ.Mutable
 self._attribute_vars_config = attribute_vars_config
 self._query = query

-def __call__(self, tx, sample_size):
+def __call__(self, tx, sample_size, population_size):

-response = tx.query(self._query)
+response = itertools.islice(tx.query(self._query), population_size)
 sampled_responses = self._sampling_method(response, sample_size)
 owner_var = self._attribute_vars_config[0]
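With Graql's `limit` temporarily unavailable, `__call__` now caps the answer stream with `itertools.islice(tx.query(self._query), population_size)` before sampling. Assuming `tx.query` returns a lazily evaluated iterator, `islice` means only the first `population_size` answers are ever pulled. The sketch below checks that with a hypothetical endless stream and pairs it with reservoir sampling (Algorithm R), one way to draw a uniform sample from an iterator of unknown length in a single pass; kglib's own `random_sampling.random_sample` may work differently.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar('T')


def answer_stream(pulled: List[int]) -> Iterator[str]:
    """Hypothetical endless answer stream that records how many answers have been pulled."""
    for i in itertools.count():
        pulled.append(i)
        yield f'answer-{i}'


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniformly sample k items from a stream of unknown length in one pass (Algorithm R)."""
    reservoir: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1), evicting a random slot.
            slot = rng.randrange(index + 1)
            if slot < k:
                reservoir[slot] = item
    return reservoir


pulled: List[int] = []
population = itertools.islice(answer_stream(pulled), 1000)  # cap the stream, as __call__ now does
sampled = reservoir_sample(population, 10, random.Random(42))
```

Because the sampler only sees the capped iterator, the sample is uniform over the first `population_size` answers rather than over every matching Concept, which is the trade-off this workaround accepts.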
