Can't reproduce subgraph coverage rate #12

Hello, thanks for your amazing work.
 
I'd like to know how you measured the subgraph coverage rate, i.e., subfigures `(a)` and `(b)` in _Figure 4_. My understanding is that it is computed on the subgraph retrieved without end-to-end training, i.e., the evaluation result of the weakly supervised model. Is that correct?
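Concretely, this is the quantity my script computes. I'm assuming "coverage" means the fraction of questions whose retrieved subgraph contains at least one gold answer. That is my reading, not necessarily the paper's definition. Here $Q$ is the set of questions, $A_q$ the gold answers of question $q$, and $\mathcal{E}(G_q)$ the entities of its retrieved subgraph:

$$
\text{coverage} = \frac{1}{|Q|} \sum_{q \in Q} \mathbb{1}\left[ A_q \cap \mathcal{E}(G_q) \neq \emptyset \right]
$$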

In the [data you uploaded](https://drive.google.com/drive/folders/1qNauEQJHuMs4uPQcCtMb-M9Seco5mTUl), for the WebQSP dataset, `tmp/reader_data/webqsp/test_simple.json` (or train / dev) is the retrieved subgraph, since it is the output of `retrieve_subgraph.py`. However, when I use the simple script below to check the coverage rate, I get only about 21%, far below the 90% reported in the paper. Did I miss anything?

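```python
import srsly

# Output of retrieve_subgraph.py from the uploaded data (read as JSON Lines)
test_retrieval_path = 'tmp/reader_data/webqsp/test_simple.json'
test_retrieval = srsly.read_jsonl(test_retrieval_path)

hit = 0
not_hit = 0
for sample in test_retrieval:
    # Gold answer ids for this question
    answers = [ans['kb_id'] for ans in sample['answers']]
    # Entities in the retrieved subgraph
    entities = sample['subgraph']['entities']
    # Count the question as covered if any subgraph entity is a gold answer
    if any(entity in answers for entity in entities):
        hit += 1
    else:
        not_hit += 1
print(f'{hit} / {hit + not_hit} = {hit / (hit + not_hit)}')
```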
Output:
```
344 / 1639 = 0.20988407565588774
```
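To rule out a problem in my own check, the same computation can be factored into a pure function and tested on toy data. This is a sketch under my own assumptions. The field names match the script above. The helper `_entity_id` and the extra counters are hypothetical additions. They report samples with no answers, empty subgraphs, and the Python types of the subgraph entity ids. If entities were stored as something other than the answers' `kb_id` strings (e.g., dicts or integer indices), a plain string comparison would silently count them as misses.

```python
from typing import Any, Iterable, Mapping


def _entity_id(entity: Any) -> Any:
    # Hypothetical normalisation: accept a bare id or a {'kb_id': ...} dict.
    return entity['kb_id'] if isinstance(entity, dict) else entity


def coverage_stats(samples: Iterable[Mapping[str, Any]]) -> dict:
    """Fraction of samples whose subgraph contains at least one gold answer.

    Assumes sample['answers'] is a list of {'kb_id': ...} and
    sample['subgraph']['entities'] is a list of entities (as in the script above).
    """
    hit = total = no_answer = empty_subgraph = 0
    id_types = set()  # Python type names seen among raw subgraph entities
    for sample in samples:
        total += 1
        answers = {ans['kb_id'] for ans in sample['answers']}
        entities = sample['subgraph']['entities']
        id_types.update(type(e).__name__ for e in entities)
        if not answers:
            no_answer += 1
        if not entities:
            empty_subgraph += 1
        # Covered if the answer set and the subgraph entity ids intersect
        if answers & {_entity_id(e) for e in entities}:
            hit += 1
    return {
        'hit': hit,
        'total': total,
        'coverage': hit / total if total else 0.0,
        'no_answer': no_answer,
        'empty_subgraph': empty_subgraph,
        'entity_id_types': sorted(id_types),
    }
```

On the real file it would be called as `coverage_stats(srsly.read_jsonl(test_retrieval_path))`. An `entity_id_types` result other than `['str']` would suggest the ids need mapping before comparison.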
