diff --git a/README.md b/README.md
index cf63e71..2cd24c2 100644
--- a/README.md
+++ b/README.md
@@ -172,3 +172,69 @@ for metric in tqdm(metrics):
generated,
)
```
+## Example Benchmark
+
+The following PolyGraph Discrepancy (PGD) scores mirror the tables in our paper; lower is better. Bold marks the best and underlined the second-best result for each dataset. Values are multiplied by 100 for legibility, and standard deviations are estimated by subsampling with `StandardPGDInterval` and `MoleculePGDInterval`. The specific parameters are discussed in the paper.
+
+| Method | Planar-L | Lobster-L | SBM-L | Proteins | Guacamol | Moses |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|
+| AutoGraph | **34.0 ± 1.8** | <ins>18.0 ± 1.6</ins> | **5.6 ± 1.5** | **67.7 ± 7.4** | <ins>22.9 ± 0.5</ins> | **29.6 ± 0.4** |
+| AutoGraph\* | — | — | — | — | **10.4 ± 1.2** | — |
+| DiGress | 45.2 ± 1.8 | **3.2 ± 2.6** | <ins>17.4 ± 2.3</ins> | 88.1 ± 3.1 | 32.7 ± 0.5 | <ins>33.4 ± 0.5</ins> |
+| GRAN | 99.7 ± 0.2 | 85.4 ± 0.5 | 69.1 ± 1.4 | 89.7 ± 2.7 | — | — |
+| ESGG | <ins>45.0 ± 1.4</ins> | 69.9 ± 0.6 | 99.4 ± 0.2 | <ins>79.2 ± 4.3</ins> | — | — |
+
+AutoGraph\* denotes a variant that uses additional training heuristics, as described in the paper.
diff --git a/docs/index.md b/docs/index.md
index f06f837..90edac2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -92,201 +92,67 @@ PGD and its motivation are described in more detail in the paper and API docs.
### Benchmarking snapshot
-The table below shows an example benchmark generated with this library across multiple datasets and models. Values illustrate typical outputs from the implemented metrics (VUN, PGD, and PGD subscores). For completeness, this library and our paper also implements and provides various MMD estimates on the datasets below.
-
-
+The table below shows an example benchmark generated with this library across multiple datasets and models. The values are scores from the newly proposed PolyGraph Discrepancy (PGD); lower is better. For completeness, this library and our paper also implement and provide various MMD estimates on the datasets below. Values are scaled by 100 for legibility, and standard deviations are obtained by subsampling (using `StandardPGDInterval` and `MoleculePGDInterval`). More details are provided in our paper.
-
-
- | Dataset |
- Model |
- VUN (↑) |
- PGD (↓) |
- PGD subscores |
-
-
- | Clust. (↓) |
- Deg. (↓) |
- GIN (↓) |
- Orb5. (↓) |
- Orb4. (↓) |
- Eig. (↓) |
-
-
-
-
-
-| Planar-L |
- AutoGraph |
- 85.1 |
- 34.0 ± 1.8 |
- 7.0 ± 2.9 |
- 7.8 ± 3.2 |
- 8.8 ± 3.0 |
- 34.0 ± 1.8 |
- 28.5 ± 1.5 |
- 26.9 ± 2.3 |
-| DiGress |
- 80.1 |
- 45.2 ± 1.8 |
- 24.8 ± 2.0 |
- 23.3 ± 1.2 |
- 29.0 ± 1.1 |
- 45.2 ± 1.8 |
- 40.3 ± 1.8 |
- 39.4 ± 2.0 |
-| GRAN |
- 1.6 |
- 99.7 ± 0.2 |
- 99.3 ± 0.2 |
- 98.3 ± 0.3 |
- 98.3 ± 0.3 |
- 99.7 ± 0.1 |
- 99.2 ± 0.2 |
- 98.5 ± 0.4 |
-| ESGG |
- 93.9 |
- 45.0 ± 1.4 |
- 10.9 ± 3.2 |
- 21.7 ± 3.0 |
- 32.9 ± 2.2 |
- 45.0 ± 1.4 |
- 42.8 ± 1.9 |
- 29.6 ± 1.6 |
-
-
-| Lobster-L |
- AutoGraph |
- 83.1 |
- 18.0 ± 1.6 |
- 4.2 ± 1.9 |
- 12.1 ± 1.6 |
- 14.8 ± 1.5 |
- 18.0 ± 1.6 |
- 16.1 ± 1.6 |
- 13.0 ± 1.1 |
-| DiGress |
- 91.4 |
- 3.2 ± 2.6 |
- 2.0 ± 1.3 |
- 1.2 ± 1.5 |
- 2.3 ± 2.0 |
- 3.0 ± 3.1 |
- 4.5 ± 2.3 |
- 1.3 ± 1.1 |
-| GRAN |
- 41.3 |
- 85.4 ± 0.5 |
- 20.8 ± 1.1 |
- 77.1 ± 1.2 |
- 79.8 ± 0.6 |
- 85.4 ± 0.5 |
- 85.0 ± 0.6 |
- 69.8 ± 1.2 |
-| ESGG |
- 70.9 |
- 69.9 ± 0.6 |
- 0.0 ± 0.0 |
- 63.4 ± 1.1 |
- 66.8 ± 1.0 |
- 69.9 ± 0.6 |
- 66.0 ± 0.6 |
- 51.7 ± 1.8 |
-
-
-| SBM-L |
- AutoGraph |
- 85.6 |
- 5.6 ± 1.5 |
- 0.3 ± 0.6 |
- 6.2 ± 1.4 |
- 6.3 ± 1.3 |
- 3.2 ± 2.2 |
- 4.4 ± 2.0 |
- 2.5 ± 2.2 |
-| DiGress |
- 73.0 |
- 17.4 ± 2.3 |
- 5.7 ± 2.8 |
- 8.2 ± 3.3 |
- 13.8 ± 1.7 |
- 17.4 ± 2.3 |
- 14.8 ± 2.5 |
- 8.7 ± 3.0 |
-| GRAN |
- 21.4 |
- 69.1 ± 1.4 |
- 50.2 ± 1.9 |
- 58.6 ± 1.4 |
- 69.1 ± 1.4 |
- 65.7 ± 1.3 |
- 62.8 ± 1.3 |
- 55.9 ± 1.5 |
-| ESGG |
- 10.4 |
- 99.4 ± 0.2 |
- 97.9 ± 0.5 |
- 97.5 ± 0.6 |
- 98.3 ± 0.4 |
- 96.8 ± 0.4 |
- 89.2 ± 0.7 |
- 99.4 ± 0.2 |
-
-
-| Proteins |
- AutoGraph |
- – |
- 67.7 ± 7.4 |
- 47.7 ± 5.7 |
- 31.5 ± 8.5 |
- 45.3 ± 5.1 |
- 67.7 ± 7.4 |
- 47.4 ± 7.0 |
- 53.2 ± 6.9 |
-| DiGress |
- – |
- 88.1 ± 3.1 |
- 36.1 ± 4.3 |
- 29.2 ± 5.0 |
- 23.2 ± 5.3 |
- 88.1 ± 3.1 |
- 60.8 ± 3.6 |
- 23.4 ± 11.8 |
-| GRAN |
- – |
- 89.7 ± 2.7 |
- 86.0 ± 2.0 |
- 70.6 ± 3.1 |
- 71.5 ± 3.0 |
- 90.4 ± 2.4 |
- 84.4 ± 3.3 |
- 76.7 ± 4.7 |
-| ESGG |
- – |
- 79.2 ± 4.3 |
- 58.2 ± 3.6 |
- 54.0 ± 3.6 |
- 57.4 ± 4.1 |
- 80.2 ± 3.1 |
- 72.5 ± 3.0 |
- 24.3 ± 11.0 |
-
-
-
+
+| Method | Planar-L | Lobster-L | SBM-L | Proteins | Guacamol | Moses |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|
+| AutoGraph | 34.0 ± 1.8 | 18.0 ± 1.6 | 5.6 ± 1.5 | 67.7 ± 7.4 | 22.9 ± 0.5 | 29.6 ± 0.4 |
+| AutoGraph\* | — | — | — | — | 10.4 ± 1.2 | — |
+| DiGress | 45.2 ± 1.8 | 3.2 ± 2.6 | 17.4 ± 2.3 | 88.1 ± 3.1 | 32.7 ± 0.5 | 33.4 ± 0.5 |
+| GRAN | 99.7 ± 0.2 | 85.4 ± 0.5 | 69.1 ± 1.4 | 89.7 ± 2.7 | — | — |
+| ESGG | 45.0 ± 1.4 | 69.9 ± 0.6 | 99.4 ± 0.2 | 79.2 ± 4.3 | — | — |
+
+AutoGraph\* denotes a variant that uses additional training heuristics, as described in the paper.
diff --git a/pyproject.toml b/pyproject.toml
index bd81b41..e87419f 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "polygraph-benchmark"
-version = "1.0.0"
+version = "1.0.1"
description = "Evaluation benchmarks for graph generative models"
readme = "README.md"
authors = [