Graph benchmarks

This repo was originally a benchmark for Kuzu described in this blog post. Since October 2025, Kuzu has been archived and is no longer maintained. Kuzu is now succeeded by a fork, Ladybug, maintained by @adsharma.

The benchmarks below are updated to compare multi-hop retrieval performance across newer systems like Ladybug and lance-graph, an open source graph engine built in Rust on top of the Lance format.

This repo does the following:

- Generate an artificial social network dataset, including persons, interests and locations
  - You can scale up the size of the artificial dataset using the scripts provided and test query performance on larger graphs
- Ingest the dataset into each graph system being compared: Neo4j (community edition), Kuzu, Ladybug and lance-graph
- Run a suite of queries involving multi-hop traversals or top-k results with filters and aggregations, and compare query performance across the systems

Setup

We use uv to manage the dependencies.

# Sync the dependencies locally
uv sync

All the dependencies are listed in pyproject.toml.

Dataset

An artificial social network dataset is generated specifically for this exercise, via the Faker Python library.

Graph schema

We'll create an artificial social network dataset of 100K person profiles, and their associated connections per the following schema.

The schema describes the following nodes and relationships, along with properties on each node:

Person node FOLLOWS another Person node
Person node LIVES_IN a City node
Person node HAS_INTEREST towards an Interest node
City node is CITY_IN a State node
State node is STATE_IN a Country node
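
As a rough sketch, the relationships listed above can be written down as (source label, relationship, target label) triples and used to sanity-check edge files before ingestion. The helper names `is_valid_edge` and `node_labels` are hypothetical and not part of this repo.

```python
# Edge types taken from the schema description above:
# (source node label, relationship type, target node label).
SCHEMA_EDGES = [
    ("Person", "FOLLOWS", "Person"),
    ("Person", "LIVES_IN", "City"),
    ("Person", "HAS_INTEREST", "Interest"),
    ("City", "CITY_IN", "State"),
    ("State", "STATE_IN", "Country"),
]


def is_valid_edge(src_label: str, rel: str, dst_label: str) -> bool:
    """Return True if the (source, relationship, target) triple is in the schema."""
    return (src_label, rel, dst_label) in SCHEMA_EDGES


def node_labels() -> list[str]:
    """Collect every node label that appears on either end of an edge type."""
    labels = set()
    for src, _, dst in SCHEMA_EDGES:
        labels.update((src, dst))
    return sorted(labels)


if __name__ == "__main__":
    print(node_labels())
    print(is_valid_edge("Person", "LIVES_IN", "City"))
```

Note that relationships are directed: `("City", "LIVES_IN", "Person")` is rejected by this check.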

Generate dataset

A shell script, generate_data.sh, in the root directory of this repo runs the Python scripts sequentially to generate the node and edge data for the social network. This is the recommended way to generate the data. The script takes a single positional argument: the number of person profiles to generate, specified as an integer, as shown below.

# Generate data with 100K persons and ~2.4M edges
bash generate_data.sh 100000

This outputs:

Generating 100000 samples of data
Generate 50000 fake female profiles.
Generate 50000 fake male profiles.
Wrote 100000 person nodes to parquet
Obtained 7125 cities from countries: ['US', 'GB', 'CA']
Wrote 7117 cities to parquet
Wrote 273 states to parquet
Wrote 3 countries to parquet
Wrote 41 interests nodes to parquet
Generated 999987 edges in total without self-connections
Generated 500 super nodes for 100000 persons
Wrote 2417738 edges for 100000 persons
Generated residence cities for persons. Top 5 common cities are: Dallas, Portland, Kansas City, Manhattan, Sacramento
Wrote 250067 edges for 100000 persons
Wrote 7117 edges for 7117 cities
Wrote 273 edges for 273 states

We are now ready with the benchmark dataset in Parquet format!
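
To get a feel for the overall graph size, the "Wrote ..." lines in the log above can be totalled. The snippet below is a minimal sketch that parses those lines as plain text; the `summarize` helper is hypothetical and not part of this repo, and the log lines are copied from the sample output above.

```python
import re

# "Wrote ..." lines copied from the sample output above.
LOG = """\
Wrote 100000 person nodes to parquet
Wrote 7117 cities to parquet
Wrote 273 states to parquet
Wrote 3 countries to parquet
Wrote 41 interests nodes to parquet
Wrote 2417738 edges for 100000 persons
Wrote 250067 edges for 100000 persons
Wrote 7117 edges for 7117 cities
Wrote 273 edges for 273 states
"""

EDGE_RE = re.compile(r"^Wrote (\d+) edges for")
NODE_RE = re.compile(r"^Wrote (\d+) .+ to parquet$")


def summarize(log: str) -> tuple[int, int]:
    """Return (total nodes, total edges) reported by the 'Wrote ...' log lines."""
    nodes = 0
    edges = 0
    for line in log.splitlines():
        edge_match = EDGE_RE.match(line)
        node_match = NODE_RE.match(line)
        if edge_match:
            edges += int(edge_match.group(1))
        elif node_match:
            nodes += int(node_match.group(1))
    return nodes, edges


if __name__ == "__main__":
    total_nodes, total_edges = summarize(LOG)
    print(f"{total_nodes} nodes, {total_edges} edges")
```

For the sample run this gives 107,434 nodes and 2,675,195 edges in total, most of which come from the 2,417,738-edge file.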

Note

Due to minor differences in RNG across systems, the dataset you generate may differ from the one used for this benchmark. Exact numbers in the query results may therefore not match when the dataset is regenerated, but the larger trends should remain the same wherever the benchmark is run.

Ingest the data as a graph

Navigate to the individual directories to see the instructions on how to ingest the data into each graph system.

The generated dataset produces a rich and well-connected graph, a subgraph of which is visualized below. Certain groups of persons form cliques, while others are central hubs with many connections. Each person can have many interests, but only one primary residence city.

Run the queries

Some sample queries are run in each DB to verify that the data is ingested correctly, and that the results are consistent with one another.

The following questions are asked of each graph:

Query 1: Who are the top 3 most-followed persons?
Query 2: In which city does the most-followed person live?
Query 3: Which 5 cities in a particular country have the lowest average age in the network?
Query 4: How many persons between ages 30-40 are there in each country?
Query 5: How many men in London, United Kingdom have an interest in fine dining?
Query 6: Which city has the maximum number of women that like Tennis?
Query 7: Which U.S. state has the maximum number of persons between the age 23-30 who enjoy photography?
Query 8: How many second-degree paths exist in the graph?
Query 9: How many paths exist in the graph through persons age 50 to persons above age 25?
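
To make the shape of these questions concrete, here is a minimal plain-Python sketch of what Query 1 computes, run on a tiny made-up edge list. The names and the `top_followed` helper are hypothetical; the benchmark itself runs the equivalent query inside each graph system.

```python
from collections import Counter

# Toy FOLLOWS edges as (follower, followed) pairs; the names are made up.
FOLLOWS = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("ann", "cat"), ("bob", "cat"),
    ("bob", "dan"),
    ("cat", "ann"),
]


def top_followed(edges, k):
    """Return the k most-followed persons as (name, follower count) pairs."""
    counts = Counter(followed for _, followed in edges)
    # Sort by follower count (descending), then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    for name, followers in top_followed(FOLLOWS, 3):
        print(name, followers)
```

The same pattern (group by target node, count, sort, take the top k) is what the "top-k with aggregation" queries exercise at scale.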

High-level results

| Query | neo4j-2025.12.1 (ms) | kuzu-0.11.3 (ms) | ladybug-0.14.1 (ms) | lance-graph-0.5.1 (ms) |
| --- | --- | --- | --- | --- |
| q1 | 1552.0 | 138.0 (11.2x) | 134.8 (11.5x) | 19.5 (79.7x) |
| q2 | 395.0 | 227.9 (1.7x) | 215.5 (1.8x) | 40.5 (9.8x) |
| q3 | 39.1 | 6.4 (6.1x) | 6.5 (6.0x) | 4.7 (8.4x) |
| q4 | 38.3 | 9.7 (3.9x) | 9.8 (3.9x) | 2.9 (13.4x) |
| q5 | 7.7 | 10.6 (0.7x) | 10.7 (0.7x) | 2.7 (2.9x) |
| q6 | 21.2 | 27.3 (0.8x) | 27.1 (0.8x) | 3.3 (6.4x) |
| q7 | 117.4 | 11.2 (10.5x) | 11.5 (10.2x) | 4.4 (26.9x) |
| q8 | 2831.2 | 6.5 (435.3x) | 6.6 (428.5x) | 129.9 (21.8x) |
| q9 | 2986.7 | 86.1 (34.7x) | 87.7 (34.1x) | 126.0 (23.7x) |
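
The factors in parentheses are consistent with dividing the Neo4j time by the other system's time, so a value below 1.0x means Neo4j was faster on that query. The sketch below recomputes a few of them from the rounded timings in the table; the `speedup` helper is hypothetical. Some recomputed values can differ from the table in the last digit, presumably because the table's factors were derived from unrounded timings.

```python
def speedup(baseline_ms: float, other_ms: float) -> float:
    """Speedup of a system over the baseline: baseline time / system time."""
    return round(baseline_ms / other_ms, 1)


# (query, Neo4j ms, Kuzu ms) copied from the table above.
ROWS = [
    ("q1", 1552.0, 138.0),
    ("q5", 7.7, 10.6),
    ("q9", 2986.7, 86.1),
]

if __name__ == "__main__":
    for query, neo4j_ms, kuzu_ms in ROWS:
        print(f"{query}: {speedup(neo4j_ms, kuzu_ms)}x")
```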

🔥 The n-hop path-finding queries (8 and 9) in Kuzu/Ladybug benefit from hybrid joins (worst-case optimal joins, WCOJ, combined with binary joins) and factorization, which are query processing innovations described in the Kùzu research paper.
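
As a loose intuition for why avoiding full materialization of intermediate results helps on a query like Query 8, the sketch below counts two-hop paths on a toy graph in two ways: by enumerating every path, and by multiplying in-degree and out-degree at each middle node. This is only an illustration on made-up data and not how Kuzu or Ladybug implement factorization; both function names are hypothetical.

```python
from collections import defaultdict

# Toy directed edges (a, b) meaning a FOLLOWS b; no self-loops.
EDGES = [(1, 2), (3, 2), (2, 4), (2, 5), (4, 5), (5, 1)]


def count_two_hop_enumerate(edges):
    """Count paths a -> b -> c by visiting every (a, b, c) combination."""
    out_neighbors = defaultdict(list)
    for a, b in edges:
        out_neighbors[a].append(b)
    count = 0
    for a, b in edges:
        for _c in out_neighbors[b]:
            count += 1  # one flat (a, b, c) tuple per iteration
    return count


def count_two_hop_degrees(edges):
    """Count the same paths without enumerating them.

    Each middle node b contributes in_degree(b) * out_degree(b) paths.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for a, b in edges:
        out_degree[a] += 1
        in_degree[b] += 1
    return sum(in_degree[b] * out_degree[b] for b in list(in_degree))


if __name__ == "__main__":
    print(count_two_hop_enumerate(EDGES), count_two_hop_degrees(EDGES))
```

Both functions return the same count, but the second does work proportional to the number of edges rather than the number of paths, which matters when hub nodes have many followers.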

Explanation of results

See the results directory for an explanation of query results, and the script used to generate the plot.

Ideas for future work

Scale up the dataset

You can generate a much larger artificial dataset of ~100M nodes and ~2.5B edges to see how performance compares across these systems at that scale.

# Generate data with 100M persons and ~2.5B edges (WARNING: takes a long time in Python!)
bash generate_data.sh 100000000

Python is really slow to generate data of that scale. Here's an example of using Rust and the fake-rs crate to do this much faster.

Relationship property aggregation

Queries 1-9 in this benchmark all operate on node properties. You can add relationship properties to the dataset to see how the systems compare when aggregating on them. For example, add a since date property on the Follows edges to run filter queries on how long a person has been following another person.
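
A minimal sketch of that idea, using only the standard library: attach a random `since` date to each follows edge, then filter on how long the relationship has existed. The field names (`from`, `to`, `since`), the date range, and both helper functions are assumptions for illustration and do not reflect the repo's actual data layout.

```python
import random
from datetime import date, timedelta


def add_since(edges, start=date(2015, 1, 1), end=date(2024, 12, 31), seed=42):
    """Attach a random 'since' date in [start, end] to each (src, dst) edge."""
    rng = random.Random(seed)  # seeded so repeated runs give the same dates
    span_days = (end - start).days
    return [
        {"from": src, "to": dst, "since": start + timedelta(days=rng.randint(0, span_days))}
        for src, dst in edges
    ]


def followed_at_least(edges, years, today):
    """Keep edges whose 'since' date is at least `years` (365-day) years before `today`."""
    cutoff = today - timedelta(days=365 * years)
    return [edge for edge in edges if edge["since"] <= cutoff]


if __name__ == "__main__":
    rows = add_since([(1, 2), (2, 3), (3, 1)])
    long_term = followed_at_least(rows, years=5, today=date(2025, 1, 1))
    print(len(rows), len(long_term))
```

In the real dataset the new column would be written alongside the existing edge files and loaded as a relationship property in each system.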
