Commit c097f05

Optimize search performance and add stress test
1 parent 8460405 commit c097f05

7 files changed: +263 -22 lines changed

.github/workflows/test.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+name: Run Tests
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  unit-tests:
+    runs-on: ubuntu-latest
+    steps:
+      # Step 1: Checkout the repository code
+      - name: Checkout code
+        uses: actions/checkout@v3
+
+      # Step 2: Setup Python environment
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"  # quoted: unquoted 3.10 is parsed by YAML as the float 3.1
+
+      # Step 3: Install dependencies from requirements.txt
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+
+      # Step 4: Run unit tests using pytest
+      - name: Run unit tests
+        run: pytest tests

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ __pycache__/
 *.py[cod]
 *$py.class
 *.DS_Store
+logs/
 
 
 # C extensions

README.md

Lines changed: 53 additions & 7 deletions
@@ -48,15 +48,61 @@
 
 <br>
 
-## :dart: About ##
+## :dart: About
 
 Hypergraph-DB is a lightweight, flexible, and Python-based database designed to model and manage **hypergraphs**, a generalized graph structure where edges (hyperedges) can connect any number of vertices. This makes Hypergraph-DB an ideal solution for representing complex relationships between entities in various domains, such as knowledge graphs, social networks, and scientific data modeling.
 
 Hypergraph-DB provides a high-level abstraction for working with vertices and hyperedges, making it easy to add, update, query, and manage hypergraph data. With built-in support for persistence, caching, and efficient operations, Hypergraph-DB simplifies the management of hypergraph data structures.
 
+**:bar_chart: Performance Test Results**
+
+To demonstrate the performance of **Hypergraph-DB**, let’s consider an example:
+
+- Suppose we want to construct a **hypergraph** with **1,000,000 vertices** and **200,000 hyperedges**.
+- Using Hypergraph-DB, it takes approximately:
+  - **1.75 seconds** to add **1,000,000 vertices**.
+  - **1.82 seconds** to add **200,000 hyperedges**.
+- Querying this hypergraph:
+  - Retrieving information for **400,000 vertices** takes **0.51 seconds**.
+  - Retrieving information for **400,000 hyperedges** takes **2.52 seconds**.
+
+This example demonstrates the efficiency of Hypergraph-DB, even when working with large-scale hypergraphs. Below is a detailed table showing how the performance scales as the size of the hypergraph increases.
+
+**Detailed Performance Results**
+
+The following table shows the results of stress tests performed on Hypergraph-DB with varying scales. The tests measure the time taken to add vertices, add hyperedges, and query vertices and hyperedges.
+
+| **Number of Vertices** | **Number of Hyperedges** | **Add Vertices (s)** | **Add Edges (s)** | **Query Vertices (s/queries)** | **Query Edges (s/queries)** | **Total Time (s)** |
+|-------------------------|--------------------------|-----------------------|-------------------|-------------------------------|----------------------------|--------------------|
+| 5,000 | 1,000 | 0.01 | 0.01 | 0.00/2,000 | 0.01/2,000 | 0.02 |
+| 10,000 | 2,000 | 0.01 | 0.01 | 0.00/4,000 | 0.02/4,000 | 0.05 |
+| 25,000 | 5,000 | 0.03 | 0.04 | 0.01/10,000 | 0.05/10,000 | 0.13 |
+| 50,000 | 10,000 | 0.06 | 0.07 | 0.02/20,000 | 0.12/20,000 | 0.26 |
+| 100,000 | 20,000 | 0.12 | 0.17 | 0.04/40,000 | 0.24/40,000 | 0.58 |
+| 250,000 | 50,000 | 0.35 | 0.40 | 0.11/100,000 | 0.61/100,000 | 1.47 |
+| 500,000 | 100,000 | 0.85 | 1.07 | 0.22/200,000 | 1.20/200,000 | 3.34 |
+| 1,000,000 | 200,000 | 1.75 | 1.82 | 0.51/400,000 | 2.52/400,000 | 6.60 |
+
+---
+
+**Key Observations:**
+
+1. **Scalability**:
+   Hypergraph-DB scales efficiently with the number of vertices and hyperedges. In the table above, the time to add vertices and hyperedges grows roughly linearly with the size of the hypergraph.
+
+2. **Query Performance**:
+   Querying vertices and hyperedges remains fast, even for large-scale hypergraphs. For instance, in the 500,000-vertex test:
+   - Querying **200,000 vertices** takes only **0.22 seconds**.
+   - Querying **200,000 hyperedges** takes only **1.20 seconds**.
+
+3. **Total Time**:
+   The total time to construct and query a hypergraph with **1,000,000 vertices** and **200,000 hyperedges** is only **6.60 seconds**, showcasing the overall efficiency of Hypergraph-DB.
+
+This performance makes **Hypergraph-DB** a great choice for applications requiring fast and scalable hypergraph data management.
+
 ---
 
-## :sparkles: Features ##
+## :sparkles: Features
 
 :heavy_check_mark: **Flexible Hypergraph Representation**
 - Supports vertices (`v`) and hyperedges (`e`), where hyperedges can connect any number of vertices.
@@ -78,7 +124,7 @@ Hypergraph-DB provides a high-level abstraction for working with vertices and hy
 
 ---
 
-## :rocket: Installation ##
+## :rocket: Installation
 
 
 Hypergraph-DB is a Python library. You can install it directly from PyPI using `pip`.
@@ -100,7 +146,7 @@ pip install -r requirements.txt
 
 ---
 
-## :checkered_flag: Starting ##
+## :checkered_flag: Starting
 
 This section provides a quick guide to getting started with Hypergraph-DB, including usage and running basic operations. Below is an example of how to use Hypergraph-DB, based on the provided test cases.
 
@@ -174,7 +220,7 @@ print(hg.nbr_v(1)) # Output: {3, 4}
 print(hg.nbr_e_of_v(1)) # Output: {(1, 3, 4)}
 ```
 
-#### **6. Persistence (Save and Load)**
+#### **6. Persistence (Save and Load)
 
 ```python
 # Save the hypergraph to a file
@@ -190,14 +236,14 @@ print(hg2.all_e) # Output: {(1, 3, 4)}
 ---
 
 
-## :memo: License ##
+## :memo: License
 
 Hypergraph-DB is open-source and licensed under the [Apache License 2.0](LICENSE). Feel free to use, modify, and distribute it as per the license terms.
 
 
 ---
 
-## :email: Contact ##
+## :email: Contact
 
 Hypergraph-DB is maintained by [iMoon-Lab](http://moon-lab.tech/), Tsinghua University. If you have any questions, please feel free to contact us via email: [Yifan Feng](mailto:[email protected]).
 
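
Reviewer note: the stress test added by this commit is not shown in this view, so the following is only a minimal sketch of how timings like those in the README table could be collected. It assumes the `add_v`, `add_e`, `has_v`, and `has_e` method names that appear in `hyperdb/hypergraph.py`. `StandInHypergraph` and `run_stress_test` are hypothetical names, and treating a membership check as the "query" is an assumption; the real test may sample and query differently.

```python
import random
import time


class StandInHypergraph:
    """Hypothetical dict-backed stand-in so the sketch runs without hyperdb."""

    def __init__(self):
        self._v_data = {}
        self._e_data = {}

    def add_v(self, v_id, v_data=None):
        self._v_data.setdefault(v_id, v_data or {})

    def add_e(self, e_tuple, e_data=None):
        # Sort and deduplicate so (3, 1) and (1, 3) name the same hyperedge.
        self._e_data.setdefault(tuple(sorted(set(e_tuple))), e_data or {})

    def has_v(self, v_id):
        return v_id in self._v_data

    def has_e(self, e_tuple):
        return tuple(sorted(set(e_tuple))) in self._e_data


def run_stress_test(hg, num_v, num_e, num_queries, seed=0):
    """Time bulk inserts and membership queries; returns seconds per phase."""
    rng = random.Random(seed)
    # Each hyperedge connects 2 to 5 distinct vertices (an arbitrary choice).
    edges = [rng.sample(range(num_v), rng.randint(2, 5)) for _ in range(num_e)]
    timings = {}

    start = time.perf_counter()
    for v_id in range(num_v):
        hg.add_v(v_id)
    timings["add_v"] = time.perf_counter() - start

    start = time.perf_counter()
    for e in edges:
        hg.add_e(e)
    timings["add_e"] = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(num_queries):
        hg.has_v(rng.randrange(num_v))
    timings["query_v"] = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(num_queries):
        hg.has_e(rng.choice(edges))
    timings["query_e"] = time.perf_counter() - start

    # Sum of the four phases measured above.
    timings["total"] = sum(timings.values())
    return timings


if __name__ == "__main__":
    result = run_stress_test(StandInHypergraph(), 5_000, 1_000, 2_000)
    for name, seconds in result.items():
        print(f"{name}: {seconds:.4f} s")
```

Passing a real `HypergraphDB` instance instead of the stand-in should work as long as it exposes the same four methods; absolute numbers will depend on hardware and on how the hyperedges are sampled.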

hyperdb/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@
 
 from ._global import AUTHOR_EMAIL
 
-__version__ = "0.1.0"
+__version__ = "0.1.1"
 
 __all__ = {"AUTHOR_EMAIL", "BaseHypergraphDB", "HypergraphDB"}

hyperdb/hypergraph.py

Lines changed: 14 additions & 14 deletions
@@ -112,7 +112,7 @@ def encode_e(self, e_tuple: Union[List, Set, Tuple]) -> Tuple:
         for v_id in tmp:
             assert isinstance(v_id, Hashable), "The vertex id must be hashable."
             assert (
-                v_id in self.all_v
+                v_id in self._v_data
             ), f"The vertex {v_id} does not exist in the hypergraph."
         return tuple(tmp)
 
@@ -157,7 +157,7 @@ def add_v(self, v_id: Any, v_data: Optional[Dict] = None):
             assert isinstance(v_data, dict), "The vertex data must be a dictionary."
         else:
             v_data = {}
-        if v_id not in self.all_v:
+        if v_id not in self._v_data:
             self._v_data[v_id] = v_data
             self._v_inci[v_id] = set()
         else:
@@ -180,7 +180,7 @@ def add_e(self, e_tuple: Union[List, Set, Tuple], e_data: Optional[Dict] = None)
         else:
             e_data = {}
         e_tuple = self.encode_e(e_tuple)
-        if e_tuple not in self.all_e:
+        if e_tuple not in self._e_data:
             self._e_data[e_tuple] = e_data
             for v in e_tuple:
                 self._v_inci[v].add(e_tuple)
@@ -197,7 +197,7 @@ def remove_v(self, v_id: Any):
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         del self._v_data[v_id]
         for e_tuple in self._v_inci[v_id]:
@@ -220,7 +220,7 @@ def remove_e(self, e_tuple: Union[List, Set, Tuple]):
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         for v in e_tuple:
             self._v_inci[v].remove(e_tuple)
@@ -238,7 +238,7 @@ def update_v(self, v_id: Any, v_data: dict):
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert isinstance(v_data, dict), "The vertex data must be a dictionary."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         self._v_data[v_id].update(v_data)
         self._clear_cache()
@@ -257,7 +257,7 @@ def update_e(self, e_tuple: Union[List, Set, Tuple], e_data: dict):
         assert isinstance(e_data, dict), "The hyperedge data must be a dictionary."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         self._e_data[e_tuple].update(e_data)
         self._clear_cache()
@@ -270,7 +270,7 @@ def has_v(self, v_id: Any) -> bool:
             ``v_id`` (``Any``): The vertex id.
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
-        return v_id in self.all_v
+        return v_id in self._v_data
 
     def has_e(self, e_tuple: Union[List, Set, Tuple]) -> bool:
         r"""
@@ -286,7 +286,7 @@ def has_e(self, e_tuple: Union[List, Set, Tuple]) -> bool:
             e_tuple = self.encode_e(e_tuple)
         except AssertionError:
             return False
-        return e_tuple in self.all_e
+        return e_tuple in self._e_data
 
     def degree_v(self, v_id: Any) -> int:
         r"""
@@ -297,7 +297,7 @@ def degree_v(self, v_id: Any) -> int:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         return len(self._v_inci[v_id])
 
@@ -313,7 +313,7 @@ def degree_e(self, e_tuple: Union[List, Set, Tuple]) -> int:
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         return len(e_tuple)
 
@@ -326,7 +326,7 @@ def nbr_e_of_v(self, v_id: Any) -> list:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         return set(self._v_inci[v_id])
 
@@ -342,7 +342,7 @@ def nbr_v_of_e(self, e_tuple: Union[List, Set, Tuple]) -> list:
         ), "The hyperedge must be a list, set, or tuple of vertex ids."
         e_tuple = self.encode_e(e_tuple)
         assert (
-            e_tuple in self.all_e
+            e_tuple in self._e_data
         ), f"The hyperedge {e_tuple} does not exist in the hypergraph."
         return set(e_tuple)
 
@@ -355,7 +355,7 @@ def nbr_v(self, v_id: Any, exclude_self=True) -> list:
         """
         assert isinstance(v_id, Hashable), "The vertex id must be hashable."
         assert (
-            v_id in self.all_v
+            v_id in self._v_data
         ), f"The vertex {v_id} does not exist in the hypergraph."
         nbrs = set()
         for e_tuple in self._v_inci[v_id]:
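
Reviewer note: every hunk in this file makes the same swap, testing membership against the backing dictionary (`self._v_data`, `self._e_data`) instead of the public `all_v` / `all_e` accessors. The diff does not show how those accessors are implemented, so the sketch below assumes a property that builds a fresh set on each access. Under that assumption each `in self.all_v` check costs time proportional to the number of vertices, while a dictionary lookup is constant time on average. `VertexStore` and its two helper methods are hypothetical names, not part of hyperdb.

```python
import timeit


class VertexStore:
    """Hypothetical stand-in; not the hyperdb implementation."""

    def __init__(self, n):
        self._v_data = {v_id: {} for v_id in range(n)}

    @property
    def all_v(self):
        # Assumption: the accessor materializes a new set on every call.
        return set(self._v_data.keys())

    def has_v_via_property(self, v_id):
        # O(n): builds the whole set first, then does one lookup.
        return v_id in self.all_v

    def has_v_via_dict(self, v_id):
        # O(1) on average: a single hash lookup in the backing dict.
        return v_id in self._v_data


if __name__ == "__main__":
    store = VertexStore(100_000)
    slow = timeit.timeit(lambda: store.has_v_via_property(42), number=20)
    fast = timeit.timeit(lambda: store.has_v_via_dict(42), number=20)
    print(f"via all_v:   {slow:.6f} s for 20 lookups")
    print(f"via _v_data: {fast:.6f} s for 20 lookups")
```

Both helpers return the same answer for any id; only the cost differs. If the real accessors are cached rather than rebuilt on each call, the gap would be smaller than this sketch suggests.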

performance/__init__.py

Whitespace-only changes.
