# Hnswlib - fast approximate nearest neighbor search
-
Header-only C++ HNSW implementation with python bindings.
+
Header-only C++ HNSW implementation with python bindings, insertions and updates.

**NEWS:**

+
**version 0.7.0**

-
**version 0.6.2**
-
-
* Fixed a bug in saving of large pickles. The pickles with > 4GB could have been corrupted. Thanks Kai Wohlfahrt for reporting.
-
* Thanks to ([@GuyAv46](https://github.com/GuyAv46)) hnswlib inner product now is more consistent across architectures (SSE, AVX, etc).
-
*
-
-
**version 0.6.1**
-
-
* Thanks to ([@tony-kuo](https://github.com/tony-kuo)) hnswlib AVX512 and AVX builds are not backwards-compatible with older SSE and non-AVX512 architectures.
-
* Thanks to ([@psobot](https://github.com/psobot)) there is now a sensible message instead of a segfault when passing a scalar to get_items.
-
* Thanks to ([@urigoren](https://github.com/urigoren)) hnswlib has a lazy index creation python wrapper.
-
-
**version 0.6.0**
-
* Thanks to ([@dyashuni](https://github.com/dyashuni)) hnswlib now uses github actions for CI, there is a search speedup in some scenarios with deletions. `unmark_deleted(label)` is now also a part of the python interface (note now it throws an exception for double deletions).
-
* Thanks to ([@slice4e](https://github.com/slice4e)) we now support AVX512; thanks to ([@LTLA](https://github.com/LTLA)) the cmake interface for the lib is now updated.
-
* Thanks to ([@alonre24](https://github.com/alonre24)) we now have python bindings for brute-force (and examples for recall tuning: [TESTING_RECALL.md](TESTING_RECALL.md)).
-
* Thanks to ([@dorosy-yeong](https://github.com/dorosy-yeong)) there is a bug fixed in the handling of large quantities of deleted elements and large K.
-
-
+
* Added support to filtering (#402, #430) by [@kishorenc](https://github.com/kishorenc)
+
* Added python interface for filtering (though note its performance is limited by GIL) (#417) by [@gtsoukas](https://github.com/gtsoukas)
+
* Added support for replacing the elements that were marked as deleted with newly inserted elements (to control the size of the index, #418) by [@dyashuni](https://github.com/dyashuni)
+
* Fixed data races/deadlocks in updates/insertion, added stress test for multithreaded operation (#418) by [@dyashuni](https://github.com/dyashuni)
* global linkages (#383) by [@MasterAler](https://github.com/MasterAler), USE_SSE usage in MSVC (#408) by [@alxvth](https://github.com/alxvth)


### Highlights:
1) Lightweight, header-only, no dependencies other than C++ 11
-
2) Interfaces for C++, Java, Python and R (https://github.com/jlmelville/rcpphnsw).
-
3) Has full support for incremental index construction. Has support for element deletions
+
2) Interfaces for C++, Python, external support for Java and R (https://github.com/jlmelville/rcpphnsw).
+
3) Has full support for incremental index construction and updating the elements. Has support for element deletions
(by marking them in index). Index is picklable.
4) Can work with custom user defined distances (C++).
5) Significantly less memory footprint and faster build time compared to current nmslib's implementation.
@@ -50,7 +38,7 @@ Note that inner product is not an actual metric. An element can be closer to som

For other spaces use the nmslib library https://github.com/nmslib/nmslib.
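
The note above that inner product is not an actual metric can be checked with a small pure-Python sketch. It assumes the inner-product distance is defined as `1 - sum(a_i * b_i)`; the helper `ip_distance` is a hypothetical name for illustration and is not part of hnswlib.

```python
def ip_distance(a, b):
    """Assumed inner-product 'distance': 1 minus the dot product of a and b."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, but with a larger norm

d_self = ip_distance(a, a)   # distance from a to itself
d_other = ip_distance(a, b)  # distance from a to b

# b scores as "closer" to a than a does to itself, which a true metric forbids.
print(d_self, d_other)
```

Under this assumption, a vector with a large norm can rank ahead of the query's own copy, which is why results in this space should not be read as metric distances.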

-
#### Short API description
+
#### API description
* `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.

`hnswlib.Index` methods:
@@ -80,7 +68,7 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
* `knn_query(data, k = 1, num_threads = -1, filter = None)` makes a batch query for the `k` closest elements for each element of the
    * `data` (shape:`N*dim`). Returns a numpy array of (shape:`N*k`).
    * `num_threads` sets the number of cpu threads to use (-1 means use default).
-
    * `filter` filters elements by its labels, returns elements with allowed ids
+
    * `filter` filters elements by its labels, returns elements with allowed ids. Note that search with a filter is slow in Python in multithreaded mode. It is recommended to set `num_threads=1`.
    * Thread-safe with other `knn_query` calls, but not with `add_items`.

* `load_index(path_to_index, max_elements = 0, allow_replace_deleted = False)` loads the index from persistence to the uninitialized index.
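
To make the `filter` argument of `knn_query` more concrete, here is a brute-force stand-in written in plain Python. It does not use hnswlib or HNSW at all; it only sketches the assumed semantics, namely that `filter` is a callable taking a label and returning whether that element may appear in the results. The function `brute_force_knn` and the toy data are hypothetical, and the distance is squared L2.

```python
import heapq

def brute_force_knn(points, labels, query, k=1, filter=None):
    """Return (labels, distances) of the k nearest allowed points, closest first.

    `filter` mirrors the parameter name used by `knn_query`: a callable that
    takes a label and returns True if the element may be returned.
    """
    candidates = []
    for label, point in zip(labels, points):
        if filter is not None and not filter(label):
            continue  # skip elements whose label is not allowed
        dist = sum((p - q) ** 2 for p, q in zip(point, query))  # squared L2
        candidates.append((dist, label))
    nearest = heapq.nsmallest(k, candidates)
    return [label for _, label in nearest], [dist for dist, _ in nearest]

points = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
labels = [0, 1, 2, 3]
query = [0.9, 0.0]

unfiltered = brute_force_knn(points, labels, query, k=2)
even_only = brute_force_knn(points, labels, query, k=2,
                            filter=lambda label: label % 2 == 0)
print(unfiltered[0], even_only[0])
```

Because the predicate is a Python callable evaluated per candidate, the real bindings have to call back into Python during the search, which is consistent with the recommendation above to use `num_threads=1` when filtering.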
@@ -123,6 +111,12 @@ Properties of `hnswlib.Index` that support reading and writing:


#### Python bindings examples
+
[See more examples here](examples/python/EXAMPLES.md):
* Rust implementation for memory and thread safety purposes. There is a trait that lets the user implement their own distances. It takes as data slices of types T satisfying T:Serialize+Clone+Send+Sync: https://github.com/jean-pierreBoth/hnswlib-rs

### 200M SIFT test reproduction
To download and extract the bigann dataset (from root directory):
```bash
-
python3 download_bigann.py
+
python tests/cpp/download_bigann.py
```
To compile:
```bash
@@ -393,7 +300,7 @@ The size of the BigANN subset (in millions) is controlled by the variable **subs