* Replace deleted elements at insertion
* Add multithread stress tests
* Add timeout to jobs in actions
* Add locks by label
* Remove python 3.6 tests as it is not available in Ubuntu 22.04
* Fix multithread update of elements
* Update readme and refactoring
README.md (109 additions & 8 deletions)
@@ -54,33 +54,38 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
54
54
* `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.
55
55
56
56
`hnswlib.Index` methods:
57
-
* `init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100)` initializes the index with no elements.
57
+
* `init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100, allow_replace_deleted = False)` initializes the index with no elements.
58
58
* `max_elements` defines the maximum number of elements that can be stored in the structure (can be increased/shrunk).
59
59
* `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
60
60
* `M` defines the maximum number of outgoing connections in the graph (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
61
+
* `allow_replace_deleted` enables replacing deleted elements with newly added ones.
61
62
62
-
* `add_items(data, ids, num_threads = -1)` - inserts the `data` (numpy array of vectors, shape: `N*dim`) into the structure.
63
+
* `add_items(data, ids, num_threads = -1, replace_deleted = False)` - inserts the `data` (numpy array of vectors, shape: `N*dim`) into the structure.
63
64
* `num_threads` sets the number of CPU threads to use (-1 means use default).
64
65
* `ids` is an optional N-size numpy array of integer labels for all elements in `data`.
65
66
- If the index already has elements with the same labels, their features will be updated. Note that the update procedure is slower than inserting a new element, but more memory- and query-efficient.
67
+
* `replace_deleted` replaces deleted elements with the newly inserted ones, which saves memory.
68
+
- To use it, `init_index` must be called with `allow_replace_deleted=True`.
66
69
* Thread-safe with other `add_items` calls, but not with `knn_query`.
67
70
68
71
* `mark_deleted(label)` - marks the element as deleted, so it will be omitted from search results. Throws an exception if it is already deleted.
69
-
*
72
+
70
73
* `unmark_deleted(label)` - unmarks the element as deleted, so it will no longer be omitted from search results.
71
74
72
75
* `resize_index(new_size)` - changes the maximum capacity of the index. Not thread-safe with `add_items` and `knn_query`.
73
76
74
77
* `set_ef(ef)` - sets the query time accuracy/speed trade-off, defined by the `ef` parameter (
75
78
[ALGO_PARAMS.md](ALGO_PARAMS.md)). Note that the parameter is currently not saved along with the index, so you need to set it manually after loading.
76
79
77
-
* `knn_query(data, k = 1, num_threads = -1)` makes a batch query for the `k` closest elements for each element of the
80
+
* `knn_query(data, k = 1, num_threads = -1, filter = None)` makes a batch query for the `k` closest elements for each element of the
78
81
* `data` (shape: `N*dim`). Returns two numpy arrays (labels and distances), each of shape `N*k`.
79
82
* `num_threads` sets the number of CPU threads to use (-1 means use default).
83
+
* `filter` filters elements by their labels; only elements with allowed ids are returned.
80
84
* Thread-safe with other `knn_query` calls, but not with `add_items`.
81
85
82
-
* `load_index(path_to_index, max_elements = 0)` loads the index from persistence to the uninitialized index.
86
+
* `load_index(path_to_index, max_elements = 0, allow_replace_deleted = False)` loads the index from persistence to the uninitialized index.
83
87
* `max_elements` (optional) resets the maximum number of elements in the structure.
88
+
* `allow_replace_deleted` specifies whether replacing of deleted elements is enabled for the index being loaded.
84
89
85
90
* `save_index(path_to_index)` saves the index to persistence.
86
91
@@ -142,7 +147,7 @@ p.add_items(data, ids)
142
147
# Controlling the recall by setting ef:
143
148
p.set_ef(50) # ef should always be > k
144
149
145
-
# Query dataset, k - number of closest elements (returns 2 numpy arrays)
150
+
# Query dataset, k - number of the closest elements (returns 2 numpy arrays)
146
151
labels, distances = p.knn_query(data, k=1)
147
152
148
153
# Index objects support pickling
@@ -155,7 +160,6 @@ print(f"Parameters passed to constructor: space={p_copy.space}, dim={p_copy.dim