Skip to content
This repository was archived by the owner on Nov 9, 2017. It is now read-only.

Commit e7fe097

Browse files
kbleesdscho
authored andcommitted
add a hashtable implementation that supports O(1) removal
The existing hashtable implementation (in hash.[ch]) uses open addressing (i.e. resolve hash collisions by distributing entries across the table). Thus, removal is difficult to implement with less than O(n) complexity. Resolving collisions of entries with identical hashes (e.g. via chaining) is left to the client code. Add a hashtable implementation that supports O(1) removal and is slightly easier to use due to builtin entry chaining. Supports all basic operations init, free, get, add, remove and iteration. Also includes ready-to-use hash functions based on the public domain FNV-1 algorithm (http://www.isthe.com/chongo/tech/comp/fnv). The per-entry data structure (hashmap_entry) is piggybacked in front of the client's data structure to save memory. See test-hashmap.c for usage examples. The hashtable is resized by a factor of four when 80% full. With these settings, average memory consumption is about 2/3 of hash.[ch], and insertion is about twice as fast due to less frequent resizing. Lookups are also slightly faster, because entries are strictly confined to their bucket (i.e. no data of other buckets needs to be traversed). Signed-off-by: Karsten Blees <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 3e2a264 commit e7fe097

File tree

6 files changed

+1117
-0
lines changed

6 files changed

+1117
-0
lines changed
Lines changed: 235 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,235 @@
1+
hashmap API
2+
===========
3+
4+
The hashmap API is a generic implementation of hash-based key-value mappings.
5+
6+
Data Structures
7+
---------------
8+
9+
`struct hashmap`::
10+
11+
The hash table structure.
12+
+
13+
The `size` member keeps track of the total number of entries. The `cmpfn`
14+
member is a function used to compare two entries for equality. The `table` and
15+
`tablesize` members store the hash table and its size, respectively.
16+
17+
`struct hashmap_entry`::
18+
19+
An opaque structure representing an entry in the hash table, which must
20+
be used as first member of user data structures. Ideally it should be
21+
followed by an int-sized member to prevent unused memory on 64-bit
22+
systems due to alignment.
23+
+
24+
The `hash` member is the entry's hash code and the `next` member points to the
25+
next entry in case of collisions (i.e. if multiple entries map to the same
26+
bucket).
27+
28+
`struct hashmap_iter`::
29+
30+
An iterator structure, to be used with hashmap_iter_* functions.
31+
32+
Types
33+
-----
34+
35+
`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
36+
37+
User-supplied function to test two hashmap entries for equality. Shall
38+
return 0 if the entries are equal.
39+
+
40+
This function is always called with non-NULL `entry` / `entry_or_key`
41+
parameters that have the same hash code. When looking up an entry, the `key`
42+
and `keydata` parameters to hashmap_get and hashmap_remove are always passed
43+
as second and third argument, respectively. Otherwise, `keydata` is NULL.
44+
45+
Functions
46+
---------
47+
48+
`unsigned int strhash(const char *buf)`::
49+
`unsigned int strihash(const char *buf)`::
50+
`unsigned int memhash(const void *buf, size_t len)`::
51+
`unsigned int memihash(const void *buf, size_t len)`::
52+
53+
Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
54+
http://www.isthe.com/chongo/tech/comp/fnv).
55+
+
56+
`strhash` and `strihash` take 0-terminated strings, while `memhash` and
57+
`memihash` operate on arbitrary-length memory.
58+
+
59+
`strihash` and `memihash` are case insensitive versions.
60+
61+
`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
62+
63+
Initializes a hashmap structure.
64+
+
65+
`map` is the hashmap to initialize.
66+
+
67+
The `equals_function` can be specified to compare two entries for equality.
68+
If NULL, entries are considered equal if their hash codes are equal.
69+
+
70+
If the total number of entries is known in advance, the `initial_size`
71+
parameter may be used to preallocate a sufficiently large table and thus
72+
prevent expensive resizing. If 0, the table is dynamically resized.
73+
74+
`void hashmap_free(struct hashmap *map, int free_entries)`::
75+
76+
Frees a hashmap structure and allocated memory.
77+
+
78+
`map` is the hashmap to free.
79+
+
80+
If `free_entries` is true, each hashmap_entry in the map is freed as well
81+
(using stdlib's free()).
82+
83+
`void hashmap_entry_init(void *entry, int hash)`::
84+
85+
Initializes a hashmap_entry structure.
86+
+
87+
`entry` points to the entry to initialize.
88+
+
89+
`hash` is the hash code of the entry.
90+
91+
`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
92+
93+
Returns the hashmap entry for the specified key, or NULL if not found.
94+
+
95+
`map` is the hashmap structure.
96+
+
97+
`key` is a hashmap_entry structure (or user data structure that starts with
98+
hashmap_entry) that has at least been initialized with the proper hash code
99+
(via `hashmap_entry_init`).
100+
+
101+
If an entry with matching hash code is found, `key` and `keydata` are passed
102+
to `hashmap_cmp_fn` to decide whether the entry matches the key.
103+
104+
`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
105+
106+
Returns the next equal hashmap entry, or NULL if not found. This can be
107+
used to iterate over duplicate entries (see `hashmap_add`).
108+
+
109+
`map` is the hashmap structure.
110+
+
111+
`entry` is the hashmap_entry to start the search from, obtained via a previous
112+
call to `hashmap_get` or `hashmap_get_next`.
113+
114+
`void hashmap_add(struct hashmap *map, void *entry)`::
115+
116+
Adds a hashmap entry. This allows to add duplicate entries (i.e.
117+
separate values with the same key according to hashmap_cmp_fn).
118+
+
119+
`map` is the hashmap structure.
120+
+
121+
`entry` is the entry to add.
122+
123+
`void *hashmap_put(struct hashmap *map, void *entry)`::
124+
125+
Adds or replaces a hashmap entry. If the hashmap contains duplicate
126+
entries equal to the specified entry, only one of them will be replaced.
127+
+
128+
`map` is the hashmap structure.
129+
+
130+
`entry` is the entry to add or replace.
131+
+
132+
Returns the replaced entry, or NULL if not found (i.e. the entry was added).
133+
134+
`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
135+
136+
Removes a hashmap entry matching the specified key. If the hashmap
137+
contains duplicate entries equal to the specified key, only one of
138+
them will be removed.
139+
+
140+
`map` is the hashmap structure.
141+
+
142+
`key` is a hashmap_entry structure (or user data structure that starts with
143+
hashmap_entry) that has at least been initialized with the proper hash code
144+
(via `hashmap_entry_init`).
145+
+
146+
If an entry with matching hash code is found, `key` and `keydata` are
147+
passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
148+
+
149+
Returns the removed entry, or NULL if not found.
150+
151+
`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
152+
`void *hashmap_iter_next(struct hashmap_iter *iter)`::
153+
`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
154+
155+
Used to iterate over all entries of a hashmap.
156+
+
157+
`hashmap_iter_init` initializes a `hashmap_iter` structure.
158+
+
159+
`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
160+
more entries.
161+
+
162+
`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
163+
and returns the first entry, if any).
164+
165+
Usage example
166+
-------------
167+
168+
Here's a simple usage example that maps long keys to double values.
169+
[source,c]
170+
------------
171+
struct hashmap map;
172+
173+
struct long2double {
174+
struct hashmap_entry ent; /* must be the first member! */
175+
long key;
176+
double value;
177+
};
178+
179+
static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
180+
{
181+
return !(e1->key == e2->key);
182+
}
183+
184+
void long2double_init(void)
185+
{
186+
hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
187+
}
188+
189+
void long2double_free(void)
190+
{
191+
hashmap_free(&map, 1);
192+
}
193+
194+
static struct long2double *find_entry(long key)
195+
{
196+
struct long2double k;
197+
hashmap_entry_init(&k, memhash(&key, sizeof(long)));
198+
k.key = key;
199+
return hashmap_get(&map, &k, NULL);
200+
}
201+
202+
double get_value(long key)
203+
{
204+
struct long2double *e = find_entry(key);
205+
return e ? e->value : 0;
206+
}
207+
208+
void set_value(long key, double value)
209+
{
210+
struct long2double *e = find_entry(key);
211+
if (!e) {
212+
e = malloc(sizeof(struct long2double));
213+
hashmap_entry_init(e, memhash(&key, sizeof(long)));
214+
e->key = key;
215+
hashmap_add(&map, e);
216+
}
217+
e->value = value;
218+
}
219+
------------
220+
221+
Using variable-sized keys
222+
-------------------------
223+
224+
The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
225+
`hashmap_entry` structure as key to find the correct entry. If the key data is
226+
variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
227+
to create a full-fledged entry structure on the heap and copy all the key data
228+
into the structure.
229+
230+
In this case, the `keydata` parameter can be used to pass
231+
variable-sized key data directly to the comparison function, and the `key`
232+
parameter can be a stripped-down, fixed size entry structure allocated on the
233+
stack.
234+
235+
See test-hashmap.c for an example using arbitrary-length strings as keys.

Makefile

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -555,6 +555,7 @@ TEST_PROGRAMS_NEED_X += test-date
555555
TEST_PROGRAMS_NEED_X += test-delta
556556
TEST_PROGRAMS_NEED_X += test-dump-cache-tree
557557
TEST_PROGRAMS_NEED_X += test-genrandom
558+
TEST_PROGRAMS_NEED_X += test-hashmap
558559
TEST_PROGRAMS_NEED_X += test-index-version
559560
TEST_PROGRAMS_NEED_X += test-line-buffer
560561
TEST_PROGRAMS_NEED_X += test-match-trees
@@ -672,6 +673,7 @@ LIB_H += gpg-interface.h
672673
LIB_H += graph.h
673674
LIB_H += grep.h
674675
LIB_H += hash.h
676+
LIB_H += hashmap.h
675677
LIB_H += help.h
676678
LIB_H += http.h
677679
LIB_H += kwset.h
@@ -803,6 +805,7 @@ LIB_OBJS += gpg-interface.o
803805
LIB_OBJS += graph.o
804806
LIB_OBJS += grep.o
805807
LIB_OBJS += hash.o
808+
LIB_OBJS += hashmap.o
806809
LIB_OBJS += help.o
807810
LIB_OBJS += hex.o
808811
LIB_OBJS += ident.o

0 commit comments

Comments
 (0)