Internal working - java.util.Hashtable (detailed)

Overview
--------
Hashtable is a legacy implementation of the Map interface that stores key → value pairs
in a hash table (bucket array). It dates back to Java 1.0 and is synchronized (thread-safe
via intrinsic locks on the Hashtable instance). While still available, modern code
usually prefers ConcurrentHashMap for high-concurrency scenarios; Hashtable remains useful
for understanding legacy behavior and basic synchronized hash-table design.

Core ideas (short)
- Backing structure: an array of buckets; each bucket holds zero or more entries (nodes).
- Key → bucket mapping: compute hashCode(key) → normalize → index = (hash mod table.length).
- Collision handling: entries in the same bucket are chained (singly linked list of Entry nodes).
- Thread-safety: most public methods are synchronized on the Hashtable instance.
- Iteration: legacy enumerations (keys(), elements()) exist; collection-view iterators behave
  like other collection iterators (modern implementations are fail-fast).
- Rehashing: when the load threshold is exceeded, the bucket array is grown and entries are rehashed.
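The behavior summarized above can be observed directly through the public API. A minimal sketch (the class name HashtableBasics is illustrative) relying only on documented java.util.Hashtable behavior, including its rejection of null keys:

```java
import java.util.Hashtable;
import java.util.Map;

public class HashtableBasics {
    public static void main(String[] args) {
        Map<String, Integer> ages = new Hashtable<>();
        ages.put("John", 30);
        ages.put("Jane", 25);
        ages.put("John", 31);               // same key: value replaced, size stays 2

        System.out.println(ages.get("John"));   // 31
        System.out.println(ages.size());        // 2
        System.out.println(ages.get("Nobody")); // null (key absent)

        // Unlike HashMap, Hashtable rejects null keys and values
        try {
            ages.put(null, 1);
        } catch (NullPointerException e) {
            System.out.println("null keys are not allowed");
        }
    }
}
```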

Data structure (conceptual)
---------------------------
Hashtable stores an internal array of Entry objects:

internal array: Entry[] table;
each Entry:     { final K key; V value; Entry next; int hash; }

Simple ASCII:

table index:  0    1    2    3    4   ...
              ↓    ↓    ↓    ↓    ↓
bucket 0 -> null
bucket 1 -> [Entry(hash=17, key=K1, value=V1) -> null]
bucket 2 -> [Entry(hash=26, key=K2, value=V2) -> Entry(hash=106, key=K3, value=V3) -> null]
bucket 3 -> null
bucket 4 -> [Entry(hash=4, key=K4, value=V4) -> null]

(Here bucket 2 shows a collision chain with two entries.)


How a key maps to a bucket (conceptual)
---------------------------------------
1. Compute the raw hash: int h = key.hashCode();
2. Optionally remix / smear the bits to reduce clustering (HashMap does this; Hashtable uses the raw hash).
3. Compute the index: index = (h & 0x7FFFFFFF) % table.length
   (masking the sign bit guarantees a non-negative index within array bounds).
4. Walk the bucket's linked list to find a matching entry (equals()) or to insert a new one.
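Steps 1–3 can be sketched as a tiny helper (bucketIndex is a hypothetical name; 11 is Hashtable's documented default initial capacity):

```java
public class BucketIndex {
    // Mirrors the classic Hashtable index computation: mask off the sign bit,
    // then take the remainder by the table length.
    static int bucketIndex(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int len = 11; // Hashtable's default initial capacity
        System.out.println(bucketIndex("John".hashCode(), len)); // 7
        // A negative hashCode still yields a valid index thanks to the mask:
        System.out.println(bucketIndex(-42, len)); // 4
    }
}
```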


Typical operation flows (ASCII + steps)
---------------------------------------

PUT (insert / update)
---------------------
Thread enters put(key, value); the method is synchronized (a single lock on the Hashtable instance).

Steps:
  1. compute h = hash(key)
  2. index = bucketIndex(h)
  3. if table[index] == null:
       create a new Entry(k, v) and place it at table[index]
     else:
       iterate entry = table[index]; while (entry != null)
         if entry.hash == h && entry.key.equals(key) -> update entry.value = value; return
         entry = entry.next
       // no existing key found: prepend a new Entry to the chain
  4. size++
  5. if size > threshold -> rehash() (grow the table and redistribute entries)
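The put flow above can be sketched as a minimal chained-bucket table. This is an illustrative toy (SimpleTable is not a JDK class) that omits the threshold check and rehash:

```java
// Minimal sketch of chained-bucket insertion; names (SimpleTable, Entry)
// are illustrative, not the JDK's actual implementation.
public class SimpleTable<K, V> {
    static class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;
        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V>[] table = (Entry<K, V>[]) new Entry[11];
    private int size;

    public synchronized V put(K key, V value) {
        int h = key.hashCode();
        int index = (h & 0x7FFFFFFF) % table.length;
        // 1. Look for an existing entry with the same key: update in place.
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // 2. Not found: prepend a new entry to the chain.
        table[index] = new Entry<>(h, key, value, table[index]);
        size++;
        // (A real implementation would check the threshold and rehash here.)
        return null;
    }

    public synchronized V get(K key) {
        int h = key.hashCode();
        for (Entry<K, V> e = table[(h & 0x7FFFFFFF) % table.length]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) return e.value;
        }
        return null;
    }

    public synchronized int size() { return size; }

    public static void main(String[] args) {
        SimpleTable<String, Integer> t = new SimpleTable<>();
        System.out.println(t.put("A", 1)); // null (new key)
        System.out.println(t.put("A", 2)); // 1 (old value returned on update)
        System.out.println(t.get("A"));    // 2
        System.out.println(t.size());      // 1
    }
}
```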


ASCII:

PUT "John": hash -> 7 -> index 7
table[7] == null:
  table[7] -> [Entry(hash7, "John", value) -> null]

GET (lookup)
------------
There is no per-lookup locking beyond the method itself (Hashtable.get is a synchronized method).
Steps:
  1. compute h = hash(key) ; index = bucketIndex(h)
  2. entry = table[index]
  3. while (entry != null)
       if entry.hash == h && entry.key.equals(key) return entry.value;
       entry = entry.next
  4. return null
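To see the chain walk in action against the real class, a deliberately constant hashCode forces every key into one bucket; get() then distinguishes keys via equals(). BadKey is an illustrative name:

```java
import java.util.Hashtable;

public class CollisionLookup {
    // A key whose hashCode is deliberately constant: every instance lands in
    // the same bucket, forcing get() to walk the chain and rely on equals().
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // constant: all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Hashtable<BadKey, Integer> t = new Hashtable<>();
        for (int i = 0; i < 5; i++) t.put(new BadKey("k" + i), i);
        // Lookups still succeed: the chain walk compares hash, then equals().
        System.out.println(t.get(new BadKey("k3")));  // 3
        System.out.println(t.get(new BadKey("zzz"))); // null (not in chain)
        System.out.println(t.size());                 // 5
    }
}
```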

ASCII:

GET "John": hash -> 7 -> index 7
table[7] -> [Entry(hash7, "John") -> ...] => found

REMOVE
------
Synchronized method:
  1. compute h, index
  2. walk the chain, keeping a previous pointer
  3. if found, unlink by setting prev.next = current.next (or table[index] = current.next when removing the head)
  4. size--; null out the removed node's fields to help GC
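A quick demonstration of remove semantics on the real class: remove(key) returns the old value, or null when the key is absent, and remove(key, value) is the conditional variant from the Map interface:

```java
import java.util.Hashtable;

public class RemoveDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> t = new Hashtable<>();
        t.put("A", 1);
        t.put("B", 2);

        System.out.println(t.remove("B"));     // 2 (old value returned)
        System.out.println(t.remove("B"));     // null (already gone)
        System.out.println(t.size());          // 1
        // Conditional removal: only removes when the current value matches.
        System.out.println(t.remove("A", 99)); // false (value doesn't match)
        System.out.println(t.remove("A", 1));  // true
    }
}
```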


Rehash (growing the table)
--------------------------
- When size exceeds the threshold (capacity × load factor), Hashtable grows its internal bucket array
  (OpenJDK grows it to 2 × oldCapacity + 1).
- All entries are re-inserted/rehashed into the new array (indexes change because table.length changed).
- Rehash is an expensive global operation (synchronized, blocking other operations while it runs).
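One practical consequence: pre-size the table when the entry count is known. A sketch using the documented (capacity, loadFactor) constructor; the sizing formula is a common heuristic, not a JDK API:

```java
import java.util.Hashtable;

public class PreSizing {
    public static void main(String[] args) {
        // If ~1000 entries are expected, choose a capacity whose threshold
        // (capacity * loadFactor) is not crossed, so no rehash is needed.
        int expectedSize = 1000;
        float loadFactor = 0.75f;
        int capacity = (int) (expectedSize / loadFactor) + 1;

        Hashtable<Integer, String> t = new Hashtable<>(capacity, loadFactor);
        for (int i = 0; i < expectedSize; i++) t.put(i, "v" + i);
        System.out.println(t.size()); // 1000
    }
}
```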



Thread-safety model
-------------------
- Hashtable synchronizes most methods (get/put/remove/containsKey, etc.) via synchronized method
  declarations. A single intrinsic lock (this) serializes access to these operations.
- Pros: simple correctness model; no user-level locking required for single-method atomicity.
- Cons: coarse-grained lock → low concurrency (only one thread can run any synchronized Hashtable
  method at a time), which can become a contention bottleneck under high concurrency.
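The coarse lock makes individual calls atomic but not compound sequences. A sketch of the classic check-then-act hazard and the external-locking fix (locking the map instance itself, which is the same monitor its synchronized methods use):

```java
import java.util.Hashtable;
import java.util.Map;

public class CompoundOps {
    public static void main(String[] args) {
        Map<String, Integer> counts = new Hashtable<>();

        // Each individual call is atomic, but this check-then-act sequence is
        // NOT: another thread could interleave between get() and put().
        Integer c = counts.get("hits");
        counts.put("hits", c == null ? 1 : c + 1);

        // To make the compound operation atomic, hold the map's own lock
        // across both calls:
        synchronized (counts) {
            Integer c2 = counts.get("hits");
            counts.put("hits", c2 == null ? 1 : c2 + 1);
        }
        System.out.println(counts.get("hits")); // 2
    }
}
```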


Iteration and enumeration
-------------------------
- Legacy APIs: keys() and elements() return Enumeration objects (from the pre-Collections API).
  - Enumeration predates Iterator and offers only hasMoreElements() / nextElement().
- Modern usage: use entrySet(), keySet(), values() and their iterators.
- Iterators from Hashtable's view collections behave like other collection iterators: they are
  fail-fast, i.e. they detect concurrent structural modification and throw ConcurrentModificationException.
- The legacy Enumerations are not fail-fast; prefer modern iterators and synchronize externally
  if iterating and mutating concurrently.
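Both styles side by side (iteration order is unspecified, so the output order may vary):

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Map;

public class IterationStyles {
    public static void main(String[] args) {
        Hashtable<String, Integer> t = new Hashtable<>();
        t.put("a", 1);
        t.put("b", 2);

        // Legacy style: Enumeration (pre-Collections API, not fail-fast)
        for (Enumeration<String> e = t.keys(); e.hasMoreElements(); ) {
            System.out.println("key: " + e.nextElement());
        }

        // Modern style: entrySet() iterator (fail-fast)
        for (Map.Entry<String, Integer> entry : t.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```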


Complexity summary
------------------
- get/put/remove: average O(1) if hash codes distribute keys well.
- Worst case (many collisions in the same bucket): O(n) per operation (walking the chain).
- Rehash cost: O(n) when resizing.
- Iteration: O(n) to traverse all buckets and chains.


Collision handling
------------------
- Hashtable uses chaining: entries with the same bucket index are linked via a singly linked list.
- (Note: HashMap in Java 8+ can convert long chains into balanced trees to avoid the O(n) worst case;
  Hashtable does not treeify and keeps plain chaining. Prefer modern maps for treeified behavior.)


Memory / GC aspects
-------------------
- Each Entry node holds references to key, value and next; these are nulled on removal so the
  referenced objects become eligible for GC.
- Rehash creates a new bucket array and moves references (temporarily increased memory pressure).


When to use Hashtable today
---------------------------
- Rarely recommended for new code.
- Use it when you need a simple, legacy-style synchronized map or are constrained to pre-Java 1.5 patterns.
- Prefer ConcurrentHashMap for high concurrency: better scalability and finer-grained locking.
- If you need atomic compound operations (check-then-act), you still need external synchronization, or
  the compute/putIfAbsent-style methods on concurrent maps.
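For comparison, the same check-then-act logic expressed atomically with ConcurrentHashMap's documented methods, with no external lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicCompound {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

        // Atomic check-then-act without any external lock:
        counts.putIfAbsent("hits", 0);           // insert only if absent
        counts.compute("hits", (k, v) -> v + 1); // atomic read-modify-write
        counts.merge("hits", 1, Integer::sum);   // atomic increment

        System.out.println(counts.get("hits")); // 2
    }
}
```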


Differences vs HashMap / ConcurrentHashMap (brief)
--------------------------------------------------
- Hashtable: synchronized methods (coarse-grained lock), legacy; rejects null keys and values.
- HashMap: not synchronized, faster in single-threaded contexts, modern features (treeification under heavy collisions).
- ConcurrentHashMap: thread-safe with high concurrency via internal partitioning/lock striping
  (or non-blocking CAS internals in modern JDKs); preferred for multithreaded workloads.


ASCII flows (examples)
----------------------

1) Simple insert and collision

Start: empty table (length = 8, for example)
Index mapping: h("A") → 2 ; h("B") → 2 (collision)

Before:
table[2] -> null

put("A", 1):
table[2] -> [A:1]

put("B", 2): (collision: same bucket)
table[2] -> [B:2] -> [A:1]

(Chain order is implementation-defined; OpenJDK's Hashtable prepends new entries, as shown.)

2) get after collisions
get("B") -> index 2 -> check first node: B -> return 2
get("A") -> index 2 -> check first node: B (no match); next node: A -> return 1

3) remove head vs remove middle
remove("B") -> unlink head:
table[2] -> [A:1]

remove("A") -> table[2] -> null

4) rehash
size crosses threshold → allocate a larger table → rehash all entries:
old table:
  idx2 -> [C] -> [D] -> [E] ...
new length is bigger → entries get new indexes according to the new length → redistributed.
Pitfalls & interview points
---------------------------
- Hashtable is synchronized, but the synchronization is coarse-grained; ConcurrentHashMap is a better choice
  when multiple threads need concurrent access without full serialization.
- hashCode() quality matters: a poor hashCode() causes clustering → degraded performance.
- Do not rely on insertion/iteration order: Hashtable makes no ordering guarantees.
- Rehash/resizing is expensive; when you know the expected size, pre-allocate a sensible initial capacity.
- Legacy API: Enumeration vs modern Iterator; prefer the latter in new code.
Short FAQ
---------
Q: Is Hashtable deprecated?
A: Not deprecated, but considered legacy. Use ConcurrentHashMap or a synchronized wrapper around HashMap,
   depending on need.

Q: Are Hashtable iterators fail-fast?
A: Iterators from the collection-view methods follow the standard fail-fast behavior (they detect structural
   modification). The legacy Enumerations returned by keys()/elements() are not fail-fast; avoid them in new code.

Q: Does Hashtable treeify buckets like HashMap (Java 8+)?
A: No. Hashtable uses linked-list chaining and does not treeify buckets. HashMap in Java 8 added
   treeification to improve worst-case behavior.

Q: How to make iteration safe in a multithreaded context?
A: Either synchronize externally on the Hashtable instance for the whole iteration:
     synchronized (table) { for (K k : table.keySet()) { ... } }
   or use concurrent collections (ConcurrentHashMap) whose weakly consistent iterators are designed for concurrency.

Practical recommendation
------------------------
- For legacy compatibility or very simple synchronized-map needs, Hashtable is acceptable.
- For genuinely concurrent applications, use ConcurrentHashMap.
- For single-threaded or externally synchronized contexts, use HashMap for performance.
- Always design keys with a good hashCode() and a stable equals() to avoid correctness and performance problems.