| 1 | +ConcurrentHashMap — Internal Working |
| 2 | +⸻ |
| 3 | + |
| 4 | +Overview (short) |
| 5 | + |
| 6 | +ConcurrentHashMap is Java’s high-performance, thread-safe implementation of the Map interface designed for concurrent |
| 7 | +access. |
| 8 | +It lets many threads read and write without global locking and achieves this using fine-grained synchronization and |
| 9 | +lock-free techniques (CAS). |
| 10 | + |
| 11 | +Java 8 redesigned it, dropping segment locking in favor of a combination of CAS, synchronized blocks on
| 12 | +individual bins, and specialized counters to scale well.
| 13 | + |
| 14 | +⸻ |
| 15 | + |
| 16 | +High-level goals |
| 17 | + • Allow high concurrency for reads and writes. |
| 18 | + • Avoid a single global lock (unlike Hashtable). |
| 19 | + • Provide fast get() (no locking in common case). |
| 20 | + • Make put()/remove() scalable via localized synchronization. |
| 21 | + • Provide useful concurrent bulk/atomic operations (e.g., computeIfAbsent, forEach, reduce). |
| 22 | + |
| 23 | +⸻ |
| 24 | + |
| 25 | +Key Data Structures (Java 8+) |
| 26 | + • Node<K,V>[] table |
| 27 | +Array of buckets. Each slot may point to: |
| 28 | + • null: empty
| 29 | + • Node: head of a linked list of nodes (chained)
| 30 | + • TreeBin: holder for a red-black tree of TreeNodes, used once a bucket becomes large (treeified)
| 31 | + • ForwardingNode: special marker left in a bucket that has already been moved during a resize/transfer
| 32 | + • Node fields (conceptual) |
| 33 | + |
| 34 | +final int hash; |
| 35 | +final K key; |
| 36 | +volatile V val; |
| 37 | +volatile Node<K,V> next; |
| 38 | + |
| 39 | + |
| 40 | +• TreeNode |
| 41 | +A red-black tree node used when a bucket’s chain becomes too long (reduces collision worst-case cost). |
| 42 | + |
| 43 | +• baseCount and counterCells (striped counters / LongAdder-like) |
| 44 | +Used for scalable size accounting without global locking. |
| 45 | + |
| 46 | +⸻ |
| 47 | + |
| 48 | +Important Concurrency Primitives Used |
| 49 | +• CAS (Compare-And-Swap) on table slots (via Unsafe.compareAndSwapObject in Java 8):
| 50 | +a lock-free attempt to install the first node of an empty bin.
| 51 | +• synchronized on a bin: if the bin is non-empty, the code synchronizes on the bin’s first node
| 52 | +(a Node or a TreeBin) for safe updates.
| 53 | +• volatile reads/writes to ensure visibility (e.g., val and next are volatile, so a write by one thread is visible to later reads by other threads).
| 54 | +• ForwardingNode during resizing: a reader that encounters one continues its lookup in the new table; a writer that encounters one helps with the transfer.
| 55 | + |
| 56 | +⸻ |
| 57 | + |
| 58 | +Typical Operation Flows |
| 59 | + |
| 60 | +get(key) (fast, no locking) |
| 61 | + 1. Compute hash, find table[index]. |
| 62 | + 2. Read first node reference (volatile). |
| 63 | + 3. If node is null → return null. |
| 64 | + 4. If node.hash == hash && node.key.equals(key) → return node.val (volatile read). |
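| 65 | + 5. Else follow the next chain, comparing keys; if the first node is a special node (negative hash, e.g. TreeBin or ForwardingNode), delegate to its find() method.
| 66 | +Note: No locking; volatile reads, final fields, and safe publication of nodes together give correctness.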
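Step 1 of get() can be sketched as follows. This is a minimal illustration of how the Java 8 implementation spreads a hash and masks it into an index; the class name HashIndexSketch and the helper indexFor are hypothetical and exist only for this example.

```java
// Sketch of step 1 of get(): spread the hash, then mask it into a table index.
final class HashIndexSketch {
    // Keeps the sign bit clear; negative hashes are reserved for special nodes.
    static final int HASH_BITS = 0x7fffffff;

    // XOR the high 16 bits into the low 16 so that small tables still see them.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Table lengths are powers of two, so (length - 1) works as a bit mask.
    // indexFor is a hypothetical helper, not a method of ConcurrentHashMap.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key.hashCode());
    }

    public static void main(String[] args) {
        System.out.println(indexFor("a", 16)); // "a".hashCode() is 97, so this prints 1
        System.out.println(spread(-1));        // prints 2147418112 (0x7fff0000)
    }
}
```

Because the spread hash is never negative, a negative hash field can safely mark special nodes such as ForwardingNode.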
| 67 | + |
| 68 | +put(key, value) (concurrent-safe) |
| 69 | +1. Compute hash and index. |
| 70 | +2. If the slot is empty, try a CAS to install a new Node; if the CAS loses a race, retry from the top.
| 71 | +
| 72 | +3. If slot non-empty:
| 73 | +• If the first node is a ForwardingNode → a resize is in progress; help with the transfer, then retry in the new table.
| 74 | +• Otherwise → synchronize on the first node, re-check that it is still the head, then traverse the chain (or the TreeBin) to replace an existing value or append a new node. If the chain has reached the treeify threshold afterwards → treeify the bin (convert to TreeBin/TreeNode).
| 75 | +
| 76 | +4. Update size counters using addCount(...) (a CAS on baseCount, falling back to a counterCells entry under contention).
| 77 | +5. If the count has reached the resize threshold → the thread performing the put starts a resize, or joins one that is already in progress.
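The CAS-then-lock pattern of put() can be sketched in a few dozen lines. This is a simplified illustration, not the JDK code: MiniConcurrentMap is a hypothetical class, the table has a fixed size, there is no resizing, treeification, or size counting, and AtomicReferenceArray stands in for the Unsafe-based slot access.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified sketch: fixed-size table, no resizing, no tree bins, no null checks.
final class MiniConcurrentMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(int hash, K key, V val) { this.hash = hash; this.key = key; this.val = val; }
    }

    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    private static int spread(int h) { return (h ^ (h >>> 16)) & 0x7fffffff; }

    // Returns the previous value, or null if the key was absent.
    V put(K key, V value) {
        int hash = spread(key.hashCode());
        int i = (table.length() - 1) & hash;
        for (;;) {
            Node<K, V> head = table.get(i);            // volatile read of the bin head
            if (head == null) {
                // Empty bin: try to install the node with a single CAS, no lock.
                if (table.compareAndSet(i, null, new Node<>(hash, key, value))) return null;
                continue;                              // lost the race, retry
            }
            synchronized (head) {                      // lock only this bin
                if (table.get(i) != head) continue;    // head changed, retry
                for (Node<K, V> e = head; ; e = e.next) {
                    if (e.hash == hash && e.key.equals(key)) {
                        V old = e.val;
                        e.val = value;                 // replace the existing mapping
                        return old;
                    }
                    if (e.next == null) {
                        e.next = new Node<>(hash, key, value); // append to the chain
                        return null;
                    }
                }
            }
        }
    }

    // Lock-free read: volatile head read, then walk the chain.
    V get(K key) {
        int hash = spread(key.hashCode());
        for (Node<K, V> e = table.get((table.length() - 1) & hash); e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) return e.val;
        }
        return null;
    }

    // Demo: 4 threads insert disjoint keys; returns how many keys are readable afterwards.
    static int demo() {
        MiniConcurrentMap<Integer, Integer> map = new MiniConcurrentMap<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 1000;
            workers[t] = new Thread(() -> {
                for (int k = base; k < base + 1000; k++) map.put(k, k);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        int found = 0;
        for (int k = 0; k < 4000; k++) {
            if (Integer.valueOf(k).equals(map.get(k))) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The re-check of the head inside the synchronized block matters in the real map, where a resize or removal can replace the head between the read and the lock; in this sketch the head of a bin never changes once set, so the check is kept only to show the pattern.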
| 78 | + |
| 79 | +Resizing (rehash/transfer) |
| 80 | +• When the element count reaches the resize threshold (0.75 × capacity in Java 8, tracked in sizeCtl), the table is doubled.
| 81 | +• A ForwardingNode is placed in each old bucket once its entries have been moved.
| 82 | +• Multiple threads can help transfer: each migrating thread claims a range of buckets;
| 83 | +writers that encounter a ForwardingNode join the transfer, while readers are redirected to the new table.
| 84 | +• Resizing proceeds bucket by bucket, locking only the bucket being moved; there is no single global lock, which keeps it concurrency-friendly.
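Because the table doubles, each node of an old bucket can only land in one of two new buckets. The sketch below shows that split rule as it is commonly described for the Java 8 transfer; ResizeSplitSketch and newIndex are hypothetical names used only for illustration.

```java
// Sketch of where a node goes when the table doubles from oldCap to 2 * oldCap.
final class ResizeSplitSketch {
    // A node in old bin i lands in new bin i (the "low" list) or in
    // new bin i + oldCap (the "high" list), decided by a single hash bit.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[] {5, 21, 37, 53}) {   // all four share old bin 5
            System.out.println(hash + " -> " + newIndex(hash, oldCap));
        }
    }
}
```

With oldCap = 16, hashes 5 and 37 stay in bin 5 while 21 and 53 move to bin 21, so one old bucket can be transferred without touching any other bucket.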
| 85 | + |
| 86 | +⸻ |
| 87 | + |
| 88 | +ASCII Diagrams |
| 89 | + |
| 90 | +Basic table & chaining (before treeification) |
| 91 | + |
| 92 | +table[0] -> null |
| 93 | +table[1] -> Node(h1, K1, v1) -> Node(h11, K11, v11) -> null |
| 94 | +table[2] -> Node(h2, K2, v2) -> null |
| 95 | + |
| 96 | +Concurrent get and put example: |
| 97 | + |
| 98 | +Thread A: get(K1) reads table[1] -> Node(h1,K1,v1) -> returns v1 (no lock) |
| 99 | +Thread B: put(Kx) tries CAS into table[3] (empty) -> succeeds (no lock) |
| 100 | +Thread C: put(K1) sees table[1] non-empty -> synchronize on Node(h1,K1,...) -> update existing val or append |
| 101 | + |
| 102 | +Resizing with ForwardingNode (simplified) |
| 103 | + |
| 104 | +Old table: |
| 105 | +table[1] -> ForwardingNode -> points to NewTable |
| 106 | + |
| 107 | +New table: |
| 108 | +newTable[1] -> Node(... moved ...) |
| 109 | + |
| 110 | +⸻ |
| 111 | + |
| 112 | +Complexity Summary |
| 113 | +• get(): expected O(1) (no locking; follows at most a few nodes or does a tree lookup)
| 114 | +• put()/remove(): expected O(1) amortized (O(log n) within a tree bin)
| 115 | +• Iteration (concurrent): weakly consistent, O(n) to traverse
| 116 | +• Resizing: expensive but infrequent (the cost is amortized across inserts)
| 117 | + |
| 118 | +⸻ |
| 119 | + |
| 120 | +Treeification |
| 121 | +• If a bin’s chain becomes long (TREEIFY_THRESHOLD = 8), it may be converted to a balanced tree (TreeNode) to
| 122 | +keep worst-case access O(log n).
| 123 | +• Treeification happens only when the table has at least 64 slots (MIN_TREEIFY_CAPACITY); otherwise the table is resized instead.
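The treeify-or-resize decision can be written as a small function. This is a sketch only: the constant names and values (8 and 64) follow the OpenJDK 8 source as an implementation detail, while the class TreeifyDecisionSketch, the Action enum, and afterInsert are hypothetical.

```java
// Sketch of the decision taken after a node is appended to a chain.
final class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;      // chain length that triggers the check
    static final int MIN_TREEIFY_CAPACITY = 64;  // smaller tables are resized instead

    enum Action { NONE, RESIZE, TREEIFY }

    // binCount is the chain length seen during the insert; tableLength is the current capacity.
    static Action afterInsert(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD) return Action.NONE;
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(3, 16));   // NONE
        System.out.println(afterInsert(8, 16));   // RESIZE
        System.out.println(afterInsert(8, 64));   // TREEIFY
    }
}
```

Resizing a small table first is the cheaper fix, since doubling the capacity usually splits a long chain on its own.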
| 124 | + |
| 125 | +⸻ |
| 126 | + |
| 127 | +Size Accounting and size() nuance |
| 128 | +• ConcurrentHashMap uses striped counters (baseCount + counterCells) to avoid contention on increments. |
| 129 | +• size() sums baseCount and all counterCells without locking, so under concurrent updates the result is an estimate; it is also capped at Integer.MAX_VALUE. mappingCount()
| 130 | +returns the same sum as a long and is the better choice for maps that may hold more than Integer.MAX_VALUE entries.
| 131 | +• isEmpty() and size() are not atomic snapshots; they are best-effort in a concurrent environment.
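The striped-counter idea can be sketched as a base value plus a few cells. This is a simplified stand-in for baseCount and counterCells, not the JDK code: StripedCounterSketch is a hypothetical class, the cell array is fixed at 8 entries, and a random slot replaces the per-thread probe that the real implementation uses.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified striped counter: a base value plus a fixed array of cells.
final class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(8);

    void add(long delta) {
        long b = base.get();
        // Fast path: one CAS on the base. If it fails, another thread got there
        // first, so spread the update onto a cell instead of retrying on the base.
        if (!base.compareAndSet(b, b + delta)) {
            int slot = ThreadLocalRandom.current().nextInt(cells.length());
            cells.addAndGet(slot, delta);
        }
    }

    // Not an atomic snapshot: adds that run concurrently may or may not be included.
    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) total += cells.get(i);
        return total;
    }

    // Demo: 4 threads each add 1 ten thousand times; returns the sum after all have finished.
    static long demo() {
        StripedCounterSketch counter = new StripedCounterSketch();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) counter.add(1);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Once all writers have finished, sum() is exact; while they are still running it is only an estimate, which mirrors the behavior of size() described above.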
| 132 | + |
| 133 | +⸻ |
| 134 | + |
| 135 | +Memory Visibility & Safety Guarantees |
| 136 | +• Volatile reads/writes and final fields ensure safe publication: once a node is visible, |
| 137 | +its immutable key and hash are visible to others; volatile val and next ensure visibility of changes along the chain. |
| 138 | +• get() never blocks and can be used safely in high-read scenarios. |
| 139 | + |
| 140 | +⸻ |
| 141 | + |
| 142 | +Common Methods and Concurrent Variants |
| 143 | +• get(K), put(K,V), remove(K), containsKey(K), containsValue(V) (value search is O(n)). |
| 144 | +• Atomic helpers: putIfAbsent, remove(K,V), replace(K, oldV, newV). |
| 145 | + |
| 146 | +• Functional and parallel-friendly: |
| 147 | +• compute, computeIfAbsent, computeIfPresent, merge |
| 148 | +• forEach, forEachKey, forEachValue, reduce, search: the bulk overloads take a parallelismThreshold and, once the map
| 149 | + is estimated to be larger than it, split the work across ForkJoinPool.commonPool().
| 150 | +• keySet(), entrySet() — views are weakly consistent (reflect some, but not necessarily all, updates). |
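A short usage example of the atomic helpers and a bulk operation: several threads count words with merge, and reduceValues then totals the counts. The class WordCountExample and its count method are hypothetical names for this illustration; the ConcurrentHashMap calls are the standard API.

```java
import java.util.concurrent.ConcurrentHashMap;

final class WordCountExample {
    // Every thread counts the same word list; merge makes each increment atomic per key.
    static ConcurrentHashMap<String, Integer> count(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (String w : words) counts.merge(w, 1, Integer::sum);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = count(new String[] {"a", "b", "a"}, 4);
        // Bulk reduce; a parallelismThreshold of Long.MAX_VALUE keeps it sequential.
        Integer total = counts.reduceValues(Long.MAX_VALUE, Integer::sum);
        System.out.println(counts.get("a") + " " + counts.get("b") + " " + total); // 8 4 12
    }
}
```

A separate get followed by put would lose updates under contention; merge performs the read-modify-write atomically for the key.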
| 151 | + |
| 152 | +⸻ |
| 153 | + |
| 154 | +Practical Applications |
| 155 | +• Caches (concurrent access) |
| 156 | +• Shared registries and maps in server applications |
| 157 | +• High-concurrency data structures (session stores, counters in multi-threaded programs) |
| 158 | +• Replacing a synchronized HashMap where thread safety is needed without heavy lock contention
| 159 | + |
| 160 | +⸻ |
| 161 | + |
| 162 | +When to Use vs Alternatives |
| 163 | +• Use ConcurrentHashMap when many threads concurrently read & modify a shared map. |
| 164 | +• If you need predictable iteration order → ConcurrentSkipListMap (sorted) or
| 165 | + Collections.synchronizedMap(new LinkedHashMap(...)) (but the latter uses a single coarse lock).
| 166 | +• For single-threaded usage or external synchronization → a plain HashMap is simpler and faster. |
| 167 | + |
| 168 | +⸻ |
| 169 | + |
| 170 | +Interview / Deep-dive Points to Remember |
| 171 | +• Java 7 used segment-locking (array of Segments); Java 8 switched to a Node-based approach with |
| 172 | +CAS and per-bin locking — this produces better scalability and memory efficiency. |
| 173 | +• get() is non-blocking (no synchronized) — a key performance characteristic. |
| 174 | +• Resizing is cooperative and multi-threaded; ForwardingNode signals moved bins. |
| 175 | +• TreeNode conversion reduces worst-case O(n) chain lookup to O(log n). |
| 176 | +• Counters use LongAdder-style cells for reduced contention. |
| 177 | +• Iterators are weakly consistent: they may reflect modifications made after the iterator was created, and
| 178 | +they do not throw ConcurrentModificationException.
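Weak consistency is easy to see in a small example: removing entries while iterating is allowed. The class WeakIterationExample and its drain method are hypothetical names for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

final class WeakIterationExample {
    // Removes every entry while iterating over the key view. The same loop over a
    // plain HashMap would typically fail with ConcurrentModificationException.
    static int drain(ConcurrentHashMap<String, Integer> map) {
        int visited = 0;
        for (String key : map.keySet()) {
            map.remove(key);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(drain(map) + " visited, empty=" + map.isEmpty());
    }
}
```

When other threads add entries during the loop, the iterator may or may not see them, so code should not rely on either outcome.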
| 179 | + |
| 180 | +⸻ |
| 181 | + |
| 182 | +Final Recap (one-liner) |
| 183 | + |
| 184 | +ConcurrentHashMap = high-performance concurrent map using CAS + synchronized per-bin updates + treeified bins +
| 185 | +striped counters → provides scalable, lock-free reads and highly concurrent writes suitable for
| 186 | +multithreaded Java applications.
| 187 | + |
| 188 | +⸻ |