Commit 443a365

Internal working - java.util.Hashtable (detailed).
Signed-off-by: https://github.com/Someshdiwan <[email protected]>
1 parent 741f33e commit 443a365

@@ -0,0 +1,236 @@
Internal working - java.util.Hashtable (detailed)

Overview
--------
Hashtable is a legacy implementation of the Map interface that stores key → value pairs
in a hash table (bucket array). It has been part of Java since JDK 1.0 and is synchronized:
its public methods take the intrinsic lock of the Hashtable instance. It remains available,
but modern code usually prefers ConcurrentHashMap for high-concurrency scenarios; Hashtable
is mainly useful for understanding legacy behavior and basic synchronized hash-table design.

Core ideas (short)
- Backing structure: an array of buckets; each bucket holds zero or more entries (nodes).
- Key → bucket mapping: key.hashCode() → clear the sign bit → index = hash mod table.length.
- Collision handling: entries in the same bucket are chained (singly linked list of Entry nodes).
- Thread-safety: most public methods are synchronized on the Hashtable instance.
- Iteration: legacy enumerations (keys(), elements()) exist; iterators from the collection views
  (keySet(), values(), entrySet()) are fail-fast.
- Rehashing: when the load threshold is reached, the bucket array is grown and entries are rehashed.

Data structure (conceptual)
---------------------------
Hashtable stores an internal array of Entry objects:

    internal array: Entry[] table;
    each Entry:     { final int hash; final K key; V value; Entry next; }

Simple ASCII (table length 8, so index = hash mod 8):

    table index: 0   1   2   3   4 ...
                 ↓   ↓   ↓   ↓   ↓
    bucket 0 -> null
    bucket 1 -> [Entry(hash=17, key=K1, value=V1) -> null]
    bucket 2 -> [Entry(hash=26, key=K2, value=V2) -> Entry(hash=106, key=K3, value=V3) -> null]
    bucket 3 -> null
    bucket 4 -> [Entry(hash=4, key=K4, value=V4) -> null]

(Here bucket 2 shows a collision chain with two entries.)


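The node layout and the diagram above can be sketched in a few lines of Java. ChainEntry and ChainEntryDemo are hypothetical names used only for illustration (the real entry type is a private nested class of java.util.Hashtable), and the table length of 8 is an assumption chosen so that the hashes in the diagram land in the buckets shown.

```java
// Hypothetical, simplified node type; not the JDK source.
class ChainEntry<K, V> {
    final int hash;          // cached key.hashCode()
    final K key;
    V value;
    ChainEntry<K, V> next;   // next entry in the same bucket, or null

    ChainEntry(int hash, K key, V value, ChainEntry<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    // Counts the entries reachable from head, including head itself.
    static int chainLength(ChainEntry<?, ?> head) {
        int n = 0;
        for (ChainEntry<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}

class ChainEntryDemo {
    // Rebuilds the diagram by hand: buckets 1, 2 and 4 are occupied.
    static ChainEntry<String, String>[] buildDiagramTable() {
        @SuppressWarnings("unchecked")
        ChainEntry<String, String>[] table =
                (ChainEntry<String, String>[]) new ChainEntry[8];
        table[1] = new ChainEntry<>(17, "K1", "V1", null);
        ChainEntry<String, String> k3 = new ChainEntry<>(106, "K3", "V3", null);
        table[2] = new ChainEntry<>(26, "K2", "V2", k3);   // collision chain
        table[4] = new ChainEntry<>(4, "K4", "V4", null);
        return table;
    }

    public static void main(String[] args) {
        ChainEntry<String, String>[] table = buildDiagramTable();
        for (int i = 0; i < 5; i++) {
            System.out.println("bucket " + i + " length = "
                    + ChainEntry.chainLength(table[i]));
        }
    }
}
```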
How a key maps to a bucket (conceptual)
---------------------------------------
1. Compute raw hash: int h = key.hashCode();
2. No extra bit mixing: unlike HashMap, Hashtable in current OpenJDK uses the raw hash as is.
3. Compute index: index = (h & 0x7FFFFFFF) % table.length
   (clearing the sign bit gives a non-negative index within array bounds).
4. Walk the bucket's linked list to find a matching entry (equals()); if none matches, a new
   entry is prepended to the chain.


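As a small illustration of step 3, the sketch below applies the same mask-and-modulo formula to a few hash values. BucketIndexDemo and bucketIndex are hypothetical names, and the table length of 8 is an assumption for the example only.

```java
class BucketIndexDemo {
    // Clears the sign bit, then reduces the hash to a valid array index.
    static int bucketIndex(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int length = 8; // assumed table length for this example
        System.out.println(bucketIndex(Integer.valueOf(17).hashCode(), length));  // 1
        System.out.println(bucketIndex(Integer.valueOf(26).hashCode(), length));  // 2
        System.out.println(bucketIndex(Integer.valueOf(106).hashCode(), length)); // 2 (collides with 26)
        System.out.println(bucketIndex(-1, length));                              // 7
        // Without the mask, % keeps the sign of the dividend:
        System.out.println(-1 % length);                                          // -1, not a valid index
    }
}
```

The mask is used rather than Math.abs because Math.abs(Integer.MIN_VALUE) is still negative.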
Typical operation flows (ASCII + steps)
--------------------------------------

PUT (insert / update)
---------------------
A thread enters put(key, value); the method is synchronized (single lock on the Hashtable instance).
A null key or a null value is rejected with NullPointerException.

Steps:
1. compute h = key.hashCode()
2. index = bucketIndex(h)
3. iterate entry = table[index]; while (entry != null)
       if entry.hash == h && entry.key.equals(key) -> replace entry.value; return the old value
       entry = entry.next
4. no existing key found: if size >= threshold -> rehash() (grow table and re-distribute entries),
   then recompute index for the new table length (OpenJDK checks the threshold before inserting)
5. prepend a new Entry(h, key, value) to the chain at table[index]
6. size++; return null


ASCII:

    PUT "John": hash -> 7 -> index 7
    table[7] == null:
    table[7] -> [Entry(hash7, "John", value) -> null]

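The externally visible side of put can be checked against the real java.util.Hashtable: it returns the previous value (or null for a new key) and rejects null values. PutBehaviorDemo and describeNullPut are hypothetical names for this illustration.

```java
import java.util.Hashtable;

class PutBehaviorDemo {
    // Reports what happens when a null value is passed to put.
    static String describeNullPut(Hashtable<String, Integer> table) {
        try {
            table.put("k", null);
            return "accepted";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        System.out.println(table.put("John", 1));   // null: key was absent
        System.out.println(table.put("John", 2));   // 1: previous value is returned
        System.out.println(table.get("John"));      // 2
        System.out.println(table.size());           // 1: update, not a second entry
        System.out.println(describeNullPut(table)); // NullPointerException
    }
}
```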
GET (lookup)
------------
get is a synchronized method, so a lookup holds the Hashtable's lock for its whole duration;
there is no finer-grained or per-bucket locking.
Steps:
1. compute h = key.hashCode(); index = bucketIndex(h)
2. entry = table[index]
3. while (entry != null)
       if entry.hash == h && entry.key.equals(key) return entry.value;
       entry = entry.next
4. return null

ASCII:

    GET "John": hash -> 7 -> index 7
    table[7] -> [Entry(hash7, "John") -> ...] => found

REMOVE
------

Synchronized method:
1. compute h, index
2. walk the chain, keeping a pointer to the previous entry
3. if found, unlink it: prev.next = current.next (or table[index] = current.next when removing the head)
4. size--, clear the removed entry's value reference, return the old value


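The put, get and remove flows can be tied together in a minimal sketch. MiniTable is a hypothetical teaching class, not the JDK source: it assumes a fixed capacity (no rehash), prepends new entries, and uses synchronized methods to mimic the single-lock model.

```java
// Hypothetical teaching class: fixed-size chained hash table, no rehash.
class MiniTable<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    MiniTable(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(int hash) {
        return (hash & 0x7FFFFFFF) % table.length;
    }

    synchronized V put(K key, V value) {
        if (value == null) throw new NullPointerException();
        int h = key.hashCode();               // NPE for a null key, as in Hashtable
        int i = indexFor(h);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) {
                V old = e.value;
                e.value = value;              // update in place
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // prepend to the chain
        size++;
        return null;
    }

    synchronized V get(Object key) {
        int h = key.hashCode();
        for (Node<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) return e.value;
        }
        return null;
    }

    synchronized V remove(Object key) {
        int h = key.hashCode();
        int i = indexFor(h);
        for (Node<K, V> e = table[i], prev = null; e != null; prev = e, e = e.next) {
            if (e.hash == h && e.key.equals(key)) {
                if (prev != null) prev.next = e.next; // unlink from the middle or tail
                else table[i] = e.next;               // unlink the head
                size--;
                V old = e.value;
                e.value = null;
                return old;
            }
        }
        return null;
    }

    synchronized int size() {
        return size;
    }
}

class MiniTableDemo {
    public static void main(String[] args) {
        MiniTable<Integer, String> t = new MiniTable<>(8);
        t.put(26, "V2");
        t.put(106, "V3");                  // 26 and 106 both map to bucket 2
        System.out.println(t.get(26));     // V2 (second node in the chain)
        System.out.println(t.remove(106)); // V3 (head of the chain)
        System.out.println(t.size());      // 1
    }
}
```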
Rehash (growing the table)
--------------------------
- Defaults: initial capacity 11, load factor 0.75; threshold = capacity * loadFactor.
- When size reaches the threshold, Hashtable grows its internal bucket array to oldCapacity * 2 + 1.
- All entries are moved into the new array (indexes change because table.length changed).
- Rehash is an expensive O(n) operation that runs under the instance lock, so every other
  operation is blocked until it finishes.



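To make the growth pattern concrete, the sketch below computes successive capacities and thresholds. It assumes the OpenJDK defaults (initial capacity 11, load factor 0.75) and its 2n + 1 growth rule; RehashGrowthDemo and its methods are hypothetical names, and the overflow clamping done by the real implementation is left out.

```java
class RehashGrowthDemo {
    // Assumed growth rule: double the capacity and add one.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity * 2 + 1;
    }

    // Entry count at which the next insert of a new key triggers a rehash.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 11;
        float loadFactor = 0.75f;
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
            capacity = nextCapacity(capacity);
        }
        // capacity=11 threshold=8
        // capacity=23 threshold=17
        // capacity=47 threshold=35
        // capacity=95 threshold=71
    }
}
```

When the expected number of entries is known, passing an initial capacity of roughly expectedSize / loadFactor to the constructor avoids these intermediate rehashes.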
Thread-safety model
-------------------
- Hashtable declares most methods (get/put/remove/containsKey etc.) as synchronized, so a single
  intrinsic lock (this) serializes all of these operations.
- Pros: simple correctness model; no user-level locking required for single-method atomicity.
- Cons: coarse-grained lock → low concurrency (only one thread can run any synchronized Hashtable
  method at a time), which can become a contention bottleneck under high concurrency.


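Single-method atomicity does not cover a get followed by a put. A minimal sketch of one way to make such a compound update atomic is to hold the same intrinsic lock that Hashtable's own methods use. CompoundUpdateDemo, safeIncrement and runWorkers are hypothetical names for this illustration.

```java
import java.util.Hashtable;

class CompoundUpdateDemo {
    // Holds the table's own lock across get + put, so no other thread can
    // run a synchronized Hashtable method in between the two calls.
    static void safeIncrement(Hashtable<String, Integer> table, String key) {
        synchronized (table) {
            Integer current = table.get(key);
            table.put(key, current == null ? 1 : current + 1);
        }
    }

    // Runs several threads that all increment the same counter.
    static int runWorkers(int threads, int incrementsPerThread) {
        Hashtable<String, Integer> table = new Hashtable<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    safeIncrement(table, "hits");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10_000)); // 40000
    }
}
```

Without the synchronized block, two threads could both read the same value and both write value + 1, losing an increment even though get and put are each synchronized.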
Iteration and enumeration
-------------------------
- Legacy APIs: keys() and elements() return Enumeration objects, which predate the Collections API.
- Enumeration was introduced before Iterator existed. It offers hasMoreElements() / nextElement().
- Modern usage: use entrySet(), keySet(), values() and their iterators.
- Iterators from Hashtable's view collections are fail-fast: on a best-effort basis they detect
  structural modification made outside the iterator and throw ConcurrentModificationException.
- The Enumerations returned by keys() and elements() are not fail-fast; prefer the iterators, and
  synchronize externally on the Hashtable if another thread may mutate it during iteration.


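The fail-fast behavior can be observed in a single thread by modifying the table while iterating over one of its views. The sketch below uses the real java.util.Hashtable; FailFastDemo and its method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;

class FailFastDemo {
    // Adds a key while iterating keySet(); the iterator notices on its next step.
    static boolean iteratorFailsFast() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);
        table.put("c", 3);
        try {
            for (String key : table.keySet()) {
                table.put("new-" + key, 0); // structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe read-only traversal: hold the table's lock for the whole loop.
    static int sumUnderLock(Hashtable<String, Integer> table) {
        synchronized (table) {
            int sum = 0;
            for (int value : table.values()) {
                sum += value;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(iteratorFailsFast()); // true
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("x", 10);
        table.put("y", 32);
        System.out.println(sumUnderLock(table)); // 42
    }
}
```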
Complexity summary
------------------
- get/put/remove: average O(1) if hash codes distribute keys well.
- Worst case (many collisions in the same bucket): O(n) per operation (walk the chain).
- Rehash cost: O(n) when resizing.
- Iteration: O(n + capacity) to traverse all buckets and chains.


Collision handling
------------------
- Hashtable uses chaining: entries with the same bucket index are linked via a singly-linked list.
- (Note: HashMap in Java 8+ can convert long chains into balanced trees, improving the worst case
  from O(n) to O(log n); Hashtable does not treeify and keeps plain chains. Prefer modern maps if
  you need that protection.)


Memory / GC aspects
-------------------
- Each Entry node holds references to key, value, next. Once an entry is unlinked from its chain
  it is unreachable from the table and eligible for GC; remove() also clears its value reference.
- Rehash allocates a new bucket array and relinks the existing entries into it (temporarily
  increased memory pressure).


When to use Hashtable today
---------------------------
- Rarely recommended for new code.
- Use it when you need a simple, legacy-style synchronized map and are constrained to pre-Java 1.5
  patterns (before ConcurrentHashMap existed).
- Prefer ConcurrentHashMap for high concurrency, better scalability and finer-grained locking.
- Synchronized methods do not make compound operations (check-then-act) atomic; synchronize
  externally, or use compute/putIfAbsent style methods on concurrent maps.


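For the compound-operation case, a short sketch of the atomic methods on ConcurrentHashMap follows. AtomicCompoundOpsDemo and countWords are hypothetical names; merge and putIfAbsent are standard java.util.concurrent.ConcurrentHashMap methods.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicCompoundOpsDemo {
    // Counts occurrences; merge performs the read-modify-write atomically per key.
    static ConcurrentHashMap<String, Integer> countWords(String[] words) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts =
                countWords(new String[] {"a", "b", "a"});
        System.out.println(counts.get("a"));            // 2
        System.out.println(counts.putIfAbsent("c", 7)); // null: "c" was absent, now mapped to 7
        System.out.println(counts.putIfAbsent("c", 9)); // 7: existing value kept
    }
}
```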
Differences vs HashMap / ConcurrentHashMap (brief)
--------------------------------------------------
- Hashtable: synchronized methods (coarse-grained lock), legacy.
- HashMap: non-synchronized, faster in single-threaded contexts, modern features (treeification on heavy collisions).
- ConcurrentHashMap: thread-safe with high concurrency; lock striping over segments in Java 5-7,
  CAS plus per-bucket locking in Java 8+. Preferred for multithreaded workloads.


ASCII flows (examples)
----------------------

1) Simple insert and collision

Start: empty table (length = 8 for example)
Index mapping: h("A") → 2 ; h("B") → 2 (collision)

Before:
    table[2] -> null

put("A", 1):
    table[2] -> [A:1]

put("B", 2): (collision: same bucket)
    table[2] -> [B:2] -> [A:1]

(Hashtable prepends new entries, so the most recently inserted key sits at the head of the chain.)

2) get after collisions
    get("B") -> index 2 -> check first node: B -> return 2
    get("A") -> index 2 -> check first node: B (no match); next node: A -> return 1

3) remove head, then the last remaining entry
    remove("B") -> unlink head:
    table[2] -> [A:1]

    remove("A") -> table[2] -> null

4) rehash
    size reaches threshold → allocate new, larger table → rehash all entries:
    old table:
        idx2 -> [C] -> [D] -> [E] ...
    new length bigger → entries get new indexes according to new length → redistributed.

Pitfalls & interview points
---------------------------
- Hashtable is synchronized, but the synchronization is coarse-grained; ConcurrentHashMap is a better
  choice when multiple threads need concurrent access without full serialization.
- hashCode() quality matters: a poor hashCode() causes clustering → degraded performance.
- Do not rely on insertion/iteration order: Hashtable makes no ordering guarantees.
- Rehash/resizing is expensive; when you know the expected size, pass a sensible initial capacity
  to the constructor.
- Legacy API: Enumeration vs modern Iterator; prefer the latter in new code.

Short FAQ
---------
Q: Is Hashtable deprecated?
A: Not deprecated, but considered legacy. Use ConcurrentHashMap or Collections.synchronizedMap(new HashMap<>())
   depending on need.

Q: Are Hashtable iterators fail-fast?
A: Iterators from the collection-view methods are fail-fast (they detect structural modifications).
   The Enumerations returned by keys() and elements() are not fail-fast; avoid them in modern code.

Q: Does Hashtable treeify buckets like HashMap (Java 8+)?
A: No. Hashtable uses linked-list chaining and does not treeify buckets. HashMap in Java 8 added
   treeification to improve worst-case behavior.

Q: How to make iteration safe in a multithreaded context?
A: Either synchronize externally on the Hashtable instance during the whole iteration:
       synchronized (table) { for (K k : table.keySet()) { ... } }
   or use concurrent collections (ConcurrentHashMap) that provide weakly consistent iterators designed for concurrency.

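The weakly consistent alternative from the last answer can be sketched as follows: a ConcurrentHashMap view iterator tolerates modification during traversal and never throws ConcurrentModificationException. WeaklyConsistentDemo is a hypothetical name; whether the key added mid-iteration is itself visited is unspecified, so the sketch makes no claim about that.

```java
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentDemo {
    // Iterates over the key view while inserting into the same map.
    // Returns how many keys the loop visited (at least the three original keys).
    static int iterateWhileMutating() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        int visited = 0;
        for (String key : map.keySet()) {
            map.put("added-during-iteration", 0); // same key each time: at most one new entry
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        int visited = iterateWhileMutating();
        System.out.println(visited >= 3); // true, and no exception was thrown
    }
}
```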
Practical recommendation
------------------------
- For legacy compatibility or very simple synchronized map needs, Hashtable is acceptable.
- For real concurrent applications use ConcurrentHashMap.
- For single-threaded or externally synchronized contexts use HashMap for performance.
- Always design keys with good hashCode() and stable equals() to avoid correctness and performance problems.