Commit db9804f

docs: compare coarse-grained vs fine-grained mutex locking in pthreads
Added a detailed explanation comparing two pthread approaches: 1️⃣ local accumulation with a single lock (efficient, the real-world pattern) and 2️⃣ locking inside the loop (safe but inefficient). Includes a practical performance table, a conceptual summary, and a Java mapping showing how synchronized blocks and atomic updates mirror pthread mutex behavior.

Signed-off-by: https://github.com/Someshdiwan <[email protected]>
1️⃣ First Code (Your New Version with localA)

void* threadFunction1(void* args)
{
    long int localA = 0;
    for (int i = 1; i < 500000; i++) {
        localA = localA + i;        // local accumulation, no locking
    }
    pthread_mutex_lock(&a_mutex);
    a = a + localA;                 // one lock/unlock for the whole total
    pthread_mutex_unlock(&a_mutex);
    return NULL;                    // a void* thread function must return a value
}

Key Behavior

• Computation: each thread performs the entire summation locally, inside its own variable localA.
• Synchronization: the mutex is locked only once per thread, when adding localA into the global shared variable a.
• Critical section: very small (just one addition per thread).
• Performance: extremely efficient, with minimal mutex contention; threads run almost fully in parallel with nearly no waiting.
• Correctness: safe, because a is only written under the mutex.
• Analogy: like two workers doing their calculations privately, each updating the company ledger once at the end.

2️⃣ Second Code (Old Version with Lock Inside the Loop)

void* threadFunction1(void* args)
{
    for (int i = 1; i < 500000; i++) {
        pthread_mutex_lock(&a_mutex);
        a = a + i;
        pthread_mutex_unlock(&a_mutex);
    }
    return NULL;
}

Key Behavior

• Computation: each iteration adds directly to the global variable a.
• Synchronization: the mutex is locked and unlocked on every single iteration (nearly half a million times per thread).
• Critical section: entered very frequently, for tiny increments, causing high contention.
• Performance: much slower, due to constant locking overhead and context switching; threads spend much of their time waiting on the mutex instead of doing real work.
• Correctness: also safe, but inefficient.
• Analogy: like two workers fighting over the same pen after writing every number — safe, but absurdly inefficient.

3️⃣ Practical Comparison

┌────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│ Aspect             │ First Version (Local Accumulation)   │ Second Version (Lock per Iteration)  │
├────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Lock frequency     │ 1 lock per thread                    │ 500,000 locks per thread             │
│ Thread contention  │ Minimal                              │ Extremely high                       │
│ Efficiency         │ High                                 │ Low                                  │
│ Correctness        │ Correct                              │ Correct                              │
│ CPU utilization    │ More parallel                        │ More time waiting                    │
│ Use case           │ Preferred real-world pattern         │ Only for demonstration of locking    │
└────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘

4️⃣ Conceptual Summary

Both versions are thread-safe, but:

• The first version minimizes locking, maximizing concurrency. It uses coarse-grained locking (one lock for the whole result update).
• The second version uses fine-grained locking (a lock on every iteration). It is educational but computationally inefficient.

In multithreading, the goal is always to reduce the time spent inside the mutex, allowing true parallelism — your first version does exactly that.

5️⃣ Relation to Java

If we map this to Java:

┌────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────────────┐
│ Concept                │ C (pthread)                            │ Java Equivalent                                │
├────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────┤
│ Local accumulation     │ Local variable in the thread function  │ Local variable in Runnable.run() or a lambda   │
│ pthread_mutex_lock()   │ pthread_mutex_lock(&a_mutex)           │ synchronized (lock) { ... } or lock.lock()     │
│ pthread_mutex_unlock() │ pthread_mutex_unlock(&a_mutex)         │ End of synchronized block, or lock.unlock()    │
│ Optimized approach     │ Do local work, then one locked update  │ Same principle: minimize time in synchronized  │
└────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────────────┘

Example Java parallel sum analogy:

AtomicLong a = new AtomicLong(0);

Runnable task = () -> {
    long localA = 0;
    for (int i = 1; i < 500000; i++) localA += i;
    a.addAndGet(localA); // equivalent to one mutex lock/unlock
};

That’s semantically identical to your optimized first C version.

So:
✅ Both correct.
🚀 First version — faster, scalable, and real-world efficient.
🐢 Second version — slower, educational, and good for demonstrating mutex contention.
