Commit db9804f

docs: compare coarse-grained vs fine-grained mutex locking in pthreads
Added a detailed explanation comparing two pthread approaches: 1️⃣ local accumulation with a single lock (efficient, the real-world pattern) and 2️⃣ locking inside the loop (safe but inefficient). Includes a practical performance table, a conceptual summary, and a Java mapping showing how synchronized blocks and atomic updates mirror pthread mutex behavior.

Signed-off-by: https://github.com/Someshdiwan <[email protected]>
1️⃣ First Code (Your New Version with localA)

void* threadFunction1(void* args)
{
    long int localA = 0;
    for (int i = 1; i < 500000; i++) {
        localA = localA + i;        // local accumulation, no locking
    }
    pthread_mutex_lock(&a_mutex);
    a = a + localA;                 // one lock/unlock for the whole total
    pthread_mutex_unlock(&a_mutex);
    return NULL;                    // a void* thread function must return a value
}

Key Behavior

• Computation: each thread performs the entire summation locally, inside its own variable localA.
• Synchronization: the mutex is locked only once per thread, when adding localA into the global shared variable a.
• Critical section: very small (just one addition per thread).
• Performance: extremely efficient, with minimal mutex contention; threads run almost fully in parallel with nearly no waiting.
• Correctness: safe, because a is only written under the mutex.
• Analogy: like two workers doing their calculations privately, each updating the company ledger once at the end.

2️⃣ Second Code (Old Version with Lock Inside the Loop)

void* threadFunction1(void* args)
{
    for (int i = 1; i < 500000; i++) {
        pthread_mutex_lock(&a_mutex);
        a = a + i;
        pthread_mutex_unlock(&a_mutex);
    }
    return NULL;
}

Key Behavior

• Computation: each iteration adds directly to the global variable a.
• Synchronization: the mutex is locked and unlocked on every single iteration (nearly half a million times per thread).
• Critical section: entered very frequently, for tiny increments, causing high contention.
• Performance: much slower, due to constant locking overhead and context switching; threads spend much of their time waiting on the mutex instead of doing real work.
• Correctness: also safe, but inefficient.
• Analogy: like two workers fighting over the same pen after writing every number — safe, but absurdly inefficient.

3️⃣ Practical Comparison

┌────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│ Aspect             │ First Version (Local Accumulation)   │ Second Version (Lock per Iteration)  │
├────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Lock frequency     │ 1 lock per thread                    │ 500,000 locks per thread             │
│ Thread contention  │ Minimal                              │ Extremely high                       │
│ Efficiency         │ High                                 │ Low                                  │
│ Correctness        │ Correct                              │ Correct                              │
│ CPU utilization    │ More parallel                        │ More time waiting                    │
│ Use case           │ Preferred real-world pattern         │ Only for demonstration of locking    │
└────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘

4️⃣ Conceptual Summary

Both versions are thread-safe, but:

• The first version minimizes locking, maximizing concurrency. It uses coarse-grained locking (one lock for the whole result update).
• The second version uses fine-grained locking (a lock on every iteration). It is educational but computationally inefficient.

In multithreading, the goal is always to reduce the time spent inside the mutex, allowing true parallelism — your first version does exactly that.

5️⃣ Relation to Java

If we map this to Java:

┌────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────────────┐
│ Concept                │ C (pthread)                            │ Java Equivalent                                │
├────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────┤
│ Local accumulation     │ Local variable in the thread function  │ Local variable in Runnable.run() or a lambda   │
│ pthread_mutex_lock()   │ pthread_mutex_lock(&a_mutex)           │ synchronized (lock) { ... } or lock.lock()     │
│ pthread_mutex_unlock() │ pthread_mutex_unlock(&a_mutex)         │ End of synchronized block, or lock.unlock()    │
│ Optimized approach     │ Do local work, then one locked update  │ Same principle: minimize time in synchronized  │
└────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────────────┘

Example Java parallel sum analogy:

AtomicLong a = new AtomicLong(0);

Runnable task = () -> {
    long localA = 0;
    for (int i = 1; i < 500000; i++) localA += i;
    a.addAndGet(localA); // equivalent to one mutex lock/unlock
};

That’s semantically identical to your optimized first C version.

So:
✅ Both correct.
🚀 First version — faster, scalable, and real-world efficient.
🐢 Second version — slower, educational, and good for demonstrating mutex contention.
