@@ -24,16 +24,9 @@ We implement nested transactions by flattening.
We only support strong isolation if you use the API correctly. In other words,
we do not support strong isolation.

- Our implementation uses a very simple two-phase locking with versioned locks
- algorithm and lazy writes, as per [1].
-
- See:
-
- 1. T. Harris, J. Larus, and R. Rajwar. Transactional Memory. Morgan & Claypool, second edition, 2010.
-
- Note that this implementation allows transactions to continue in a zombie state
- with inconsistent reads, so it's possible for the marked exception to be raised
- in the example below.
+ Our implementation uses a very simple algorithm that locks each `TVar` when it
+ is first read or written. If it cannot lock a `TVar` it aborts and retries.
+ There is no contention manager, so competing transactions may retry eternally.
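To make that concrete, here's a minimal sketch of the scheme. This is not our actual implementation: `SimpleTVar`, `SimpleTransaction`, `AbortTransaction` and `run_atomically` are invented names, and the real `TVar` does more bookkeeping.

```ruby
# Toy sketch of lock-on-first-access STM. Invented names; not the
# real concurrent-ruby internals.
AbortTransaction = Class.new(StandardError)

class SimpleTVar
  attr_reader :lock
  attr_accessor :value

  def initialize(value)
    @value = value
    @lock  = Mutex.new # one lock per TVar
  end
end

class SimpleTransaction
  def initialize
    @held   = [] # TVars this transaction has locked
    @writes = {} # buffered writes, applied only at commit
  end

  # Lock a TVar the first time we read or write it. Mutex#try_lock
  # never blocks, so a TVar held by another transaction makes us
  # abort immediately.
  def acquire(tvar)
    return if @held.include?(tvar)
    raise AbortTransaction unless tvar.lock.try_lock
    @held << tvar
  end

  def read(tvar)
    acquire(tvar)
    @writes.fetch(tvar) { tvar.value }
  end

  def write(tvar, value)
    acquire(tvar)
    @writes[tvar] = value
  end

  def commit
    @writes.each { |tvar, value| tvar.value = value }
  end

  def release
    @held.each { |tvar| tvar.lock.unlock }
  end
end

# Abort-and-retry loop. With no contention manager, two transactions
# competing for the same TVars can keep aborting each other forever.
def run_atomically
  loop do
    tx = SimpleTransaction.new
    begin
      result = yield tx
      tx.commit
      return result
    rescue AbortTransaction
      # fall through; ensure releases our locks, then we retry
    ensure
      tx.release
    end
  end
end
```

A caller goes through the transaction object, for example `run_atomically { |tx| tx.write(counter, tx.read(counter) + 1) }`.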
```ruby
require 'concurrent-ruby'
@@ -216,97 +209,3 @@ big global lock on them, and that if any exception is raised in the block, it
will be as if the block never happened. But also keep in mind the important
points we detailed right at the start of the article about side effects and
repeated execution.
-
- ## Evaluation
-
- We evaluated the performance of our `TVar` implementation using a bank account
- simulation with a range of synchronisation implementations. The simulation
- maintains a set of bank account totals, and runs transactions that either get a
- summary statement of multiple accounts (a read-only operation) or transfer a
- sum from one account to another (a read-write operation).
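Concretely, the two kinds of transaction look roughly like this with `TVar` and `atomically` (the account array, starting balances and amounts here are invented for illustration):

```ruby
require 'concurrent-ruby'

# Ten illustrative accounts, each balance held in its own TVar.
accounts = Array.new(10) { Concurrent::TVar.new(100) }

# Read-only transaction: a summary statement over several accounts.
def summary(accounts)
  Concurrent::atomically do
    accounts.sum(&:value)
  end
end

# Read-write transaction: move a sum from one account to another.
def transfer(from, to, amount)
  Concurrent::atomically do
    from.value -= amount
    to.value   += amount
  end
end

transfer(accounts[0], accounts[1], 50)
puts summary(accounts) # => 1000; a transfer never changes the total
```

The summary reads many `TVar`s but writes none; the transfer reads and writes two.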
-
- We implemented a bank that does not use any synchronisation (and so creates
- inconsistent totals in accounts), one that uses a single global (or 'coarse')
- lock (and so won't scale at all), one that uses one lock per account (and so has
- a complicated system for locking in the correct order) and one using our `TVar`
- and `atomically`.
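That 'complicated system' is essentially lock ordering. Here is a sketch of the discipline, with invented names, assuming each account carries an index so any pair of locks is always taken in the same order:

```ruby
# One Mutex per account; locks are always acquired in ascending
# index order, so two opposed transfers cannot deadlock on a pair.
Account = Struct.new(:index, :total, :lock)

accounts = Array.new(10) { |i| Account.new(i, 100, Mutex.new) }

def transfer(from, to, amount)
  first, second = [from, to].sort_by(&:index)
  first.lock.synchronize do
    second.lock.synchronize do
      from.total -= amount
      to.total   += amount
    end
  end
end
```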
-
- We ran 1 million transactions divided equally between a varying number of
- threads on a system that has at least that many physical cores. The transactions
- are made up of a varying mixture of read-only and read-write transactions. We
- ran each set of transactions thirty times, discarding the first ten runs as
- warm-up and then taking the arithmetic mean of the rest. These graphs show only
- that mean. Our `tvars-experiments` branch includes the benchmark used, full
- details of the test system, and all the raw data.
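Per configuration, that reduction is just the following (a sketch, where `timings` stands in for the thirty measured run times):

```ruby
# Drop the ten warm-up runs, then average the remaining twenty.
measured = timings.drop(10)
mean     = measured.sum.to_f / measured.size
```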
-
- Using JRuby with 75% read-write transactions, we can compare how the different
- implementations of bank accounts scale to more cores. That is, how much faster
- each one runs if you use more cores.
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/implementation-scalability.png)
-
- We see that the coarse lock implementation does not scale at all, and in fact
- with more cores only wastes more time in contention for the single global lock.
- The unsynchronised implementation doesn't seem to scale well either, which is
- strange as there should be no overhead; we'll explain that in a moment. The
- fine lock implementation seems to scale better, and the `TVar` implementation
- scales best of all.
-
- So the `TVar` implementation *scales* very well, but how fast is it in absolute terms?
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/implementation-absolute.png)
-
- Well, that's the downside. The unsynchronised implementation doesn't scale well
- because it's so fast in the first place, and probably because we're bound on
- access to memory: the threads don't have much work to do, so no matter how many
- threads we have the system is almost always reaching out to the L3 cache or
- main memory. However, remember that the unsynchronised implementation isn't
- correct; the totals are wrong at the end. The coarse lock implementation has an
- overhead of locking and unlocking. The fine lock implementation has a greater
- overhead, as the locking scheme is complicated to avoid deadlock. It scales
- better, however, actually allowing transactions to be processed in parallel. The
- `TVar` implementation has a greater overhead still, and it's pretty huge. That
- overhead is the cost of the simple programming model of an atomic block.
-
- So that's what `TVar` gives you at the moment: great scalability, but at a high
- overhead. That's pretty much the state of software transactional memory in
- general. Perhaps hardware transactional memory will help us, or perhaps we're
- happy anyway with the simpler and safer programming model that the `TVar` gives
- us.
-
- We can also use this experiment to compare different implementations of Ruby. We
- looked at just the `TVar` implementation and compared MRI 2.1.1, Rubinius 2.2.6,
- and JRuby 1.7.11, again at 75% read-write transactions.
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/ruby-scalability.png)
-
- We see that MRI provides no scalability, due to the global interpreter lock
- (GIL). JRuby seems to scale better than Rubinius for this workload (there are of
- course other workloads).
-
- As before, we should also look at the absolute performance, not just the
- scalability.
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/ruby-absolute.png)
-
- Again, JRuby seems to be faster than Rubinius for this experiment.
- Interestingly, Rubinius looks slower than MRI for 1 core, but we can get around
- that by using more cores.
-
- We've used 75% read-write transactions throughout. We'll just take a quick look
- at how the scalability varies for different workloads, for scaling between 1 and
- 2 threads. We'll admit that we used 75% read-write just because it emphasised
- the differences.
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/implementation-write-proportion-scalability.png)
-
- Finally, we can also run on a larger machine. We repeated the experiment using a
- machine with 64 physical cores and JRuby.
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/implementation-scalability.png)
-
- ![](https://raw.githubusercontent.com/ruby-concurrency/concurrent-ruby/master/doc/images/tvar/implementation-absolute.png)
-
- Here you can see that `TVar` does become absolutely faster than using a global
- lock, at the slightly ridiculous thread count of 50. It's probably not
- statistically significant anyway.