Commit 57b6a75

Lock-free hash set for fstrings [Feature #21268]
This implements a hash set which is wait-free for lookup and lock-free for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of fstrings (frozen interned strings) can significantly reduce the parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series of RWLocks (partitioning the hash N-ways to reduce lock contention), and putting a cache in front of it. All of these improved the situation, but were unsatisfying as all still required locks for writes (and granular locks are awkward, since we run the risk of needing to reach a vm barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock-free hash table for Java: https://www.youtube.com/watch?v=HJ-719EGIts

It turns out this lock-free hash set is made easier to implement by a few properties:

* We only need a hash set rather than a hash table (we only need keys, not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (it could be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (it could be made concurrent). We VM lock (but don't require other threads to stop) for table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy, using a T_DATA object

Another benefit of having a table specific to fstrings is that we compare by value on lookup/insert, but by identity on delete, as we only want to remove the exact string which is being freed. This is faster and provides a second way to avoid the race condition in https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic probing, similar to our existing st_table or id_table. Deletes (which happen on GC) replace existing keys with a tombstone, which is the only type of update which can occur. Tombstones are only cleared out on resize.

Unlike st_table, the VALUEs are stored in the hash table itself (st_table's bins) rather than as a compact index. This avoids an extra pointer dereference and is possible because we don't need to preserve insertion order.

The table targets a load factor of at most 1/2 (it is enlarged once it is half full).
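The scheme described above can be sketched in a few dozen lines of C11. This is a minimal illustration under stated assumptions, not the code from this commit: it uses a fixed-capacity table with no resize, plain C strings in place of Ruby VALUEs, FNV-1a as a stand-in hash, and the address 1 as the tombstone marker. The names `intern`, `remove_exact`, `slots`, `CAPA`, `EMPTY`, and `TOMBSTONE` are all hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAPA 64                       /* power of two; this sketch never resizes */
#define EMPTY ((uintptr_t)0)          /* slot never used */
#define TOMBSTONE ((uintptr_t)1)      /* slot held a key that was deleted (assumed marker) */

/* Each slot holds the whole entry (one pointer), so one CAS publishes a key. */
static _Atomic uintptr_t slots[CAPA];

/* FNV-1a, a stand-in for the real string hash. */
static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Lookup-or-insert, comparing by value. Returns the canonical pointer for
 * the contents of str, or NULL if the table is full. */
static const char *intern(const char *str) {
    uint32_t idx = hash_str(str) & (CAPA - 1);
    for (uint32_t d = 1; d <= CAPA; d++) {
        uintptr_t cur = atomic_load_explicit(&slots[idx], memory_order_acquire);
        if (cur == EMPTY) {
            uintptr_t expected = EMPTY;
            if (atomic_compare_exchange_strong_explicit(
                    &slots[idx], &expected, (uintptr_t)str,
                    memory_order_acq_rel, memory_order_acquire)) {
                return str;           /* we claimed the empty slot */
            }
            cur = expected;           /* lost the race: inspect the winner's key */
        }
        if (cur != TOMBSTONE && strcmp((const char *)cur, str) == 0) {
            return (const char *)cur; /* equal contents already interned */
        }
        idx = (idx + d) & (CAPA - 1); /* quadratic (triangular-number) probing */
    }
    return NULL;
}

/* Delete by identity: only the exact pointer is removed, and its slot becomes
 * a tombstone. Assumed to run with no concurrent writers (as during GC). */
static void remove_exact(const char *str) {
    uint32_t idx = hash_str(str) & (CAPA - 1);
    for (uint32_t d = 1; d <= CAPA; d++) {
        uintptr_t cur = atomic_load_explicit(&slots[idx], memory_order_relaxed);
        if (cur == EMPTY) return;     /* end of probe chain: not present */
        if (cur == (uintptr_t)str) {
            atomic_store_explicit(&slots[idx], TOMBSTONE, memory_order_relaxed);
            return;
        }
        idx = (idx + d) & (CAPA - 1);
    }
}
```

Because an entry is a single word, a reader either sees an empty slot or a fully published key, which is what lets lookups proceed without waiting. Inserts skip tombstones rather than reusing them, matching the statement that tombstones are only cleared on resize. A real table would also track the entry count and rebuild once it is half full.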
1 parent b28363a commit 57b6a75

8 files changed: +437 -123 lines changed

bootstraptest/test_ractor.rb

Lines changed: 15 additions & 0 deletions
@@ -1580,6 +1580,21 @@ class C
   }.map{|r| r.take}.join
 }
 
+assert_equal "ok", %Q{
+  N = #{N}
+  a, b = 2.times.map{
+    Ractor.new{
+      N.times.map{|i| -(i.to_s)}
+    }
+  }.map{|r| r.take}
+  N.times do |i|
+    unless a[i].equal?(b[i])
+      raise [a[i], b[i]].inspect
+    end
+  end
+  :ok
+}
+
 # Generic ivtbl
 n = N/2
 assert_equal "#{n}#{n}", %Q{

eval.c

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ ruby_setup(void)
     Init_BareVM();
     rb_vm_encoded_insn_data_table_init();
     Init_vm_objects();
+    Init_fstring_table();
 
     EC_PUSH_TAG(GET_EC());
     if ((state = EC_EXEC_TAG()) == TAG_NONE) {

gc.c

Lines changed: 7 additions & 8 deletions
@@ -341,6 +341,7 @@ rb_gc_shutdown_call_finalizer_p(VALUE obj)
         if (rb_obj_is_mutex(obj)) return false;
         if (rb_obj_is_fiber(obj)) return false;
         if (rb_obj_is_main_ractor(obj)) return false;
+        if (rb_obj_is_fstring_table(obj)) return false;
 
         return true;
 
@@ -3528,6 +3529,7 @@ vm_weak_table_frozen_strings_foreach(st_data_t key, st_data_t value, st_data_t d
     return retval;
 }
 
+void rb_fstring_foreach_with_replace(st_foreach_check_callback_func *func, st_update_callback_func *replace, st_data_t arg);
 void
 rb_gc_vm_weak_table_foreach(vm_table_foreach_callback_func callback,
                             vm_table_update_callback_func update_callback,
@@ -3590,14 +3592,11 @@ rb_gc_vm_weak_table_foreach(vm_table_foreach_callback_func callback,
         break;
       }
       case RB_GC_VM_FROZEN_STRINGS_TABLE: {
-        if (vm->frozen_strings) {
-            st_foreach_with_replace(
-                vm->frozen_strings,
-                vm_weak_table_frozen_strings_foreach,
-                vm_weak_table_foreach_update_weak_key,
-                (st_data_t)&foreach_data
-            );
-        }
+        rb_fstring_foreach_with_replace(
+            vm_weak_table_frozen_strings_foreach,
+            vm_weak_table_foreach_update_weak_key,
+            (st_data_t)&foreach_data
+        );
         break;
       }
       default:

internal/string.h

Lines changed: 2 additions & 0 deletions
@@ -84,6 +84,8 @@ RUBY_SYMBOL_EXPORT_END
 
 VALUE rb_fstring_new(const char *ptr, long len);
 void rb_gc_free_fstring(VALUE obj);
+bool rb_obj_is_fstring_table(VALUE obj);
+void Init_fstring_table();
 VALUE rb_obj_as_string_result(VALUE str, VALUE obj);
 VALUE rb_str_opt_plus(VALUE x, VALUE y);
 VALUE rb_str_concat_literals(size_t num, const VALUE *strary);

internal/vm.h

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ void rb_vm_check_redefinition_by_prepend(VALUE klass);
 int rb_vm_check_optimizable_mid(VALUE mid);
 VALUE rb_yield_refine_block(VALUE refinement, VALUE refinements);
 VALUE ruby_vm_special_exception_copy(VALUE);
-PUREFUNC(st_table *rb_vm_fstring_table(void));
 
 void rb_lastline_set_up(VALUE val, unsigned int up);
 