
Commit e3cfcf1

Add doc on how VALUEs are converted to and from native handles.
1 parent 7ef9fbb commit e3cfcf1


2 files changed (+112, -11 lines changed)


doc/contributor/cext-values.md

Lines changed: 109 additions & 0 deletions
# `VALUE`s in C extensions

## Semantics on MRI

Before we discuss the mechanisms used to represent MRI's `VALUE` semantics, we should outline what those semantics are. A `VALUE` in a local variable (i.e. on the stack) keeps the associated object alive for as long as that stack entry lasts, which is either until the function exits or until that variable is no longer live. We can also wrap C structures in Ruby objects, and when we do this we can specify a marking function. MRI's garbage collector uses this marking function to find all the objects reachable from the structure, and marks them just as it would mark objects held in ordinary instance variables. There are also a couple of utility functions and macros for keeping a value alive for the duration of a function call even if it is no longer held in a variable, and for globally preserving a value held in a static variable (for example `RB_GC_GUARD` and `rb_gc_register_address`, respectively).

Because `VALUE`s are essentially tagged pointers on MRI, there are also some semantics that may be obvious but are worth stating anyway:

* Any two `VALUE`s associated with the same object will be identical. In other words, as long as an object is alive its `VALUE` will remain constant.
* A `VALUE` for a live object can reuse the same tagged pointer that was previously used for a now-dead object.

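To make these two points concrete, here is a toy Java model in which a `VALUE` is simply the address of an 8-byte-aligned object slot. It is not MRI's real heap, which is written in C. `ToyHeap`, `allocate` and `free` are hypothetical names. Objects never move in this model, so a live object's `VALUE` stays constant. Once a slot is freed, the next allocation can hand out the same address.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: a VALUE is the address of an 8-byte-aligned heap slot.
// ToyHeap and its methods are hypothetical names, not part of MRI's C API.
final class ToyHeap {
    private static final long SLOT_SIZE = 8;       // keeps the low three bits free for tags
    private static final long HEAP_START = 0x1000;

    private final Deque<Long> freeSlots = new ArrayDeque<>();
    private long nextUnused = HEAP_START;

    /** Allocates an object and returns its VALUE, i.e. its slot address. */
    long allocate() {
        Long recycled = freeSlots.pollFirst();
        if (recycled != null) {
            return recycled;                       // a dead object's address is reused
        }
        long value = nextUnused;
        nextUnused += SLOT_SIZE;
        return value;
    }

    /** Models the GC sweeping a dead object: its slot becomes free again. */
    void free(long value) {
        freeSlots.addFirst(value);
    }
}
```

For example, if we allocate `a`, free it, and then allocate again, the second allocation returns an address equal to `a`.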
## Emulating the semantics in TruffleRuby

Emulating these semantics on TruffleRuby is non-trivial. We are running under a garbage collector, but it doesn't know that a `VALUE` maps to an object. It also has no mechanism for specifying a custom mark function for particular objects. As long as `VALUE`s remain as `ValueWrapper` objects, we don't need to do much. Ruby objects maintain a strong reference to their associated `ValueWrapper`, and vice versa, so we only really need to consider situations where `VALUE`s are converted into native handles.

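Below is a minimal sketch of these mutual references, using hypothetical `RubyObjectModel` and `ValueWrapperModel` classes. For illustration only, we assume that the wrapper is created lazily and cached on the object. As a result, repeated conversions of the same object yield the same wrapper. Because the two objects reference each other, the JVM's tracing collector reclaims them together once neither is reachable from elsewhere.

```java
// Hypothetical, simplified stand-ins for a Ruby object and its ValueWrapper.
final class RubyObjectModel {
    private ValueWrapperModel wrapper;     // strong reference to the wrapper

    /** Returns this object's wrapper, creating it on first use (an assumption). */
    ValueWrapperModel toValue() {
        if (wrapper == null) {
            wrapper = new ValueWrapperModel(this);
        }
        return wrapper;
    }
}

final class ValueWrapperModel {
    final RubyObjectModel object;          // strong reference back to the object

    ValueWrapperModel(RubyObjectModel object) {
        this.object = object;
    }
}
```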
### Keeping objects alive on the stack

We implement an `ExtensionCallStack` object to keep track of various bits of useful information during a call to a C extension. Each stack entry contains a `preservedObject` and, where needed, an additional list of preserved objects. Together these hold all the `ValueWrapper`s converted to native handles during the call. When a new call is made, a new `ExtensionCallStackEntry` is pushed onto the stack. When the call exits, that entry is popped off again, and from then on the stack no longer keeps those wrappers alive.

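The following is a simplified sketch of such a stack, which treats wrappers as plain `Object`s. `ExtensionCallStackModel` and its methods are hypothetical. Only the idea of a per-call entry holding one inline `preservedObject` plus an optional list comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the extension call stack; names are
// illustrative, not TruffleRuby's real implementation. Each entry strongly
// references every wrapper converted to a native handle during its call.
final class ExtensionCallStackModel {
    static final class Entry {
        final Entry previous;
        Object preservedObject;            // first preserved wrapper, stored inline
        List<Object> preservedObjects;     // allocated only when a second one arrives

        Entry(Entry previous) {
            this.previous = previous;
        }

        void preserve(Object wrapper) {
            if (preservedObject == null) {
                preservedObject = wrapper;
                return;
            }
            if (preservedObjects == null) {
                preservedObjects = new ArrayList<>();
            }
            preservedObjects.add(wrapper);
        }

        int preservedCount() {
            int inline = preservedObject == null ? 0 : 1;
            return inline + (preservedObjects == null ? 0 : preservedObjects.size());
        }
    }

    // Bottom entry so the stack is never empty (an assumption of this sketch).
    private Entry current = new Entry(null);

    /** Called when Ruby code calls into a C extension function. */
    void push() {
        current = new Entry(current);
    }

    /** Called when the C function returns; its preserved wrappers are dropped. */
    void pop() {
        if (current.previous == null) {
            throw new IllegalStateException("no C call to pop");
        }
        current = current.previous;
    }

    /** Called whenever a VALUE is converted to a native handle. */
    void preserve(Object wrapper) {
        current.preserve(wrapper);
    }

    Entry current() {
        return current;
    }
}
```

The first wrapper is stored in a field, and a list is only allocated for the second. Presumably this keeps calls that convert few values cheap, though that motivation is our assumption.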
### Keeping objects alive in structures

We don't have a way to run markers during garbage collection. However, we know we're keeping objects alive during the lifetime of a C call, and we can record when a structure is accessed, which should be a prerequisite for mutating the structure's internal state. We therefore keep a list of objects to be marked, in a similar manner to the objects that should be kept alive, and when we exit the C call we run those objects' markers.

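Here is a hypothetical sketch of recording structure accesses and running the markers when the C call exits. `MarkOnExitModel`, `DataObject` and `recordAccess` are illustrative names. The point at which an access is detected is assumed rather than taken from TruffleRuby. In this sketch each structure's marker runs once per call, however many times the structure was accessed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: structures accessed during a C call are recorded on the
// current call's entry, and their mark functions run when that call exits.
final class MarkOnExitModel {
    /** A Ruby object wrapping a C struct, with the extension's mark function. */
    static final class DataObject {
        final Runnable markFunction;

        DataObject(Runnable markFunction) {
            this.markFunction = markFunction;
        }
    }

    // One set per active C call; DataObject keeps Object.equals, so identity is used.
    private final List<Set<DataObject>> calls = new ArrayList<>();

    void enterCall() {
        calls.add(new LinkedHashSet<>());
    }

    /** Assumed hook: C code has just obtained the pointer to the wrapped struct. */
    void recordAccess(DataObject object) {
        calls.get(calls.size() - 1).add(object);
    }

    /** Pops the call and runs the marker of each structure it accessed, once each. */
    void exitCall() {
        Set<DataObject> accessed = calls.remove(calls.size() - 1);
        for (DataObject object : accessed) {
            object.markFunction.run();
        }
    }
}
```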
### Running mark functions

We run markers by recording the object being marked on the extension stack and then calling the marker. The marker in turn calls `rb_gc_mark` for the individual `VALUE`s held by the structure. We record those marked objects in a temporary array, also held on the extension stack. When the mark function has finished, we attach that array to the object wrapping the struct. From then on the wrapper holds ordinary strong references to the marked objects, so the JVM's garbage collector keeps them alive for as long as the wrapper is reachable.

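Below is a hypothetical sketch of that sequence. `MarkerRunner.gcMark` stands in for `rb_gc_mark`, and a `Consumer<MarkerRunner>` stands in for the C mark function. Neither is TruffleRuby's actual code. The previous marking state is saved and restored, so the sketch tolerates one marker triggering another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of running a mark function: while the marker runs, each
// gcMark call (standing in for rb_gc_mark) appends to a temporary list; the
// finished list is stored on the wrapping object as ordinary strong references.
final class MarkerRunner {
    static final class DataObject {
        final Consumer<MarkerRunner> markFunction;  // stands in for the C mark function
        Object[] markedObjects = new Object[0];     // kept alive while this object is

        DataObject(Consumer<MarkerRunner> markFunction) {
            this.markFunction = markFunction;
        }
    }

    private DataObject beingMarked;                 // object whose marker is running
    private List<Object> marks;                     // temporary array for its marks

    /** Stand-in for rb_gc_mark: record the value instead of setting a mark bit. */
    void gcMark(Object value) {
        if (beingMarked == null) {
            throw new IllegalStateException("gcMark called outside a mark function");
        }
        marks.add(value);
    }

    void runMarker(DataObject object) {
        DataObject previousObject = beingMarked;
        List<Object> previousMarks = marks;
        beingMarked = object;
        marks = new ArrayList<>();
        try {
            object.markFunction.accept(this);
            object.markedObjects = marks.toArray(); // replaces the previous marks
        } finally {
            beingMarked = previousObject;
            marks = previousMarks;
        }
    }
}
```

The sketch replaces `markedObjects` wholesale rather than appending to it. That way, values the struct no longer references stop being kept alive after the next marking. This is a design choice of the sketch.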
## Managing the conversion of `VALUE`s to and from native handles

When converted to native, a `ValueWrapper` takes one of the following long values.

| Represented value | Handle bits (lowest 32 of 64 shown) | Comments |
|-------------------|-------------------------------------|----------|
| `false`           | 00000000 00000000 00000000 00000000 | |
| `true`            | 00000000 00000000 00000000 00000010 | |
| `nil`             | 00000000 00000000 00000000 00000100 | |
| undefined         | 00000000 00000000 00000000 00000110 | |
| Integer           | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest bit set; small longs only; convert back to a long with `>> 1` |
| Object            | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | Lowest three bits clear and not equal to 0; the value is an index into the handle map |

The built-in objects `true`, `false`, `nil`, and `undefined` are handled specially. Integers are also relatively easy, because there is a well-defined mapping from the native representation to the integer and vice versa. Objects, however, need a little more work.

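The encodings in the table can be expressed as a few bit operations. The constants come straight from the table. The `HandleEncoding` class and its method names are hypothetical. We also assume that "small longs" means values that survive a one-bit left shift, i.e. those in the 63-bit signed range.

```java
// Bit-level encoding of the handle table; constants are from the table,
// class and method names are hypothetical.
final class HandleEncoding {
    static final long FALSE_HANDLE = 0b000;
    static final long TRUE_HANDLE = 0b010;
    static final long NIL_HANDLE = 0b100;
    static final long UNDEF_HANDLE = 0b110;
    static final long TAG_MASK = 0b111;            // the three low tag bits

    /** Integers lose one bit to the tag, so only 63-bit signed values fit. */
    static boolean fitsInHandle(long value) {
        return value >= (Long.MIN_VALUE >> 1) && value <= (Long.MAX_VALUE >> 1);
    }

    static long encodeInteger(long value) {
        if (!fitsInHandle(value)) {
            throw new IllegalArgumentException("not a small long: " + value);
        }
        return (value << 1) | 1;                    // shift up, set the lowest bit
    }

    static boolean isInteger(long handle) {
        return (handle & 1) == 1;
    }

    static long decodeInteger(long handle) {
        return handle >> 1;                         // arithmetic shift keeps the sign
    }

    /** Object handles have all three tag bits clear and are not 0 (false). */
    static boolean isObject(long handle) {
        return (handle & TAG_MASK) == 0 && handle != 0;
    }
}
```

For example, `encodeInteger(21)` yields `43` (binary `101011`), and `decodeInteger(43)` gives back `21`. Negative integers round-trip because the sketch uses the arithmetic shift `>>` rather than `>>>`.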
When we convert an object `VALUE` to its native representation, we need to keep the corresponding `ValueWrapper` object alive, and we need to record the mapping from handle to `ValueWrapper` somewhere. The mapping from `ValueWrapper` to handle must also be stable. A symbol or other immutable object that can outlive a context will therefore need to store that mapping somewhere on the `RubyLanguage` object.

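Here is a hypothetical sketch of stable handle assignment, using plain `HashMap`s rather than handle blocks. `HandleRegistry`, `Shared` and their members are illustrative names. `Shared` plays the role of state on `RubyLanguage`, and each `HandleRegistry` plays the role of one context. Handle numbers come from the shared counter so that handles from different contexts never collide, and a wrapper keeps its handle once assigned. For simplicity the maps hold strong references; the block-based design described next avoids this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a wrapper gets a handle on first conversion and keeps it.
// Immutable values are recorded in state that outlives any single context.
final class HandleRegistry {
    static final class Wrapper {
        final boolean immutable;
        long handle;                         // 0 means "not yet converted"

        Wrapper(boolean immutable) {
            this.immutable = immutable;
        }
    }

    /** Outlives contexts; stands in for state kept on RubyLanguage. */
    static final class Shared {
        long nextHandle = 8;                 // multiples of 8 keep the tag bits clear
        final Map<Long, Wrapper> immutableHandles = new HashMap<>();
    }

    private final Shared shared;
    private final Map<Long, Wrapper> contextHandles = new HashMap<>();

    HandleRegistry(Shared shared) {
        this.shared = shared;
    }

    long toNative(Wrapper wrapper) {
        if (wrapper.handle == 0) {           // first conversion: assign and record
            long handle = shared.nextHandle;
            shared.nextHandle += 8;
            wrapper.handle = handle;
            (wrapper.immutable ? shared.immutableHandles : contextHandles).put(handle, wrapper);
        }
        return wrapper.handle;               // later conversions reuse the same handle
    }

    Wrapper fromNative(long handle) {
        Wrapper wrapper = contextHandles.get(handle);
        return wrapper != null ? wrapper : shared.immutableHandles.get(handle);
    }
}
```

In this sketch, a symbol converted in one context can be looked up from another context that shares the same `Shared` state. A mutable object's handle is only known to the context that converted it.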
We achieve all this through a combination of handle block maps and allocators. We deal with handles in blocks of 4096. The current `RubyFiber` holds onto a `HandleBlockHolder`, which in turn holds the current block for mutable objects (which cannot outlive the `RubyContext`) and the current block for immutable objects (which can outlive the context). Each fiber takes handles from those blocks until they are exhausted. When a block is exhausted, a new one comes from the `HandleBlockAllocator` held by `RubyLanguage`, which is responsible for allocating new blocks and recycling old ones. These blocks of handles, however, only hold weak references, because we don't want a conversion to native to keep the `ValueWrapper` alive longer than it should.

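Conversely, the `HandleBlock` _must_ live for as long as there are any reachable `ValueWrapper`s in that block. Otherwise it could be recycled and its handles reassigned while a live wrapper still uses one. To prevent this, a `ValueWrapper` keeps a strong reference to the `HandleBlock` it is in.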
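The following sketch models the block scheme described in the last two paragraphs. All classes are simplified, hypothetical stand-ins named after the text, not TruffleRuby's implementation. Blocks hold 4096 handles as described. We assume that handles within a block are 8 apart so that their tag bits stay clear. In this sketch, recycling happens through an explicit `recycle` call. A real implementation would instead have to rely on the garbage collector noticing that a block has become unreachable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of handle blocks; names mirror the text,
// but this is not TruffleRuby's implementation.
final class HandleBlocks {
    static final int BLOCK_SIZE = 4096;

    static final class ValueWrapper {
        long handle;                     // valid once block != null
        HandleBlock block;               // strong: keeps the block alive
    }

    static final class HandleBlock {
        final long base;                 // handle of slot 0; slots are 8 apart
        private final WeakReference<?>[] slots = new WeakReference<?>[BLOCK_SIZE];
        private int used;

        HandleBlock(long base) {
            this.base = base;
        }

        boolean isFull() {
            return used == BLOCK_SIZE;
        }

        /** Gives the wrapper the next handle; the block refers to it only weakly. */
        long assign(ValueWrapper wrapper) {
            if (isFull()) {
                throw new IllegalStateException("block exhausted");
            }
            int index = used++;
            slots[index] = new WeakReference<ValueWrapper>(wrapper);
            wrapper.handle = base + index * 8L;
            wrapper.block = this;
            return wrapper.handle;
        }

        /** Returns the wrapper for a handle, or null if it has been collected. */
        ValueWrapper lookup(long handle) {
            WeakReference<?> ref = slots[(int) ((handle - base) / 8)];
            return ref == null ? null : (ValueWrapper) ref.get();
        }
    }

    /** Shared by all fibers; stands in for the allocator held by RubyLanguage. */
    static final class HandleBlockAllocator {
        private final Deque<Long> recycled = new ArrayDeque<>();
        private long nextBase = 8;       // 0 to 6 are the special constants

        HandleBlock allocate() {
            Long base = recycled.poll();
            if (base == null) {
                base = nextBase;
                nextBase += BLOCK_SIZE * 8L;
            }
            return new HandleBlock(base);
        }

        /** Sketch only: the caller must know no wrapper still uses the block. */
        void recycle(HandleBlock block) {
            recycled.push(block.base);
        }
    }

    /** One per fiber: the current blocks for mutable and immutable objects. */
    static final class HandleBlockHolder {
        private final HandleBlockAllocator allocator;
        private HandleBlock mutableBlock;     // cannot outlive the context
        private HandleBlock immutableBlock;   // may outlive the context

        HandleBlockHolder(HandleBlockAllocator allocator) {
            this.allocator = allocator;
        }

        long toNative(ValueWrapper wrapper, boolean immutable) {
            if (wrapper.block != null) {
                return wrapper.handle;        // already converted: the handle is stable
            }
            HandleBlock block = immutable ? immutableBlock : mutableBlock;
            if (block == null || block.isFull()) {
                block = allocator.allocate(); // exhausted: fetch a fresh block
                if (immutable) {
                    immutableBlock = block;
                } else {
                    mutableBlock = block;
                }
            }
            return block.assign(wrapper);
        }
    }
}
```

Every converted `ValueWrapper` holds its `HandleBlock` strongly, while the block holds its wrappers only weakly. As a result, a block can become unreachable only after all of its wrappers have. At that point handing its range out again is safe, much like MRI reusing a dead object's `VALUE`.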

doc/contributor/cexts.md

Lines changed: 3 additions & 11 deletions
@@ -125,18 +125,10 @@ not a `VALUE`.

The paragraph "See [polyglot.h](https://github.com/oracle/graal/blob/master/sulong/projects/com.oracle.truffle.llvm.libraries.graalvm.llvm/include/graalvm/llvm/polyglot.h) for documentation regarding the `polyglot_*` methods." is unchanged. After it, this hunk makes three changes:

* It adds a `##### Native conversion` heading (new line 128).
* It removes the `##### ValueWrapper Long Representation` heading, the sentence "When converted to native, the `ValueWrapper` takes the following long values." and the handle-bits table (old lines 129-139). The same sentence and table now appear in `doc/contributor/cext-values.md`.
* Under the new heading, it adds the text "See [cext-values.md](cext-values.md) for documentation of the conversion and management of native handles." (new lines 130-131).

The `### String pointers` section that follows is unchanged.