# Pointer arithmetic and value holders

One of the reasons for the power of C is that, despite having a type system, it lets you bypass it almost freely: every value is ultimately reduced to a position in memory.

An integer value, for example `x = 42`, is nothing more than a sequence of bytes stored somewhere in memory. The exact number of bytes depends on the platform and the C type; for a 4-byte `int` on a little-endian machine it looks like this:
`x = | 2A | 00 | 00 | 00 |`
(the byte order depends on the platform's endianness, but that is another story).
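
To see those bytes for yourself, the sketch below copies a 32-bit integer into a byte array with `memcpy` and exposes its raw representation. `int32_bytes` and `is_little_endian` are hypothetical helper names chosen for illustration; on a little-endian machine `int32_bytes(42, b)` leaves `2A 00 00 00` in `b`, and on a big-endian one the order is reversed.

```language=c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of v into out[0..3],
   in the order the bytes are laid out in memory. */
void int32_bytes(int32_t v, unsigned char out[4]) {
    memcpy(out, &v, sizeof v);
}

/* Returns 1 if the lowest-addressed byte holds the least significant byte. */
int is_little_endian(void) {
    unsigned char b[4];
    int32_bytes(1, b);
    return b[0] == 1;
}
```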

What matters here is that `x` exists at a **position in memory**, inside a chunk of memory assigned to the program, and the code generated by the C compiler manipulates `x` by operating on that memory location.

C exposes this fact explicitly through a fundamental operator: `&`.
The `&` operator gives you the **address of** a variable, that is, the position in memory where the variable is stored. Instead of giving you the value stored at that position, it gives you the position itself.

This is fundamental in C for constructing arrays and other data structures, **but it is also fundamental for passing data between functions**. If I can obtain the address of a variable, I can pass that address to another function, allowing it to modify the original variable even though that function cannot name the variable directly.

For example:

```language=c
#include <stdio.h>

void store_value(int *value) {
    *value = 42; /* write through the pointer into the caller's variable */
}

int main(void) {
    int x;
    store_value(&x); /* pass the address of x, not its value */
    printf("x = %d\n", x);
    /* It will print "x = 42" */
    return 0;
}
```

Here, `store_value` receives not the value of `x`, but its address. By dereferencing that address, the function modifies the memory where `x` is stored.

## What does this mean in uFFI?

Historically, this has been handled in uFFI by passing a `ByteArray` to the C function. A `ByteArray` is essentially a reification of a chunk of memory, which allows C to operate on it directly.

Using the low-level mechanisms provided by uFFI, this looks like:

```language=smalltalk
| buffer x |

buffer := ByteArray new: 4.
self store_value: buffer.
"store_value defined as: <ffiCall: #(void store_value(int *x))>"
x := buffer signedLongAt: 1.
```

In short, we pass a `ByteArray` to a C function that expects a pointer to an integer, then read the value back from that memory using an access primitive (`signedLongAt:`).
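
The following C sketch mirrors, roughly, what the Pharo snippet does on the C side: it allocates 4 raw bytes, lets `store_value` write an integer into them, and then decodes the bytes with an explicit read, as `signedLongAt:` does. The helpers `signed_long_at` and `store_into_raw_buffer` are hypothetical names, and the sketch uses `int32_t` (rather than `int`) so the buffer size matches exactly. Note that C offsets are 0-based, while `signedLongAt:` takes a 1-based index.

```language=c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The C function from the example: it writes through the pointer. */
void store_value(int32_t *value) {
    *value = 42;
}

/* Rough C counterpart of `buffer signedLongAt: 1`: decode a signed
   32-bit integer starting at a 0-based byte offset of a raw buffer. */
int32_t signed_long_at(const unsigned char *buffer, size_t offset) {
    int32_t v;
    memcpy(&v, buffer + offset, sizeof v);
    return v;
}

/* Rough C counterpart of the Pharo snippet: reserve 4 raw bytes
   (like `ByteArray new: 4`), let C fill them, then decode them. */
int32_t store_into_raw_buffer(void) {
    unsigned char *buffer = malloc(4);
    if (buffer == NULL) {
        return -1; /* simplistic error signal for this sketch */
    }
    store_value((int32_t *) buffer); /* malloc'd memory is suitably aligned */
    int32_t x = signed_long_at(buffer, 0);
    free(buffer);
    return x;
}
```

The point of the comparison is that, with a raw buffer, both the size (4) and the decoding primitive are the caller's responsibility.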

**This is straightforward, but it is low-level and error-prone.**
It requires detailed knowledge of memory layout, sizes, and access primitives.

## ... enter value holders

To hide this complexity and make code easier to write and understand, we introduce **value holders**.

Value holders are not magic. They are simply a more expressive, *Pharoish* way of representing the same low-level mechanism: "a place in memory where C will store a value".

Using value holders, the same example becomes:

```language=smalltalk
| xHolder x |
xHolder := FFIInt32 newValueHolder.
self store_value: xHolder.
x := xHolder value.
```

This is still more verbose than the equivalent C code, but the *meaning* of what is happening is much clearer. Let's break it down.

##### 1. Value holder creation

```language=smalltalk
xHolder := FFIInt32 newValueHolder.
```

Here we create a container: a place where C will store an `int32` value.

Yes, this requires knowing the C type and its corresponding uFFI type. But if you are calling a C function, we assume you know what you are doing 😜.
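
Why does the type matter? Conceptually, the holder's type determines how many bytes are reserved and how they are decoded, so it has to match what the C function writes. The plain C sketch below only simulates that effect (`fill_slot`, `read_as_int32` and `read_as_int64` are hypothetical helpers): C stores a 32-bit `42` at the start of an 8-byte slot that still contains leftover bytes, and reading the slot back with a wider type mixes those leftovers into the result.

```language=c
#include <stdint.h>
#include <string.h>

/* Simulate C storing a 32-bit int at the start of an 8-byte slot
   whose remaining bytes hold leftover data (0xFF here). */
void fill_slot(unsigned char slot[8]) {
    int32_t value = 42;
    memset(slot, 0xFF, 8);
    memcpy(slot, &value, sizeof value);
}

/* Reading with the type C actually wrote gives back 42. */
int32_t read_as_int32(const unsigned char slot[8]) {
    int32_t v;
    memcpy(&v, slot, sizeof v);
    return v;
}

/* Reading the same slot with a wider type includes the leftover bytes,
   so the result is not 42 on either little- or big-endian machines. */
int64_t read_as_int64(const unsigned char slot[8]) {
    int64_t v;
    memcpy(&v, slot, sizeof v);
    return v;
}
```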

##### 2. Call the C function

```language=smalltalk
self store_value: xHolder.
```

The C function is called exactly as before. The only difference is that we pass a value holder instead of a raw memory buffer. No changes to the function declaration are required.

##### 3. Retrieve the value

```language=smalltalk
x := xHolder value.
```

You retrieve the value by simply asking the holder for it. There is no need to remember how to read an `int32` from a pointer or a byte array using low-level primitives.

This mechanism works with all basic C types defined in uFFI, including:

`FFIBool`, `FFIExternalString`, `FFIOop`, `FFIBoolean32`, `FFIFloat128`, `FFIFloat16`, `FFIFloat32`, `FFIFloat64`, `FFISizeT`,
`FFIUInt8`, `FFIUInt16`, `FFIUInt32`, `FFIUInt64`,
`FFIInt8`, `FFIInt16`, `FFIInt32`, `FFIInt64`, `FFILong`, `FFIULong`.

## What happens with structures, unions, and external objects?

Value holders work naturally for basic C types, but what about more complex ones?

### Structures (and unions)

When you pass a structure *by value* in C, it is always copied. The function receives its own copy of the structure's contents: it can read them, but any changes it makes affect only that copy, so the caller never sees them.

For example:

```language=c
typedef struct MyStruct {
    int value1;
    int value2;
} mystructtype;

int sum(mystructtype t) {
    return t.value1 + t.value2;
}
```

```language=smalltalk
v := MyStruct new.
v
    value1: 10;
    value2: 10.
result := aFFILibrary sum: v.
```

This works correctly.

However, if you need the C function to **modify** the structure and observe those changes later, you must pass a *reference*, that is, the address of the structure.

For example:

```language=c
typedef struct MyStruct {
    int value1;
    int value2;
} mystructtype;

void fill_values(mystructtype *t) {
    t->value1 = 10;
    t->value2 = 10;
}
```

This example is intentionally simplified and not realistic, but it captures the essence: modifying a structure through a pointer.
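
On the C side, the difference between the two calling conventions can be sketched as follows. `fill_values_by_copy` is a hypothetical variant added only for contrast: after it returns, the caller's structure is unchanged, whereas `fill_values` updates it through the pointer.

```language=c
/* Same structure as in the examples above. */
typedef struct MyStruct {
    int value1;
    int value2;
} mystructtype;

/* By value: the function works on a copy, so these writes
   are lost when it returns (hypothetical, for contrast). */
void fill_values_by_copy(mystructtype t) {
    t.value1 = 10;
    t.value2 = 10;
}

/* By reference: the function writes into the caller's structure. */
void fill_values(mystructtype *t) {
    t->value1 = 10;
    t->value2 = 10;
}
```

A caller that declares `mystructtype s = {0, 0};` still sees `{0, 0}` after `fill_values_by_copy(s)`, and `{10, 10}` after `fill_values(&s)`.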

### Passing structure value holders

Using value holders, this is straightforward:

```language=smalltalk
structHolder := MyStruct newValueHolder.
aFFILibrary fill_values: structHolder.
v := structHolder value.
Transcript show: ('{1} + {2} = {3}' format: {
    v value1.
    v value2.
    v value1 + v value2 })
```

This ensures uniform access to structures and unions, just like any other type.

### Passing a reference to a structure

Sometimes you already have a structure instance and need to pass it by reference. This is common when a structure must be initialized first and then modified by a C function.

For this case, structures and unions provide the `referenceTo` message:

```language=smalltalk
v := MyStruct new.
aFFILibrary fill_values: v referenceTo.
Transcript show: ('{1} + {2} = {3}' format: {
    v value1.
    v value2.
    v value1 + v value2 })
```
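
In C terms, `referenceTo` roughly plays the role of applying `&` to an existing structure. The case that really needs it is a function whose result depends on what the structure already contains, such as the hypothetical `scale_values` below: the caller initializes the fields first and then passes the structure's address so the function can update it in place.

```language=c
typedef struct MyStruct {
    int value1;
    int value2;
} mystructtype;

/* Hypothetical function that depends on the current contents:
   it updates the fields in place instead of overwriting them. */
void scale_values(mystructtype *t, int factor) {
    t->value1 *= factor;
    t->value2 *= factor;
}
```

For instance, `mystructtype s = {1, 2}; scale_values(&s, 10);` leaves `s` holding `{10, 20}`.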

**Note:**
Before this mechanism existed, uFFI relied on mangling magic for single indirection (pointer depth = 1). Because structures are internally stored in byte arrays, passing the structure itself also worked as a reference.

This behavior is subtle and relies on internal implementation details. Now that an explicit mechanism exists, **we do not recommend relying on that behavior**.

## Multiple pointer indirection (pointer depth > 1)

In C, it is common, especially when dealing with lists, to encounter arguments with more than one level of indirection.
Of these cases, we focus on arrays, since the others follow the same pattern.

### Passing arrays

In C, arrays are just pointer arithmetic. A function with a parameter declared as `int *` or `char **` often expects a list of values.
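
The sketch below makes the pointer arithmetic explicit: `list[i]` is defined as `*(list + i)`, and adding 1 to an `int *` advances it by `sizeof(int)` bytes. The helper names are hypothetical, and the `char **` example assumes a NULL-terminated list of strings, which is a common convention but not something C enforces.

```language=c
#include <stddef.h>

/* list[i] is just *(list + i): the pointer moves i elements, not i bytes. */
int element_at(const int *list, size_t i) {
    return *(list + i);
}

/* Distance in bytes between two consecutive elements: sizeof(int). */
size_t element_stride(const int *list) {
    return (size_t) ((const char *) (list + 1) - (const char *) list);
}

/* A char ** parameter is often a list of strings; here we assume
   it is terminated by a NULL pointer. */
size_t count_strings(const char **list) {
    size_t n = 0;
    while (list[n] != NULL) {
        n++;
    }
    return n;
}
```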

uFFI provides an abstraction for this through the `FFIArray` class. `FFIArray` can be used both to define array types and to create instances that manage storing and retrieving data through C pointers.

#### Sending arrays as part of a callout

Consider this C function:

```language=c
#include <stddef.h>

int sum_list(const int *list, size_t size) {
    int sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum += list[i];
    }
    return sum;
}
```

The corresponding Pharo binding:

```language=smalltalk
sumList: list size: size
    ^ self ffiCall: #(int sum_list(const int *list, size_t size))
```

Usage:

```language=smalltalk
arrayOfIntegers := FFIArray newType: FFIInt32 size: 5.
1 to: 5 do: [ :i | arrayOfIntegers at: i put: i factorial ].
result := self sumList: arrayOfIntegers size: 5.
```
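
For comparison, here is a sketch of the same call made directly from C (`sum_of_factorials` is a hypothetical name). The caller owns a 5-element array, fills it with 1! to 5!, and passes a pointer to its first element; on the Pharo side, the `FFIArray` instance plays the role of that caller-owned array.

```language=c
#include <stddef.h>

/* The C function from above. */
int sum_list(const int *list, size_t size) {
    int sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum += list[i];
    }
    return sum;
}

/* C counterpart of the Pharo usage: fill values with 1! .. 5!
   and pass the array (which decays to &values[0]). */
int sum_of_factorials(void) {
    int values[5];
    int factorial = 1;
    for (int i = 1; i <= 5; i++) {
        factorial *= i;
        values[i - 1] = factorial; /* Pharo indexes from 1, C from 0 */
    }
    return sum_list(values, 5);
}
```

The result is 1 + 2 + 6 + 24 + 120 = 153.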

#### Retrieving arrays

Retrieving arrays is more complex, because the memory may be allocated either by the caller or by the C function itself.

##### Case 1: list of values with known size

```language=c
#include <time.h>

void collect_times(time_t *times, int samples) {
    /* The caller allocated room for `samples` elements: fill it in place. */
    for (int i = 0; i < samples; i++) {
        times[i] = time(NULL);
    }
}
```

Since the size is known, the caller allocates the array and the C function fills it:

```language=smalltalk
times := FFIArray newType: TimeT size: 3.
aFFILibrary collect_times: times samples: 3.
time1 := times at: 1.
time2 := times at: 2.
time3 := times at: 3.
```

##### Case 2: list of values with unknown size

```language=c
#include <stdlib.h>
#include <time.h>

int collect_times(time_t **times) {
    int samples = 3;
    time_t *t = malloc(sizeof(time_t) * samples);
    if (t == NULL) {
        *times = NULL;
        return 0;
    }
    for (int i = 0; i < samples; i++) {
        t[i] = time(NULL);
    }
    *times = t; /* hand the allocated array back to the caller */
    return samples;
}
```

Here the C function allocates the memory and returns the size:

```language=smalltalk
timesHolder := FFIOop newValueHolder.
samples := aFFILibrary collect_times: timesHolder.
times := FFIArray
    fromHandle: timesHolder value
    type: TimeT
    size: samples.
time1 := times at: 1.
time2 := times at: 2.
time3 := times at: 3.
"NOTICE THAT IN THIS CASE IT IS YOUR RESPONSIBILITY TO RELEASE THE ALLOCATED MEMORY"
```
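
From C, the same contract looks like the sketch below (`count_valid_samples` is a hypothetical helper). It shows the two things the Pharo code has to replicate: passing the address of a pointer variable (pointer depth 2), and releasing the memory afterwards with the deallocator that matches the callee's allocator, here `free`.

```language=c
#include <stdlib.h>
#include <time.h>

/* Case 2 function: the callee allocates and returns the array
   through a pointer to a pointer. */
int collect_times(time_t **times) {
    int samples = 3;
    time_t *t = malloc(sizeof(time_t) * samples);
    if (t == NULL) {
        *times = NULL;
        return 0;
    }
    for (int i = 0; i < samples; i++) {
        t[i] = time(NULL);
    }
    *times = t;
    return samples;
}

/* Caller side: pass &times, use the array, then free it. */
int count_valid_samples(void) {
    time_t *times = NULL;
    int samples = collect_times(&times);
    int valid = 0;
    for (int i = 0; i < samples; i++) {
        if (times[i] != (time_t) -1) { /* time() returns -1 on failure */
            valid++;
        }
    }
    free(times); /* the caller releases what the callee allocated */
    return valid;
}
```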

#### Other cases

Even though we focused on arrays, the same pattern applies to all cases involving pointer depth greater than one: you pass an `FFIOop` value holder and interpret the result accordingly.
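
As a further illustration of that pattern, here is a hypothetical C function, `make_greeting`, that returns a newly allocated string through a `char **` parameter. From Pharo you would follow the same pattern described above; from C the contract is: pass the address of a `char *`, read the string it now points to, and free it when done.

```language=c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: stores a newly allocated string in *out.
   Returns 0 on success, -1 if allocation fails. The caller must free(). */
int make_greeting(const char *name, char **out) {
    const char *prefix = "Hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1; /* +1 for '\0' */
    char *s = malloc(len);
    if (s == NULL) {
        *out = NULL;
        return -1;
    }
    strcpy(s, prefix);
    strcat(s, name);
    *out = s;
    return 0;
}
```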