| 1 | +# Function Composition - Part 2 |
| 2 | + |
| 3 | +Status: **Proposed** |
| 4 | + |
| 5 | +<details> |
| 6 | + <summary>Metadata</summary> |
| 7 | + <dl> |
| 8 | + <dt>Contributors</dt> |
| 9 | + <dd>@catamorphism</dd> |
| 10 | + <dt>First proposed</dt> |
| 11 | + <dd>2024-06-xx</dd> |
| 12 | + <dt>Pull Requests</dt> |
| 13 | + <dd>#000</dd> |
| 14 | + </dl> |
| 15 | +</details> |
| 16 | + |
| 17 | +## Objective |
| 18 | + |
| 19 | +_What is this proposal trying to achieve?_ |
| 20 | + |
| 21 | +[Part 1](https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md) of this document |
| 22 | +explained ambiguities in the existing spec |
| 23 | +when it comes to function composition. |
| 24 | + |
| 25 | +The goal of this document is to present a _complete_ list of |
| 26 | +alternatives that may be considered by the working group. |
| 27 | + |
| 28 | +Each alternative corresponds to a different concrete |
| 29 | +definition of "resolved value". |
| 30 | + |
| 31 | +This document is meant to logically precede |
| 32 | +[the "Data Flow for Composable Functions" design document](https://github.com/catamorphism/message-format-wg/blob/79ceb57fa305204f26c6635fd586d0e3057cf460/exploration/dataflow-composability.md). |
| 33 | +Once an alternative from this document is chosen, |
| 34 | +then that document will be revised. |
| 35 | + |
| 36 | +## Background |
| 37 | + |
| 38 | +See [part 1](https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md) for more details. |
| 39 | + |
| 40 | +Depending on the chosen semantics for composition, |
| 41 | +functions can either "pipeline the input" (preservation model) or |
| 42 | +"operate on the output" (formatted value model), |
| 43 | +or both. |
| 44 | + |
| 45 | +Also, depending on the chosen functions, resolved options |
| 46 | +might or might not be part of the value returned |
| 47 | +by a function implementation. |
| 48 | + |
| 49 | +This suggests several alternatives: |
| 50 | +1. Pipeline input, but don't pass along options |
| 51 | +2. Pipeline input and pass along options |
| 52 | +3. Don't pipeline input (one function operates on the output of another) but do pass along options (is this useful?) |
| 53 | +4. Don't pipeline input and don't pass along options |
| 54 | + |
| 55 | +Options 1 and 3 do not seem useful. |
| 56 | +This document presents options 2 and 4, and a few variations on them. |
| 57 | + |
| 58 | +Not addressed here: the behavior of compositions of built-in functions |
| 59 | +(but the choice here will determine what behaviors are possible). |
| 60 | + |
| 61 | +Not addressed here: the behavior of compositions of custom functions |
| 62 | +(which is up to the custom function implementor). |
| 63 | + |
| 64 | +## Requirements |
| 65 | + |
| 66 | +A message that has a valid result in one implementation |
| 67 | +should not result in an error in a different implementation. |
| 68 | + |
| 69 | +## Constraints |
| 70 | + |
| 71 | +One prior decision is that the same definition of |
| 72 | +"resolved value" appears in multiple places in the spec. |
| 73 | +If "resolved value" is defined broadly enough |
| 74 | +(an annotated value with rich metadata), |
| 75 | +then this prior decision need not be changed. |
| 76 | + |
| 77 | +A second constraint is |
| 78 | +the difficulty of developing a precise definition of "resolved value" |
| 79 | +that can be made specific in the interface for custom functions, |
| 80 | +which is implementation-language-neutral. |
| 81 | + |
| 82 | +A third constraint is the "typeless" nature of the existing MessageFormat spec. |
| 83 | +The idea of specifying which functions are able to compose with each other |
| 84 | +resembles the idea of specifying a type system for functions. |
| 85 | +Specifying rules for function composition, while also remaining typeless, |
| 86 | +seems difficult and potentially unpredictable. |
| 87 | + |
| 88 | +## Introducing type names |
| 89 | + |
| 90 | +It's useful to be able to refer to two types: |
| 91 | + |
| 92 | +* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). |
| 93 | +* `ValueType`: This type encompasses strings, numbers, date/time values, |
| 94 | +all other possible implementation-specific types that input variables can be |
| 95 | +assigned to, |
| 96 | +and all possible implementation-specific types that custom and built-in |
| 97 | +functions can construct. |
| 98 | +Conceptually it's the union of an "input type" and a "formatted value". |
| 99 | + |
| 100 | +It's tagged with a string tag so functions can do type checks. |
| 101 | + |
| 102 | +``` |
| 103 | +interface ValueType { |
| 104 | + type(): string |
| 105 | + value(): unknown |
| 106 | +} |
| 107 | +``` |
| 108 | + |
| 109 | +## Alternatives to consider |
| 110 | + |
| 111 | +In lieu of the usual "Proposed design" and "Alternatives considered" sections, |
| 112 | +we offer some alternatives already considered in separate discussions. |
| 113 | + |
| 114 | +Because of our constraints, implementations are **not required** |
| 115 | +to use the `MessageValue` interface internally as described in |
| 116 | +any of the sections. |
| 117 | +The purpose of defining the interface is to guide implementors. |
| 118 | +An implementation that uses different types internally |
| 119 | +but allows the same observable behavior for composition |
| 120 | +is compliant with the spec. |
| 121 | + |
| 122 | +Five alternatives are presented: |
| 123 | +1. Typed functions |
| 124 | +2. Formatted value model |
| 125 | +3. Preservation model |
| 126 | +4. Allow both kinds of composition |
| 127 | +5. Don't allow composition |
| 128 | + |
| 129 | +Alternatives 2 and 3 should be familiar to readers of part 1. |
| 130 | +Alternative 4 is an idea from a prior mailing list discussion |
| 131 | +of this problem. Alternative 1 is similar to Alternative 3 |
| 132 | +but introduces additional notation to make composition |
| 133 | +easier to think about (which is why it's presented first). |
| 134 | +Alternative 5 is included for completeness. |
| 135 | + |
| 136 | +### Typed functions |
| 137 | + |
| 138 | +The following option aims to provide a general mechanism |
| 139 | +for custom function authors |
| 140 | +to specify how functions compose with each other. |
| 141 | + |
| 142 | +This is an extension of the "preservation model" |
| 143 | +from part 1 of this document. |
| 144 | + |
| 145 | +Here, `ValueType` is the most general type |
| 146 | +in a system of user-defined types. |
| 147 | +Using the function registry, |
| 148 | +each custom function could declare its own argument type |
| 149 | +and result type. |
| 150 | + |
| 151 | +This does not imply the existence of any static typechecking. |
| 152 | +A function passed the wrong type could signal a runtime error. |
| 153 | +This does require some mechanism for dynamically inspecting |
| 154 | +the type of a value. |
| 155 | + |
| 156 | +Consider Example B1 from part 1 of the document: |
| 157 | + |
| 158 | +Example B1: |
| 159 | +``` |
| 160 | + .local $age = {$person :getAge} |
| 161 | + .local $y = {$age :duration skeleton=yM} |
| 162 | + .local $z = {$y :uppercase} |
| 163 | +``` |
| 164 | + |
| 165 | +Informally, we can write the type signatures for |
| 166 | +the three custom functions in this example: |
| 167 | + |
| 168 | +``` |
| 169 | +getAge : Person -> Number |
| 170 | +duration : Number -> String |
| 171 | +uppercase : String -> String |
| 172 | +``` |
| 173 | + |
| 174 | +`Number` and `String` are assumed to be subtypes |
| 175 | +of `MessageValue`. |
| 176 | + |
| 177 | +The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) |
| 178 | +attempts to do some of this, but does not define |
| 179 | +the structure of the values produced by functions. |
| 180 | + |
| 181 | +An optional static typechecking pass (linting) |
| 182 | +would then detect any cases where functions are composed in a way that |
| 183 | +doesn't make sense. For example: |
| 184 | + |
| 185 | +Semantically invalid example: |
| 186 | +``` |
| 187 | +.local $z = {$person :uppercase} |
| 188 | +``` |
| 189 | + |
| 190 | +A person can't be converted to uppercase; or, `:uppercase` expects |
| 191 | +a `String`, not a `Person`. So an optional tool could flag this |
| 192 | +as an error, assuming that enough type information |
| 193 | +was included in the registry. |
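
One possible shape for such a linting pass is sketched here, assuming the registry can be read as a table of argument and result type names. The `Signature` shape, the `registry` object, and `lintChain` are hypothetical names for illustration; the check uses exact name equality and ignores subtyping, which a real tool would probably need to handle.

```typescript
// Hypothetical registry entry: each function declares an argument and a result type name.
interface Signature {
  argument: string
  result: string
}

// The informal signatures from Example B1, written as data.
const registry: Record<string, Signature> = {
  getAge: { argument: "Person", result: "Number" },
  duration: { argument: "Number", result: "String" },
  uppercase: { argument: "String", result: "String" },
}

// Check a chain of function calls applied to a variable of type `inputType`.
// Returns a list of problems; an empty list means the chain composes.
function lintChain(inputType: string, functions: string[]): string[] {
  const problems: string[] = []
  let current = inputType
  for (const fn of functions) {
    const sig = Object.prototype.hasOwnProperty.call(registry, fn) ? registry[fn] : undefined
    if (sig === undefined) {
      problems.push(`unknown function :${fn}`)
      break
    }
    if (sig.argument !== current) {
      problems.push(`:${fn} expects ${sig.argument}, got ${current}`)
    }
    // Continue with the declared result type so later calls are still checked.
    current = sig.result
  }
  return problems
}

const ok = lintChain("Person", ["getAge", "duration", "uppercase"])
const bad = lintChain("Person", ["uppercase"])
```

Here `ok` is empty because each result type matches the next argument type, while `bad` contains one problem reporting that `:uppercase` expects `String` but got `Person`.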
| 194 | + |
| 195 | +The resolved value type is similar to what was proposed in |
| 196 | +[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/). |
| 197 | + |
| 198 | +```ts |
| 199 | +interface MessageValue { |
| 200 | + formatToString(): string |
| 201 | + formatToX(): X // where X is an implementation-defined type |
| 202 | + getValue(): ValueType |
| 203 | + properties(): { [key: string]: MessageValue } |
| 204 | + selectKeys(keys: string[]): string[] |
| 205 | +} |
| 206 | +``` |
| 207 | + |
| 208 | +The `resolvedOptions()` method is renamed to `properties`. |
| 209 | +This is to suggest that individual function implementations |
| 210 | +may not pass all of the options through into the resulting |
| 211 | +`MessageValue`. |
| 212 | + |
| 213 | +Instead of using `unknown` as the result type of `getValue()`, |
| 214 | +we use `ValueType`, mentioned previously. |
| 215 | +Instead of using `unknown` as the value type for the |
| 216 | +`properties()` object, we use `MessageValue`, |
| 217 | +since options can also be full `MessageValue`s with their own options. |
| 218 | + |
| 219 | +Because `ValueType` has a type tag, |
| 220 | +custom function implementations can easily |
| 221 | +signal dynamic errors if passed an operand of the wrong type. |
| 222 | + |
| 223 | +The advantage of this approach is documentation: |
| 224 | +with type names that can be used in type signatures |
| 225 | +specified in the registry, |
| 226 | +it's easy for users to reason about functions and |
| 227 | +understand which combinations of functions |
| 228 | +compose with each other. |
| 229 | + |
| 230 | +### Formatted value model (Composition operates on output) |
| 231 | + |
| 232 | +This is an elaboration on the "formatted value model" from part 1. |
| 233 | + |
| 234 | +A less general solution is to have a single "resolved value" |
| 235 | +type, and specify that if function `g` consumes the resolved value |
| 236 | +produced by function `f`, |
| 237 | +then `g` operates on the output of `f`. |
| 238 | + |
| 239 | +``` |
| 240 | + .local $x = {$num :number maxFrac=2} |
| 241 | + .local $y = {$x :number maxFrac=5 padStart=3} |
| 242 | +``` |
| 243 | + |
| 244 | +In this example, `$x` would be bound to the formatted result |
| 245 | +of calling `:number` on `$num`. So the `maxFrac` option would |
| 246 | +be "lost" and when determining the value of `$y`, the second |
| 247 | +set of options would be used. |
| 248 | + |
| 249 | +For built-ins, it suffices to define `ValueType` as something like: |
| 250 | + |
| 251 | +``` |
| 252 | +FormattedNumber | FormattedDateTime | String |
| 253 | +``` |
| 254 | + |
| 255 | +because no information about the input needs to be |
| 256 | +incorporated into the resolved value. |
| 257 | + |
| 258 | +However, to make it possible for custom functions to return |
| 259 | +a wider set of types, a wider `ValueType` definition would be needed. |
| 260 | + |
| 261 | +The `MessageValue` definition would look as in #728, but without |
| 262 | +the `resolvedOptions()` method: |
| 263 | + |
| 264 | +```ts |
| 265 | +interface MessageValue { |
| 266 | + formatToString(): string |
| 267 | + formatToX(): X // where X is an implementation-defined type |
| 268 | + getValue(): ValueType |
| 269 | + selectKeys(keys: string[]): string[] |
| 270 | +} |
| 271 | +``` |
| 272 | + |
| 273 | +`MessageValue` is effectively a `ValueType` with methods. |
| 274 | + |
| 275 | +Using this definition would make some of the use cases from part 1 |
| 276 | +impractical. |
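
To make the loss of information concrete, here is a minimal sketch of this model. It assumes a toy stand-in for `:number` (`numberFn`, hypothetical) that understands only a `maxFrac` option and rounds with plain arithmetic; the `MessageValue` shape is cut down to the two methods the sketch needs.

```typescript
// Simplified resolved value for this model: only the formatted output survives.
// (formatToX() and selectKeys() from the interface above are omitted for brevity.)
interface MessageValue {
  formatToString(): string
  getValue(): string
}

type Options = { maxFrac?: number }

// Round to at most `digits` fraction digits.
function roundTo(n: number, digits: number): number {
  const scale = Math.pow(10, digits)
  return Math.round(n * scale) / scale
}

// Hypothetical stand-in for `:number`; not the real built-in.
function numberFn(operand: number | MessageValue, options: Options): MessageValue {
  // A previous result contributes only its formatted string, parsed back into a number.
  const n = typeof operand === "number" ? operand : Number(operand.getValue())
  const text = String(options.maxFrac === undefined ? n : roundTo(n, options.maxFrac))
  return { formatToString: () => text, getValue: () => text }
}

const x = numberFn(0.33333, { maxFrac: 2 }) // like {$num :number maxFrac=2}
const y = numberFn(x, { maxFrac: 5 })       // like {$x :number maxFrac=5}
```

In this sketch both `x` and `y` format as `0.33`: the digits dropped when producing `x` cannot be recovered by the second call, whatever options it is given.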
| 277 | + |
| 278 | +### Preservation model (composition can operate on input and options) |
| 279 | + |
| 280 | +This is an extension of |
| 281 | +the "preservation model" from part 1, |
| 282 | +in which resolved options are included in the output. |
| 283 | +This model can also be thought of as functions "pipelining" |
| 284 | +the input through multiple calls. |
| 285 | + |
| 286 | +A JSON representation of an example resolved value might be: |
| 287 | +``` |
| 288 | +{ |
| 289 | + input: { type: "number", value: 1 }, |
| 290 | + output: { type: "FormattedNumber", value: FN }, |
| 291 | + properties: { "maximumFractionDigits": 2 } |
| 292 | +} |
| 293 | +``` |
| 294 | + |
| 295 | +Here, `FN` is an instance of an implementation-specific |
| 296 | +`FormattedNumber` type, representing the number 1. |
| 297 | + |
| 298 | +(The property value `2` is shown as a plain number for brevity, but it would |
| 299 | +actually be a `MessageValue` itself.) |
| 300 | + |
| 301 | +The resolved value interface would include both "input" |
| 302 | +and "output" methods: |
| 303 | + |
| 304 | +```ts |
| 305 | +interface MessageValue { |
| 306 | + formatToString(): string |
| 307 | + formatToX(): X // where X is an implementation-defined type |
| 308 | + getInput(): ValueType |
| 309 | + getOutput(): ValueType |
| 310 | + properties(): { [key: string]: MessageValue } |
| 311 | + selectKeys(keys: string[]): string[] |
| 312 | +} |
| 313 | +``` |
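
A minimal sketch of how a function might use this interface follows. It assumes a toy stand-in for `:number` (`numberFn`, hypothetical) with a single `maxFrac` option, stores options as plain numbers rather than `MessageValue`s, and assumes that options on the later call override inherited ones; none of these choices is settled by this document.

```typescript
type Options = { maxFrac?: number }

// Simplified resolved value: keeps the original input and the resolved options.
// Option values are plain numbers here for brevity.
interface MessageValue {
  formatToString(): string
  getInput(): number
  getOutput(): string
  properties(): Options
}

// Round to at most `digits` fraction digits.
function roundTo(n: number, digits: number): number {
  const scale = Math.pow(10, digits)
  return Math.round(n * scale) / scale
}

// Hypothetical stand-in for `:number`; not the real built-in.
function numberFn(operand: number | MessageValue, options: Options): MessageValue {
  // Pipeline the original input instead of the formatted output.
  const input = typeof operand === "number" ? operand : operand.getInput()
  // Pass along options resolved by the earlier call.
  const inherited: Options = typeof operand === "number" ? {} : operand.properties()
  // Assumption: options given on this call override inherited ones.
  const merged: Options = { ...inherited, ...options }
  const output = String(merged.maxFrac === undefined ? input : roundTo(input, merged.maxFrac))
  return {
    formatToString: () => output,
    getInput: () => input,
    getOutput: () => output,
    properties: () => merged,
  }
}

const x = numberFn(0.33333, { maxFrac: 2 })
const y = numberFn(x, { maxFrac: 5 }) // overrides maxFrac and still sees the full input
const z = numberFn(x, {})             // inherits maxFrac=2 from x
```

Here `x` and `z` format as `0.33`, while `y` formats as `0.33333` because the original input was carried through the first call.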
| 314 | + |
| 315 | +Without a mechanism for type signatures, |
| 316 | +it may be hard for users to tell which combinations |
| 317 | +of functions compose without errors, |
| 318 | +and for implementors to document that information |
| 319 | +for users. |
| 320 | + |
| 321 | +### Allow both kinds of composition (with different syntax) |
| 322 | + |
| 323 | +By introducing new syntax, the same function could have |
| 324 | +either "preservation" or "formatted value" behavior. |
| 325 | + |
| 326 | +Consider (this suggestion is from Elango Cheran): |
| 327 | + |
| 328 | +``` |
| 329 | + .local $x = {$num :number maxFrac=2} |
| 330 | + .pipeline $y = {$x :number maxFrac=5 padStart=3} |
| 331 | + {{$x} {$y}} |
| 332 | +``` |
| 333 | + |
| 334 | +If `$num` is `0.33333`, |
| 335 | +then the result of formatting would be |
| 336 | + |
| 337 | +``` |
| 338 | +0.33 000.33333 |
| 339 | +``` |
| 340 | + |
| 341 | +An extra argument to function implementations, |
| 342 | +`pipeline`, would be added. |
| 343 | + |
| 344 | +`.pipeline` would be a new keyword that acts like `.local`, |
| 345 | +except that if its expression has a function annotation, |
| 346 | +the formatter would pass in `true` for the `pipeline` |
| 347 | +argument to the function implementation. |
| 348 | + |
| 349 | +The `properties()` method (`resolvedOptions()` in PR 728) should be ignored if `pipeline` |
| 350 | +is `false`. |
| 351 | + |
| 352 | +```ts |
| 353 | +interface MessageValue { |
| 354 | + formatToString(): string |
| 355 | + formatToX(): X // where X is an implementation-defined type |
| 356 | + getInput(): MessageValue |
| 357 | + getOutput(): unknown |
| 358 | + properties(): { [key: string]: MessageValue } |
| 359 | + selectKeys(keys: string[]): string[] |
| 360 | +} |
| 361 | +``` |
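
The effect of the extra `pipeline` argument can be sketched as follows. This assumes a toy stand-in for `:number` (`numberFn`, hypothetical) that handles only `maxFrac`; `padStart` is left out, so the output differs from the example above in padding. The simplified `MessageValue` uses plain numbers and strings where the interface above uses `MessageValue` and `unknown`.

```typescript
type Options = { maxFrac?: number }

// Simplified resolved value carrying both the input and the formatted output.
interface MessageValue {
  formatToString(): string
  getInput(): number
  getOutput(): string
  properties(): Options
}

// Round to at most `digits` fraction digits.
function roundTo(n: number, digits: number): number {
  const scale = Math.pow(10, digits)
  return Math.round(n * scale) / scale
}

// Build a resolved value from an input number and the options in effect.
function resolve(input: number, options: Options): MessageValue {
  const output = String(options.maxFrac === undefined ? input : roundTo(input, options.maxFrac))
  return {
    formatToString: () => output,
    getInput: () => input,
    getOutput: () => output,
    properties: () => options,
  }
}

// Hypothetical stand-in for `:number` with the extra `pipeline` argument.
function numberFn(operand: number | MessageValue, options: Options, pipeline: boolean): MessageValue {
  if (typeof operand === "number") {
    return resolve(operand, options)
  }
  if (pipeline) {
    // `.pipeline`: preservation behavior, reuse the original input and earlier options.
    return resolve(operand.getInput(), { ...operand.properties(), ...options })
  }
  // `.local`: formatted value behavior, only the formatted output is visible
  // and properties() is ignored.
  return resolve(Number(operand.getOutput()), options)
}

const x = numberFn(0.33333, { maxFrac: 2 }, false)
const yPipeline = numberFn(x, { maxFrac: 5 }, true) // like .pipeline $y = ...
const yLocal = numberFn(x, { maxFrac: 5 }, false)   // like .local $y = ...
```

In this sketch `yPipeline` formats as `0.33333` and `yLocal` as `0.33`, so the same function implementation gives either behavior depending on the keyword used in the message.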
| 362 | + |
| 363 | +### Don't allow composition for built-in functions |
| 364 | + |
| 365 | +Another option is to define the built-in functions this way, |
| 366 | +notionally: |
| 367 | + |
| 368 | +``` |
| 369 | +number : Number -> FormattedNumber |
| 370 | +date : Date -> FormattedDate |
| 371 | +``` |
| 372 | + |
| 373 | +Then it would be a runtime error to pass a `FormattedNumber` into `number` |
| 374 | +or to pass a `FormattedDate` into `date`. |
| 375 | + |
| 376 | +The resolved value type would look like: |
| 377 | + |
| 378 | +```ts |
| 379 | +interface MessageValue { |
| 380 | + formatToString(): string |
| 381 | + formatToX(): X // where X is an implementation-defined type |
| 382 | + getValue(): ValueType |
| 383 | + selectKeys(keys: string[]): string[] |
| 384 | +} |
| 385 | +``` |
| 386 | + |
| 387 | +As with the formatted value model, this restricts the |
| 388 | +behavior of custom functions. |
| 389 | + |
| 390 | +### Non-alternative: Allow composition in some implementations |
| 391 | + |
| 392 | +Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). |
| 393 | + |
| 394 | +This violates the portability requirement. |