Commit 791f645

DESIGN: Add a sequel to the design doc on function composition
This document sketches out some alternatives for the machinery provided to enable function composition. The goal is to provide an exhaustive list of alternatives.
1 parent a76a617 commit 791f645

1 file changed

+394
-0
lines changed

@@ -0,0 +1,394 @@
1+
# Function Composition - Part 2
2+
3+
Status: **Proposed**
4+
5+
<details>
6+
<summary>Metadata</summary>
7+
<dl>
8+
<dt>Contributors</dt>
9+
<dd>@catamorphism</dd>
10+
<dt>First proposed</dt>
11+
<dd>2024-06-xx</dd>
12+
<dt>Pull Requests</dt>
13+
<dd>#000</dd>
14+
</dl>
15+
</details>
16+
17+
## Objective
18+
19+
_What is this proposal trying to achieve?_
20+
21+
[Part 1](https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md) of this document
22+
explained ambiguities in the existing spec
23+
when it comes to function composition.
24+
25+
The goal of this document is to present a _complete_ list of
26+
alternatives that may be considered by the working group.
27+
28+
Each alternative corresponds to a different concrete
29+
definition of "resolved value".
30+
31+
This document is meant to logically precede
32+
[the "Data Flow for Composable Functions" design document](https://github.com/catamorphism/message-format-wg/blob/79ceb57fa305204f26c6635fd586d0e3057cf460/exploration/dataflow-composability.md).
33+
Once an alternative from this document is chosen,
34+
then that document will be revised.
35+
36+
## Background
37+
38+
See https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md for more details.
39+
40+
Depending on the chosen semantics for composition,
41+
functions can either "pipeline the input" (preservation model) or
42+
"operate on the output" (formatted value model),
43+
or both.
44+
45+
Also, depending on the chosen functions, resolved options
46+
might or might not be part of the value returned
47+
by a function implementation.
48+
49+
This suggests several alternatives:
50+
1. Pipeline input, but don't pass along options
51+
2. Pipeline input and pass along options
52+
3. Don't pipeline input (one function operates on the output of another) but do pass along options (is this useful?)
53+
4. Don't pipeline input and don't pass along options
54+
55+
Options 1 and 3 do not seem useful.
56+
This document presents options 2 and 4, and a few variations on them.
57+
58+
Not addressed here: the behavior of compositions of built-in functions
59+
(but the choice here will determine what behaviors are possible).
60+
61+
Not addressed here: the behavior of compositions of custom functions
62+
(which is up to the custom function implementor).
63+
64+
## Requirements
65+
66+
A message that has a valid result in one implementation
67+
should not result in an error in a different implementation.
68+
69+
## Constraints
70+
71+
One prior decision is that the same definition of
72+
"resolved value" appears in multiple places in the spec.
73+
If "resolved value" is defined broadly enough
74+
(an annotated value with rich metadata),
75+
then this prior decision need not be changed.
76+
77+
A second constraint is
78+
the difficulty of developing a precise definition of "resolved value"
79+
that can be made specific in the interface for custom functions,
80+
which is implementation-language-neutral.
81+
82+
A third constraint is the "typeless" nature of the existing MessageFormat spec.
83+
The idea of specifying which functions are able to compose with each other
84+
resembles the idea of specifying a type system for functions.
85+
Specifying rules for function composition, while also remaining typeless,
86+
seems difficult and potentially unpredictable.
87+
88+
## Introducing type names
89+
90+
It's useful to be able to refer to two types:
91+
92+
* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728).
93+
* `ValueType`: This type encompasses strings, numbers, date/time values,
94+
all other possible implementation-specific types that input variables can be
95+
assigned to,
96+
and all possible implementation-specific types that custom and built-in
97+
functions can construct.
98+
Conceptually it's the union of an "input type" and a "formatted value".
99+
100+
It's tagged with a string tag so functions can do type checks.
101+
102+
```
103+
interface ValueType {
104+
type(): string
105+
value(): unknown
106+
}
107+
```
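
As a rough illustration of how the string tag supports dynamic type checks, the following TypeScript sketch wraps raw values in the `ValueType` interface above. The helper names `makeValue` and `expectType` are hypothetical and are not part of the proposal.

```ts
// The interface from the text above.
interface ValueType {
  type(): string
  value(): unknown
}

// Hypothetical helper: wraps a raw value with a string tag.
function makeValue(tag: string, raw: unknown): ValueType {
  return { type: () => tag, value: () => raw }
}

// Hypothetical helper: returns the raw value if the tag matches,
// and signals a runtime error otherwise.
function expectType(operand: ValueType, expected: string): unknown {
  if (operand.type() !== expected) {
    throw new TypeError(`expected ${expected}, got ${operand.type()}`)
  }
  return operand.value()
}

const answer = makeValue("number", 42)
console.log(expectType(answer, "number")) // 42
```

Calling `expectType(answer, "string")` would throw a `TypeError` instead of returning a value.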
108+
109+
## Alternatives to consider
110+
111+
In lieu of the usual "Proposed design" and "Alternatives considered" sections,
112+
we offer some alternatives already considered in separate discussions.
113+
114+
Because of our constraints, implementations are **not required**
115+
to use the `MessageValue` interface internally as described in
116+
any of the sections.
117+
The purpose of defining the interface is to guide implementors.
118+
An implementation that uses different types internally
119+
but allows the same observable behavior for composition
120+
is compliant with the spec.
121+
122+
Five alternatives are presented:
123+
1. Typed functions
124+
2. Formatted value model
125+
3. Preservation model
126+
4. Allow both kinds of composition
127+
5. Don't allow composition
128+
129+
Alternatives 2 and 3 should be familiar to readers of part 1.
130+
Alternative 4 is an idea from a prior mailing list discussion
131+
of this problem. Alternative 1 is similar to Alternative 3
132+
but introduces additional notation to make composition
133+
easier to think about (which is why it's presented first).
134+
Alternative 5 is included for completeness.
135+
136+
### Typed functions
137+
138+
The following option aims to provide a general mechanism
139+
for custom function authors
140+
to specify how functions compose with each other.
141+
142+
This is an extension of the "preservation model"
143+
from part 1 of this document.
144+
145+
Here, `ValueType` is the most general type
146+
in a system of user-defined types.
147+
Using the function registry,
148+
each custom function could declare its own argument type
149+
and result type.
150+
151+
This does not imply the existence of any static typechecking.
152+
A function passed the wrong type could signal a runtime error.
153+
This does require some mechanism for dynamically inspecting
154+
the type of a value.
155+
156+
Consider Example B1 from part 1 of the document:
157+
158+
Example B1:
159+
```
160+
.local $age = {$person :getAge}
161+
.local $y = {$age :duration skeleton=yM}
162+
.local $z = {$y :uppercase}
163+
```
164+
165+
Informally, we can write the type signatures for
166+
the three custom functions in this example:
167+
168+
```
169+
getAge : Person -> Number
170+
duration : Number -> String
171+
uppercase : String -> String
172+
```
173+
174+
`Number` and `String` are assumed to be subtypes
175+
of `MessageValue`.
176+
177+
The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md)
178+
attempts to do some of this, but does not define
179+
the structure of the values produced by functions.
180+
181+
An optional static typechecking pass (linting)
182+
would then detect any cases where functions are composed in a way that
183+
doesn't make sense. For example:
184+
185+
Semantically invalid example:
186+
```
187+
.local $z = {$person :uppercase}
188+
```
189+
190+
A person can't be converted to uppercase; or, `:uppercase` expects
191+
a `String`, not a `Person`. So an optional tool could flag this
192+
as an error, assuming that enough type information
193+
was included in the registry.
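
One way such an optional check might work is sketched below in TypeScript. It assumes a registry that records one argument type and one result type per function, mirroring the informal signatures above. The `Signature` shape, the `registry` object, and `lintChain` are all hypothetical, and the sketch compares type names for exact equality, ignoring subtyping.

```ts
// Hypothetical registry entry: one argument type and one result type.
type Signature = { arg: string; result: string }

// Signatures for the three custom functions from Example B1.
const registry: Record<string, Signature> = {
  getAge: { arg: "Person", result: "Number" },
  duration: { arg: "Number", result: "String" },
  uppercase: { arg: "String", result: "String" },
}

// Checks a chain of function names applied to an operand of type `inputType`.
// Returns a list of error messages; an empty list means the chain type-checks.
function lintChain(inputType: string, chain: string[]): string[] {
  const errors: string[] = []
  let current = inputType
  for (const name of chain) {
    if (!Object.prototype.hasOwnProperty.call(registry, name)) {
      errors.push(`unknown function :${name}`)
      break
    }
    const sig = registry[name]
    if (sig.arg !== current) {
      errors.push(`:${name} expects ${sig.arg}, not ${current}`)
      break
    }
    // The result of this function becomes the operand of the next one.
    current = sig.result
  }
  return errors
}

// Example B1 type-checks: Person -> Number -> String -> String.
console.log(lintChain("Person", ["getAge", "duration", "uppercase"]).length) // 0
// The invalid example is flagged with one error.
console.log(lintChain("Person", ["uppercase"]).length) // 1
```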
194+
195+
The resolved value type is similar to what was proposed in
196+
[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/).
197+
198+
```ts
199+
interface MessageValue {
200+
formatToString(): string
201+
formatToX(): X // where X is an implementation-defined type
202+
getValue(): ValueType
203+
properties(): { [key: string]: MessageValue }
204+
selectKeys(keys: string[]): string[]
205+
}
206+
```
207+
208+
The `resolvedOptions()` method is renamed to `properties`.
209+
This is to suggest that individual function implementations
210+
may not pass all of the options through into the resulting
211+
`MessageValue`.
212+
213+
Instead of using `unknown` as the result type of `getValue()`,
214+
we use `ValueType`, mentioned previously.
215+
Instead of using `unknown` as the value type for the
216+
`properties()` object, we use `MessageValue`,
217+
since options can also be full `MessageValue`s with their own options.
218+
219+
Because `ValueType` has a type tag,
220+
custom function implementations can easily
221+
signal dynamic errors if passed an operand of the wrong type.
222+
223+
The advantage of this approach is documentation:
224+
with type names that can be used in type signatures
225+
specified in the registry,
226+
it's easy for users to reason about functions and
227+
understand which combinations of functions
228+
compose with each other.
229+
230+
### Formatted value model (Composition operates on output)
231+
232+
This is an elaboration on the "formatted value model" from part 1.
233+
234+
A less general solution is to have a single "resolved value"
235+
type, and specify that if function `g` consumes the resolved value
236+
produced by function `f`,
237+
then `g` operates on the output of `f`.
238+
239+
```
240+
.local $x = {$num :number maxFrac=2}
241+
.local $y = {$x :number maxFrac=5 padStart=3}
242+
```
243+
244+
In this example, `$x` would be bound to the formatted result
245+
of calling `:number` on `$num`. So the `maxFrac` option would
246+
be "lost" and when determining the value of `$y`, the second
247+
set of options would be used.
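
The loss of the first set of options can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: `numberFn` is a hypothetical stand-in for `:number`, `maxFrac` is modeled with `Number.prototype.toFixed`, and a formatted operand is parsed back from its text, which is one possible behavior and not something this document specifies.

```ts
// Formatted value model: the result carries only the formatted output.
interface FormattedNumber {
  type: "FormattedNumber"
  text: string
}

// Hypothetical stand-in for `:number`.
function numberFn(
  operand: number | FormattedNumber,
  opts: { maxFrac: number }
): FormattedNumber {
  // Assumption: a formatted operand is re-parsed from its text.
  // The original input and the earlier options are no longer available.
  const n = typeof operand === "number" ? operand : Number(operand.text)
  return { type: "FormattedNumber", text: n.toFixed(opts.maxFrac) }
}

const first = numberFn(0.33333, { maxFrac: 2 })
const second = numberFn(first, { maxFrac: 5 })
console.log(first.text)  // "0.33"
console.log(second.text) // "0.33000"
```

The second call only sees the text `0.33`, so the digits dropped by the first call cannot be recovered.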
248+
249+
For built-ins, it suffices to define `ValueType` as something like:
250+
251+
```
252+
FormattedNumber | FormattedDateTime | String
253+
```
254+
255+
because no information about the input needs to be
256+
incorporated into the resolved value.
257+
258+
However, to make it possible for custom functions to return
259+
a wider set of types, a wider `ValueType` definition would be needed.
260+
261+
The `MessageValue` definition would look as in #728, but without
262+
the `resolvedOptions()` method:
263+
264+
```ts
265+
interface MessageValue {
266+
formatToString(): string
267+
formatToX(): X // where X is an implementation-defined type
268+
getValue(): ValueType
269+
selectKeys(keys: string[]): string[]
270+
}
271+
```
272+
273+
`MessageValue` is effectively a `ValueType` with methods.
274+
275+
Using this definition would make some of the use cases from part 1
276+
impractical.
277+
278+
### Preservation model (composition can operate on input and options)
279+
280+
This is an extension of
281+
the "preservation model" from part 1,
282+
if resolved options are included in the output.
283+
This model can also be thought of as functions "pipelining"
284+
the input through multiple calls.
285+
286+
A JSON-like representation of an example resolved value might be:
287+
```
288+
{
289+
input: { type: "number", value: 1 },
290+
output: { type: "FormattedNumber", value: FN },
291+
properties: { "maximumFractionDigits": 2 }
292+
}
293+
```
294+
295+
Here, `FN` is an instance of an implementation-specific
296+
`FormattedNumber` type, representing the number 1.
297+
298+
(The number "2" is shown for brevity, but it would
299+
actually be a `MessageValue` itself.)
300+
301+
The resolved value interface would include both "input"
302+
and "output" methods:
303+
304+
```ts
305+
interface MessageValue {
306+
formatToString(): string
307+
formatToX(): X // where X is an implementation-defined type
308+
getInput(): ValueType
309+
getOutput(): ValueType
310+
properties(): { [key: string]: MessageValue }
311+
selectKeys(keys: string[]): string[]
312+
}
313+
```
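
The "pipelining" behavior can be sketched in TypeScript as below. The sketch simplifies the interface above to plain data: `Resolved`, `Options`, and `numberFn` are hypothetical names, option values are plain numbers instead of `MessageValue`s, and `maxFrac` is modeled with `Number.prototype.toFixed`.

```ts
// Hypothetical option bag; real options would be MessageValues.
type Options = { maxFrac?: number }

// Simplified resolved value: input, output, and resolved options.
interface Resolved {
  input: number
  output: string
  properties: Options
}

// Hypothetical stand-in for `:number` under the preservation model.
function numberFn(operand: number | Resolved, opts: Options): Resolved {
  // Reuse the original input rather than the previous output.
  const input = typeof operand === "number" ? operand : operand.input
  // Start from the operand's options; options on this call take precedence.
  const inherited: Options = typeof operand === "number" ? {} : operand.properties
  const properties: Options = { ...inherited, ...opts }
  return { input, output: input.toFixed(properties.maxFrac ?? 0), properties }
}

const x = numberFn(0.33333, { maxFrac: 2 })
const y = numberFn(x, { maxFrac: 5 })
const z = numberFn(x, {})
console.log(x.output) // "0.33"
console.log(y.output) // "0.33333"
console.log(z.output) // "0.33"
```

Here `y` recovers digits that `x` did not display, because the original input is preserved, and `z` inherits `maxFrac=2` from `x` because the options are passed along.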
314+
315+
Without a mechanism for type signatures,
316+
it may be hard for users to tell which combinations
317+
of functions compose without errors,
318+
and for implementors to document that information
319+
for users.
320+
321+
### Allow both kinds of composition (with different syntax)
322+
323+
By introducing new syntax, the same function could have
324+
either "preservation" or "formatted value" behavior.
325+
326+
Consider (this suggestion is from Elango Cheran):
327+
328+
```
329+
.local $x = {$num :number maxFrac=2}
330+
.pipeline $y = {$x :number maxFrac=5 padStart=3}
331+
{{$x} {$y}}
332+
```
333+
334+
If `$num` is `0.33333`,
335+
then the result of formatting would be
336+
337+
```
338+
0.33 000.33333
339+
```
340+
341+
An extra argument to function implementations,
342+
`pipeline`, would be added.
343+
344+
`.pipeline` would be a new keyword that acts like `.local`,
345+
except that if its expression has a function annotation,
346+
the formatter would pass in `true` for the `pipeline`
347+
argument to the function implementation.
348+
349+
The `properties()` method should be ignored if `pipeline`
350+
is `false`.
351+
352+
```ts
353+
interface MessageValue {
354+
formatToString(): string
355+
formatToX(): X // where X is an implementation-defined type
356+
getInput(): MessageValue
357+
getOutput(): unknown
358+
properties(): { [key: string]: MessageValue }
359+
selectKeys(keys: string[]): string[]
360+
}
361+
```
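
A minimal TypeScript sketch of how a function implementation might use the extra `pipeline` argument follows. All names here (`Resolved`, `Options`, `format`, `numberFn`) are hypothetical. The sketch assumes that `padStart` zero-pads the integer part to a minimum width, an interpretation chosen only because it reproduces the output shown above, and it does not handle negative numbers.

```ts
type Options = { maxFrac?: number; padStart?: number }

interface Resolved {
  input: number
  output: string
  properties: Options
}

// Hypothetical formatter: fixed fraction digits, zero-padded integer part.
function format(n: number, o: Options): string {
  const text = n.toFixed(o.maxFrac ?? 0)
  const dot = text.indexOf(".")
  let intPart = dot < 0 ? text : text.slice(0, dot)
  const rest = dot < 0 ? "" : text.slice(dot)
  while (intPart.length < (o.padStart ?? 1)) {
    intPart = "0" + intPart
  }
  return intPart + rest
}

// Hypothetical stand-in for `:number` with the extra `pipeline` argument.
function numberFn(operand: number | Resolved, opts: Options, pipeline: boolean): Resolved {
  if (typeof operand === "number") {
    return { input: operand, output: format(operand, opts), properties: opts }
  }
  if (pipeline) {
    // Preservation behavior: reuse the original input and merge the options.
    const properties: Options = { ...operand.properties, ...opts }
    return { input: operand.input, output: format(operand.input, properties), properties }
  }
  // Formatted value behavior: operate on the previous output
  // and ignore the operand's options.
  const reparsed = Number(operand.output)
  return { input: reparsed, output: format(reparsed, opts), properties: opts }
}

const x = numberFn(0.33333, { maxFrac: 2 }, false)           // .local
const y = numberFn(x, { maxFrac: 5, padStart: 3 }, true)     // .pipeline
console.log(`${x.output} ${y.output}`) // "0.33 000.33333"
```

Had `$y` been declared with `.local` (that is, `pipeline` set to `false`), this sketch would produce `000.33000` for `$y`, since only the formatted text `0.33` would be available.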
362+
363+
### Don't allow composition for built-in functions
364+
365+
Another option is to define the built-in functions this way,
366+
notionally:
367+
368+
```
369+
number : Number -> FormattedNumber
370+
date : Date -> FormattedDate
371+
```
372+
373+
Then it would be a runtime error to pass a `FormattedNumber` into `number`
374+
or to pass a `FormattedDate` into `date`.
375+
376+
The resolved value type would look like:
377+
378+
```ts
379+
interface MessageValue {
380+
formatToString(): string
381+
formatToX(): X // where X is an implementation-defined type
382+
getValue(): ValueType
383+
selectKeys(keys: string[]): string[]
384+
}
385+
```
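
The runtime error described above might look like the following TypeScript sketch. It assumes the tagged `ValueType` introduced earlier; `tagged` and `numberFn` are hypothetical names, and `String(...)` stands in for real number formatting.

```ts
interface ValueType {
  type(): string
  value(): unknown
}

// Hypothetical helper: wraps a raw value with a string tag.
function tagged(tag: string, raw: unknown): ValueType {
  return { type: () => tag, value: () => raw }
}

// number : Number -> FormattedNumber
function numberFn(operand: ValueType): ValueType {
  if (operand.type() !== "Number") {
    // Includes the case where the operand is the result of another `:number` call.
    throw new TypeError(`:number expects Number, got ${operand.type()}`)
  }
  return tagged("FormattedNumber", String(operand.value()))
}

const once = numberFn(tagged("Number", 1))
console.log(once.type()) // "FormattedNumber"
```

Calling `numberFn(once)` would throw, because `once` is tagged `FormattedNumber`.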
386+
387+
As with the formatted value model, this restricts the
388+
behavior of custom functions.
389+
390+
### Non-alternative: Allow composition in some implementations
391+
392+
Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728).
393+
394+
This violates the portability requirement.
