Skip to content

Conversation

@eemeli
Copy link
Collaborator

@eemeli eemeli commented Feb 2, 2024

Most messages are simple, and have no declarations. They should be optional in the data model, so that we don't need to include an empty declarations: [] for every one of them.

@eemeli eemeli added the data model Issues related to the Interchange Data Model label Feb 2, 2024
@eemeli eemeli requested review from aphillips and stasm February 2, 2024 06:58
Copy link
Collaborator

@stasm stasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right direction. I suggest a wontfix.

The data model is not normative:

Implementations are not required to use this data model for their internal representation of messages.

Implementations are free to make declarations optional for the purpose of optimizing memory allocation or otherwise. If, however, they want to translate their internal data model to the canonical one recommended (but not enforced) by the spec, then I think they should include an empty declarations array. It's semantically more correct to do so, and it makes the canonical data model easier to work with.

Note that in many languages, an empty vector doesn't allocate memory, so the overhead is minimal (e.g. C++, Rust).

@stasm stasm requested a review from catamorphism February 2, 2024 10:43
@eemeli
Copy link
Collaborator Author

eemeli commented Feb 2, 2024

Consider, however, our stated primary purpose for the data model:

To work with messages defined in other syntaxes than that of MessageFormat 2, an equivalent data model representation is also defined.

Are there any other message formats that provide for something like declarations? I'm not aware of any, but I'd be happy to be corrected here.

My point here is that as the data model is defining an interface for messages in other syntaxes to be expressed as MF2, or for MF2 content to be expressed in such syntaxes, it should be reasonable for the interface to take into account both sides here, and not require the empty declarations array.

@catamorphism
Copy link
Collaborator

catamorphism commented Feb 7, 2024

I won't block this, but I agree with @stasm -- having two different representations for a message with no declarations seems weird. It also complicates the implementation with extra special cases: consider (pseudocode)

for (decl in message.declarations) {
  doSomethingWith(decl);
}

vs.

if (message.declarations) {
  for (decl in message.declarations) {
     doSomethingWith(decl);
 }
}

To avoid the special case, the implementation could canonicalize the data model to an internal form that requires the declarations field. But that means extra work for implementations that want the more orthogonal representation, whereas making the declarations field required means extra work for what I would expect to be the less common case.

@aphillips
Copy link
Member

Both objections seem reasonable to me? @eemeli would you be okay to keep as is? We can seek feedback on the data model in general in tech preview.

@eemeli
Copy link
Collaborator Author

eemeli commented Feb 9, 2024

The issues raised by @stasm and @catamorphism seem to be elevating the concerns of the implementation needing to include a single if statement above the utility gained by not needing an empty array for the primary use case of the data model, i.e. interaction with other syntaxes, none of which support declarations. Doing so would seem rather odd to me.

Allowing the declarations to be optional would also match the rest of the data model.

Expression attributes are optional:

interface LiteralExpression {
type: "expression";
arg: Literal;
annotation?: FunctionAnnotation | UnsupportedAnnotation;
attributes?: Attribute[];
}

FunctionAnnotation options are optional:

interface FunctionAnnotation {
type: "function";
name: string;
options?: Option[];
}

Markup options and attributes are optional:

interface MarkupOpen {
type: "markup";
kind: "open";
name: string;
options?: Option[];
attributes?: Attribute[];
}

@stasm
Copy link
Collaborator

stasm commented Feb 9, 2024

The issues raised by @stasm and @catamorphism seem to be elevating the concerns of the implementation needing to include a single if statement above the utility gained by not needing an empty array for the primary use case of the data model (...)

It's not about just a single if, but rather about:

  • (a) the semantic correctness ("this message has 0 declarations"), and
  • (b) the "monadicness" of the data model.

What I mean by (b) is that thanks to the guarantee of there always being an array, working with the data model becomes easier, more expressive, and less error prone. For example, it's trivial to chain operations like map and filter, it's easier to write recursive visitors, and it's more convenient to handle errors in such chains.

The canonical data model should be, well, "canonical". I don't think we should introduce optimizations to it.

Expression attributes are optional:

Expressions attributes are not part of the proposed spec; we only reserved the syntax for them. Why are they in the data model?

FunctionAnnotation options are optional:
Markup options and attributes are optional:

We should fix both of these; they should be empty arrays, for the same reasons as declarations.

@aphillips
Copy link
Member

@eemeli So I take it that you don't think we can park this for ldml45?

@macchiati
Copy link
Member

A general point is that the data model should be a reference model. Few if any implementations will implement it precisely as described. And we cannot make them do that, especially in environments where different structures are more efficient.

And a reference model should favor reader comprehension over "memory savings".

That being said, the data model doesn't have to be solid for the tech preview, so changes can come after; it can be parked.

@stasm
Copy link
Collaborator

stasm commented Feb 10, 2024

Expression attributes are optional:

Expressions attributes are not part of the proposed spec; we only reserved the syntax for them. Why are they in the data model?

I filed #632 to remove expression attributes from the data model (and keep them reserved in the syntax).

FunctionAnnotation options are optional:
Markup options and attributes are optional:

We should fix both of these; they should be empty arrays, for the same reasons as declarations.

I filed #633 to make all arrays in the data model non-optional. This is the opposite of what this PR proposes.

@eemeli
Copy link
Collaborator Author

eemeli commented Feb 12, 2024

A general point is that the data model should be a reference model. Few if any implementations will implement it precisely as described.

This is not necessarily the case. The JS Intl.MessageFormat proposal, for instance, currently includes this:

The implementation-defined abstract operation GetMessageData takes argument source (a String or an Object) and returns an Object conforming to the JSON Schema definition of a message according to the Unicode MessageFormat 2.0 specification.

If source is a String, it returns the message data representation corresponding to the input source according to the Unicode MessageFormat 2.0 syntax. If source is an Object, it checks that source holds a valid message data representation, and returns an equivalent Object that is not affected by any further changes to source.

As it's looking likely that syntax parsing will need to be initially left out of the proposal, this is placing the data model in a rather prominent position. Much like the syntax, the JS data model API will need to be convenient to use, so that e.g. a message like

{ type: 'message', pattern: [
    'Hello ',
    { type: 'expression',
      arg: { type: 'variable', name: 'place' },
      annotation: { type: 'function', name: 'string' } }] }

won't be thrown out because it does not include an empty declarations or an empty function options array.

The intent with the current PR is to ensure that the MF2 data model definition meets that bar, and so avoid a need or an interest for the ECMA-402 spec to define its own less-strict data model specification, as that would be unfortunate.

This change also makes future changes to the data model easier to implement. Consider, for instance, if we accept #632 as currently proposed, and remove attributes from the data model. It is possible that their definition (or the definition of any similar feature that will require representation in the data model) will happen in MF 2.1 or some later spec version. What should we do then about the requirement for any such added fields; will they be optional, or required?

One purpose of the data model is intended for interchange between systems, in APIs. In such use, I do not think it's reasonable to expect and require both endpoints to be updated simultaneously, as would be necessitated by any new required field. Instead, they should be optional. Does it not make sense then to make all such fields (declarations, attributes, and options) optional, so that we won't have a mix of them later on?

@stasm
Copy link
Collaborator

stasm commented Feb 12, 2024

This is not necessarily the case. The JS Intl.MessageFormat proposal, for instance, currently includes this: (...)

This deserves a disclaimer: the quoted spec is a proposal, of which you are the author.

It's possible that the proposed spec for Intl.MessageFormat makes wrong assumptions about the data model. It may be a good idea to consider whether it even should bind itself so closely to MF2's reference data model and its JSON schema definition, given that we've made it explicit that they are not normative.

Early on we made a decision to make the syntax, not the data model, the canonical representation for messages, based on the assumption that the syntax is a good common denominator: it's relatively simple, it's text, and we can guarantee its stability over time.

The recent development in TC39 deserves a dedicated discussion since it has impact on the strategy we chose for MF2. In the meantime, #633 aligns the data model closer to what I'd consider the "reference" data model.

@aphillips
Copy link
Member

obsoleted by #633

@aphillips aphillips closed this Feb 15, 2024
@aphillips aphillips deleted the optional-declarations branch February 15, 2024 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data model Issues related to the Interchange Data Model

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants