Make declarations array optional in data model #620

eemeli · 2024-02-02T06:58:25Z

Most messages are simple, and have no declarations. They should be optional in the data model, so that we don't need to include an empty declarations: [] for every one of them.

stasm

I don't think this is the right direction. I suggest a wontfix.

The data model is not normative:

Implementations are not required to use this data model for their internal representation of messages.

Implementations are free to make declarations optional for the purpose of optimizing memory allocation or otherwise. If, however, they want to translate their internal data model to the canonical one recommended (but not enforced) by the spec, then I think they should include an empty declarations array. It's semantically more correct to do so, and it makes the canonical data model easier to work with.

Note that in many languages, an empty vector doesn't allocate memory, so the overhead is minimal (e.g. C++, Rust).

eemeli · 2024-02-02T12:41:25Z

Consider, however, our stated primary purpose for the data model:

To work with messages defined in other syntaxes than that of MessageFormat 2, an equivalent data model representation is also defined.

Are there any other message formats that provide for something like declarations? I'm not aware of any, but I'd be happy to be corrected here.

My point here is that as the data model is defining an interface for messages in other syntaxes to be expressed as MF2, or for MF2 content to be expressed in such syntaxes, it should be reasonable for the interface to take into account both sides here, and not require the empty declarations array.

catamorphism · 2024-02-07T21:36:15Z

I won't block this, but I agree with @stasm -- having two different representations for a message with no declarations seems weird. It also complicates the implementation with extra special cases: consider (pseudocode)

for (decl in message.declarations) {
  doSomethingWith(decl);
}

vs.

if (message.declarations) {
  for (decl in message.declarations) {
     doSomethingWith(decl);
 }
}

To avoid the special case, the implementation could canonicalize the data model to an internal form that requires the declarations field. But that means extra work for implementations that want the more orthogonal representation, whereas making the declarations field required means extra work for what I would expect to be the less common case.

aphillips · 2024-02-07T21:45:12Z

Both objections seem reasonable to me? @eemeli would you be okay to keep as is? We can seek feedback on the data model in general in tech preview.

eemeli · 2024-02-09T10:15:10Z

The issues raised by @stasm and @catamorphism seem to be elevating the concerns of the implementation needing to include a single if statement above the utility gained by not needing an empty array for the primary use case of the data model, i.e. interaction with other syntaxes, none of which support declarations. Doing so would seem rather odd to me.

Allowing the declarations to be optional would also match the rest of the data model.

Expression attributes are optional:

message-format-wg/spec/data-model/README.md

Lines 135 to 140 in 551b5f7

    
           interface LiteralExpression { 
        
             type: "expression"; 
        
             arg: Literal; 
        
             annotation?: FunctionAnnotation | UnsupportedAnnotation; 
        
             attributes?: Attribute[]; 
        
           }

FunctionAnnotation options are optional:

message-format-wg/spec/data-model/README.md

Lines 199 to 203 in 551b5f7

    
           interface FunctionAnnotation { 
        
             type: "function"; 
        
             name: string; 
        
             options?: Option[]; 
        
           }

Markup options and attributes are optional:

message-format-wg/spec/data-model/README.md

Lines 249 to 255 in 551b5f7

    
           interface MarkupOpen { 
        
             type: "markup"; 
        
             kind: "open"; 
        
             name: string; 
        
             options?: Option[]; 
        
             attributes?: Attribute[]; 
        
           }

stasm · 2024-02-09T10:33:13Z

The issues raised by @stasm and @catamorphism seem to be elevating the concerns of the implementation needing to include a single if statement above the utility gained by not needing an empty array for the primary use case of the data model (...)

It's not about just a single if, but rather about:

(a) the semantic correctness ("this message has 0 declarations"), and
(b) the "monadicness" of the data model.

What I mean by (b) is that thanks to the guarantee of there always being an array, working with the data model becomes easier, more expressive, and less error prone. For example, it's trivial to chain operations like map and filter, it's easier to write recursive visitors, and it's more convenient to handle errors in such chains.

The canonical data model should be, well, "canonical". I don't think we should introduce optimizations to it.

Expression attributes are optional:

Expressions attributes are not part of the proposed spec; we only reserved the syntax for them. Why are they in the data model?

FunctionAnnotation options are optional:
Markup options and attributes are optional:

We should fix both of these; they should be empty arrays, for the same reasons as declarations.

aphillips · 2024-02-09T16:44:05Z

@eemeli So I take it that you don't think we can park this for ldml45?

macchiati · 2024-02-09T18:48:10Z

A general point is that the data model should be a reference model. Few if any implementations will implement it precisely as described. And we cannot make them do that, especially in environments where different structures are more efficient.

And a reference model should favor reader comprehension over "memory savings".

That being said, the data model doesn't have to be solid for the tech preview, so changes can come after; it can be parked.

stasm · 2024-02-10T10:28:48Z

Expression attributes are optional:

Expressions attributes are not part of the proposed spec; we only reserved the syntax for them. Why are they in the data model?

I filed #632 to remove expression attributes from the data model (and keep them reserved in the syntax).

FunctionAnnotation options are optional:
Markup options and attributes are optional:

We should fix both of these; they should be empty arrays, for the same reasons as declarations.

I filed #633 to make all arrays in the data model non-optional. This is the opposite of what this PR proposes.

eemeli · 2024-02-12T09:27:00Z

A general point is that the data model should be a reference model. Few if any implementations will implement it precisely as described.

This is not necessarily the case. The JS Intl.MessageFormat proposal, for instance, currently includes this:

The implementation-defined abstract operation GetMessageData takes argument source (a String or an Object) and returns an Object conforming to the JSON Schema definition of a message according to the Unicode MessageFormat 2.0 specification.

If source is a String, it returns the message data representation corresponding to the input source according to the Unicode MessageFormat 2.0 syntax. If source is an Object, it checks that source holds a valid message data representation, and returns an equivalent Object that is not affected by any further changes to source.

As it's looking likely that syntax parsing will need to be initially left out of the proposal, this is placing the data model in a rather prominent position. Much like the syntax, the JS data model API will need to be convenient to use, so that e.g. a message like

{ type: 'message', pattern: [
    'Hello ',
    { type: 'expression',
      arg: { type: 'variable', name: 'place' },
      annotation: { type: 'function', name: 'string' } }] }

won't be thrown out because it does not include an empty declarations or an empty function options array.

The intent with the current PR is to ensure that the MF2 data model definition meets that bar, and so avoid a need or an interest for the ECMA-402 spec to define its own less-strict data model specification, as that would be unfortunate.

This change also makes future changes to the data model easier to implement. Consider, for instance, if we accept #632 as currently proposed, and remove attributes from the data model. It is possible that their definition (or the definition of any similar feature that will require representation in the data model) will happen in MF 2.1 or some later spec version. What should we do then about the requirement for any such added fields; will they be optional, or required?

One purpose of the data model is intended for interchange between systems, in APIs. In such use, I do not think it's reasonable to expect and require both endpoints to be updated simultaneously, as would be necessitated by any new required field. Instead, they should be optional. Does it not make sense then to make all such fields (declarations, attributes, and options) optional, so that we won't have a mix of them later on?

stasm · 2024-02-12T09:52:13Z

This is not necessarily the case. The JS Intl.MessageFormat proposal, for instance, currently includes this: (...)

This deserves a disclaimer: the quoted spec is a proposal, of which you are the author.

It's possible that the proposed spec for Intl.MessageFormat makes wrong assumptions about the data model. It may be a good idea to consider whether it even should bind itself so closely to MF2's reference data model and its JSON schema definition, given that we've made it explicit that they are not normative.

Early on we made a decision to make the syntax, not the data model, the canonical representation for messages, based on the assumption that the syntax is a good common denominator: it's relatively simple, it's text, and we can guarantee its stability over time.

The recent development in TC39 deserves a dedicated discussion since it has impact on the strategy we chose for MF2. In the meantime, #633 aligns the data model closer to what I'd consider the "reference" data model.

aphillips · 2024-02-15T16:53:35Z

obsoleted by #633

Make declarations array optional in data model

16e12dc

eemeli added the data model Issues related to the Interchange Data Model label Feb 2, 2024

eemeli requested review from aphillips and stasm February 2, 2024 06:58

stasm requested changes Feb 2, 2024

View reviewed changes

stasm requested a review from catamorphism February 2, 2024 10:43

This was referenced Feb 9, 2024

Remove expression attributes from the data model #632

Closed

Make arrays in the data model not optional #633

Merged

aphillips closed this Feb 15, 2024

aphillips deleted the optional-declarations branch February 15, 2024 16:53

Uh oh!

Make declarations array optional in data model #620

Make declarations array optional in data model #620

Uh oh!

Conversation

eemeli commented Feb 2, 2024

Uh oh!

stasm left a comment

Choose a reason for hiding this comment

Uh oh!

eemeli commented Feb 2, 2024

Uh oh!

catamorphism commented Feb 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aphillips commented Feb 7, 2024

Uh oh!

eemeli commented Feb 9, 2024

Uh oh!

stasm commented Feb 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aphillips commented Feb 9, 2024

Uh oh!

macchiati commented Feb 9, 2024

Uh oh!

stasm commented Feb 10, 2024

Uh oh!

eemeli commented Feb 12, 2024

Uh oh!

stasm commented Feb 12, 2024

Uh oh!

aphillips commented Feb 15, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

catamorphism commented Feb 7, 2024 •

edited

Loading

stasm commented Feb 9, 2024 •

edited

Loading