From 724d2c1be7e1a3174ab5186f8629fc53042f281f Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Mon, 10 Jun 2024 10:28:50 +0100
Subject: [PATCH 1/6] Update tests design document

---
 exploration/data-driven-tests.md | 252 ++++++++-----------------------
 1 file changed, 64 insertions(+), 188 deletions(-)
diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index cf1c710f8a..b77b3ebe3c 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -1,6 +1,6 @@
 # Data-driven tests
 
-Status: **Proposed**
+Status: **Accepted**
 
 <details>
 	<summary>Metadata</summary>
@@ -20,64 +20,55 @@ One of the [deliverables of the Message Format Working Group (MFWG)](https://git
 
 > "A conformance test suite for parsing and formatting messages sufficient to ensure implementations can validate conformance to the specification(s) provided".
 
-This design proposal captures the planned approach for the suite.
+This design proposal captures the planned approach for the suite:
 
-This approach includes _how_ tests are written: They should be captured in a single platform-agnostic format that can be utilized by all MF2 implementations. There should be no need to rewrite individual test cases for each platform.
+- It captures _what_ kind of tests are written by identifying the aspects of the MessageFormat 2 (MF2) specification that must be tested and the categories of test that do this.
 
-This approach also includes _what_ kind of tests are written. We need to identify which parts of MF2 should be covered by different types of test as a minimum.
+- It also captures _how_ tests are written by describing the single platform-agnostic format that can be utilized by any MF2 test runner.
 
 ## Background
 
-Several pre-existing test files have been considered before forming this proposal:
+Several pre-existing test suites have been considered before forming this proposal:
 
 - [**Unicode's Data Driven Test framework**](https://github.com/unicode-org/conformance) is a project with a goal that aligns with that of MFWG's conformance test suite.
 
 - [**message-format-wg XML test format**](https://github.com/unicode-org/message-format-wg/tree/514758923abac13a2c5eb71b6b6cdef4a181280e/test) includes a test schema and accompanying test examples from which we can take inspiration.
 
-- [**Intl.MessageFormat polyfill tests**](https://github.com/messageformat/messageformat/tree/main/packages/mf2-messageformat/src) are implementation-specific but they capture the type of tests that we may want to include in the conformance test suite. The polyfill itself is an implementation that the test suite could be run against.
+- [**Intl.MessageFormat polyfill tests**](https://github.com/messageformat/messageformat/blob/ee1bc08826f0855d00a9ace4db001c06a8679983/packages/mf2-messageformat/src/messageformat.test.ts) are implementation-specific but they capture the type of tests that we may want to include in the conformance test suite. The polyfill itself is an implementation that the test suite could be run against.
 
-- [**ICU**](https://github.com/unicode-org/icu) also contains platform-specific MF2 test cases that could be reused for the conformance test suite, including the [ICU4J tests](https://github.com/unicode-org/icu/tree/main/icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2) and [Tim Chevalier's draft ICU4C tests](https://github.com/catamorphism/icu/blob/parser-plus-data-model-plus-full-api/icu4c/source/test/intltest/messageformat2test.cpp).
+- [**ICU**](https://github.com/unicode-org/icu) also contains platform-specific MF2 test cases that could be reused for the conformance test suite, including the [ICU4J tests](https://github.com/unicode-org/icu/blob/4f75c627675b426938f569003ee9dc0ea43490bb/icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/MessageFormat2Test.java) and [ICU4C tests](https://github.com/unicode-org/icu/blob/6d5555a739179b5d177e73db7c111c5ef1cac22d/icu4c/source/test/intltest/messageformat2test.cpp).
 
 ## Use-Cases
 
-**Developers** of MF2 implementations need to easily verify that their completed implementation conforms to the specification. This needs to be fully automated and easily repeatable.
+**Developers** of MF2 implementations need to easily verify that their completed implementation conforms to the specification. This needs to be fully automated and easily repeatable. For incomplete and incorrect implementations, it is important for developers to easily understand where the specification is not being met and why.
 
-For incomplete and incorrect implementations, it is important for developers to easily understand where the specification is not being met and why.
+**Stakeholders** and **MF2 users** may use the tests as human-readable documentation of the specification. They need to be easily navigable and legible for this purpose.
 
-The main platforms for which the tests should initially run are:
-
-- Node.js
-- ICU4J (Java)
-- ICU4C (C++)
+**Vendors** using tooling that conforms to the specification may want to run tests against it to verify that this is the case.
 
-Other platforms, such as ICU4X (Rust) may be added later.
+## Requirements
 
-**Stakeholders** and **MF2 users** may use the conformance test suite as human-readable documentation of the specification. It needs to be easily navigable and legible for this purpose.
+### Test the specification, not necessarily the final output
 
-**Vendors** using tooling that conforms to the specification may want to run tests against it to verify that this is the case.
+Every piece of the specification should be testable. In order to test the specification in isolation, the test suite should be independent of Unicode CLDR locale data.
 
-## Requirements
+### Provide tests, not runners
 
-- Test framework
+Unlike the test suites within [ICU](https://github.com/unicode-org/icu), this suite does not target a specific implementation and is not tied to any particular executor. It is completely platform-agnostic. Consumers of the tests can decide how they are run.
 
-  - The test cases and assertions must be easy to read.
-  - The test cases and assertions must be completely platform-agnostic.
-  - The framework must include the platform-specific test executors as part of the solution.
-  - The framework must be extendable with new executors (e.g. ICU4X) and it should be clear how to do this.
+### Use a versatile format
 
-- Test content
-  - **Syntax tests:** These test that valid patterns are evaluated correctly and that invalid patterns are identified. Where standard registry functions are used, they also test that the correct function is invoked with the expected arguments.
-  - **Selector tests:** These test that the correct case of a `match` statement is selected, based on what follows the `when` keyword.
+The tests should be captured in a format that is highly portable and easily integrable with a wide range of technologies. The format should be easy to read while being flexible enough to capture the necessary detail of all test input and output.
 
 ## Constraints
 
-### External dependencies can impact portability
+### Function output can differ between implementations
 
-The platform-agnostic nature of the tests means that great caution must be taken around adding dependencies. The test suite must cater for a range of technology stacks and workflows with different restrictions around external dependencies.
+The behaviour of default registry functions such as `:number` and `:datetime` is dependent on locale-specific data and may vary between implementations. [Test functions](https://github.com/unicode-org/message-format-wg/blob/6414b6c7d9faed6c1b4645b92b3548a8ea0ad332/test/README.md) should be used to write more isolated tests.
 
 ### Errors and evaluation strategy may not be consistent
 
-It is important to test error cases for each of the test types mentioned above but, because variable evaluation is not captured within the standard, we cannot guarantee what kind of error will be raised in all cases.
+Variable evaluation is not captured within the standard so we cannot guarantee the order in which errors are encountered.
 
 For example, the pattern below may or may not result in an error depending on how lazily the expression is evaluated. This presents a challenge for testing.
 
@@ -93,159 +84,68 @@ local $foo = {$bar}
 {Hello, {$bar}!}
 ```
 
-### The output of formatters may not be stable over time
-
-Where possible, any parts of the suite that do not directly test the formatters should be independent of their output. This is to reduce the number of test failures caused by formatter output changes.
+For this reason, error tests should capture all errors present in each test case.
 
 ### Data model is not part of the specification
 
-Although a standard data model is included in this repository, there is no requirement for all MF2 implementations to use it. This means that any data model tests included in the test suite may fail for otherwise standard-compliant implementations. If any tests of this type are included, they must be optional.
+Although a standard data model is included in this repository, there is no requirement for all MF2 implementations to use it. Tests that rely on the structure of this data model may fail for standard-compliant implementations. If any tests of this type are included, they must be treated as optional.
 
 ## Proposed Design
 
-### Test framework
-
-The MF2 test framework should follow the ['Unicode & CLDR Data Driven Test'](https://github.com/unicode-org/conformance) framework.
-
-As per the project's [README.md](https://github.com/unicode-org/conformance#readme):
-
-> "The goal of this work is an easy-to-use framework for verifying that an implementation of ICU functions agrees with the required behavior. When a DDT test passes, it a strong indication that output is consistent across platforms. [...] Data Driven Test (DDT) focuses on functions that accept data input such as numbers, date/time data, and other basic information."
-
-This aligns closely with the goals and characteristics of the MF2 tests. Parity with ICU procedures is an added advantage.
-
-The README specifies that test cases and expected results are to be located in separate files (including the rationale for this).
-
-#### Test file example
-
-`example_1_test.json`
-
-```jsonc
-{
-  "Test scenario": "example_1",
-  "description": "Test cases for XYZ",
-  "testType": "syntax", // Tests will require different setup steps or function calls depending on their purpose.
-  "tests": [
-    {
-      "label": "0000",
-      "locale": "en-US",
-      "pattern": "{Some MF2 pattern}",
-      "options": {}, // Optional configuration
-      "input": { "namedArg": "foo" } // Arguments to the function being tested, such as a message.formatToString() function. May vary with testType.
-    }
-    // ...
-  ]
-}
-```
-
-#### Verification file example
-
-`example_1_verify.json`
-
-```jsonc
-{
-  "Test scenario": "example_1",
-  "verifications": [
-    {
-      "label": "0000",
-      "verify": "Expected result"
-    }
-    // ...
-  ]
-}
-```
-
 ### Test format
 
-As per the 'Unicode & CLDR Data Driven Test' documentation, test and verification files are provided in JSON format. The proposal is to write tests in YAML and transpile them to JSON.
+Tests should be written in __JSON__. This format aligns with the requirements above around versatility, as well as providing a favorable editing experience. It offers:
 
-JSON does not support multiline strings so test files may need to include `\n` line breaks in order to capture multiline patterns, which may impact readability. This is the main reason not to author tests in JSON directly. Assuming both the source and JSON-format tests are committed to the repository, the JSON remains the single source of truth for the tests and it can be consumed by the test executor without the need for any transpilation at runtime.
-
-The source format should offer the following:
-
-- Precise control over whitespace as many MF2 tests concern this.
-- Literal newlines for use in multiline patterns.
+- Precise control over whitespace - tests are needed around whitespace handling.
 - Concise readable syntax.
-- Comment syntax.
 - Validation against a schema.
-- (Optional) Editor integration for syntax highlighting and validation.
+- Editor integration for syntax highlighting and validation.
 
-YAML fulfils these requirements and is widely used.
+Other considerations around using JSON:
 
-There is a [test generator](https://github.com/unicode-org/conformance/tree/main/testgen) included in the 'Unicode & CLDR Data Driven Test' repository. At the time of writing, this is specific to number format tests and is not easily adaptable to the needs of MF2. It does, however, demonstrate generating JSON from source files.
+- It does not support multiline strings. Test files may need to include `\n` line breaks in order to capture multiline patterns, which may impact readability.
+- It does not include a syntax for comments. The test schema should include an explicit field to capture test descriptions.
 
-### Test content
 
-#### Syntax tests
+### Test schema
 
-These tests evaluate the pattern based on the runtime arguments. Formatters are shown as stringified representations of the function because formatter output is tested separately.
+__JSON Schema__ should be used to capture the structure of test files. `"$comment"` properties can be used within the schema for any additional documentation required.
 
-Example:
+The proposed schema is included under [test/schemas/v0/](https://github.com/unicode-org/message-format-wg/tree/b4fd5a666a02950c57f0a454f65bf16a0bf03bf4/test/schemas/v0). Its version can be incremented to v1 when the proposal is accepted.
 
-```jsonc
-{
-  "label": "Renders multiple inputs in formatted string",
-  "locale": "en-US",
-  "pattern": "{{$strArg :string} and {$numArg :number minimumFractionDigits=2}}",
-  "inputs": {
-    "strArg": { "type": "string", "value": "foo" },
-    "numArg": { "type": "number", "value": 123 }
-  }
-  // "verify":  "{ formatter: "string", value: "foo" } and { formatter: number, value: 123, minimumFractionDigits: 2 }"
-}
-```
+It is important that the schema is versioned. The version number should be captured within the schema files themselves because these files may be copied and used out of the context of this repository. By using a __version directory__ and __$id property__ for the schema, we can bump a schema version by changing one directory name and updating the `$id` property in the schema file(s) to match.
 
-#### Selector tests
+Although the use of [semantic versioning](https://semver.org/) has been discussed, it is likely to be overkill for our purposes.
 
-These are extensive tests of the cases within a `match` statement. Testing of multiple selectors is included.
+In order to reduce the verbosity of test files that contain multiple similar tests, the MF2 schema should include a `defaultTestProperties` property. This is an object that specifies properties to be used for every test case in the file (unless overridden in individual tests).
 
-Single selector example:
+Default properties can be used for expected outputs as well as inputs. For example:
 
 ```jsonc
-{
-  "label": "Matches numbers other than one",
-  "locale": "en-US",
-  "pattern": "match {$arg :number} when 1 {result 1} when * {result multi}",
-  "inputs": {
-    "arg": { "type": "number", "value": 2 }
-  }
-  // "verify": "result multi"
-}
+// The given locale for every test case is "en-US".
+"defaultTestProperties": { "locale": "en-US" }
+
+// The expected string output for every test case is "Hello"
+// and no test cases result in an error.
+"defaultTestProperties": { "exp": "Hello", "expErrors": false }
 ```
 
-Multiple selector example:
+This default property implies that
 
-```jsonc
-{
-  "label": "Matches wildcard strings and numbers other than one",
-  "locale": "en-US",
-  "pattern": "match {$name :string} {$count :number} when apple 1 {result apple 1} when apple * {result apple multi} when * 1 {result other 1} when * * {result other multi}",
-  "inputs": {
-    "name": { "type": "string", "value": "banana" },
-    "count": { "type": "number", "value": 3 }
-  }
-  // "verify": "result other multi"
-}
-```
+### Test content
 
-#### Formatter tests (optional)
+#### Syntax tests
 
-These tests focus on the standard registry's formatters (e.g. `:number`, `:datetime`). They cover the different options that can be passed to each formatter (e.g. `offset`, `skeleton`).
+These tests evaluate the pattern based on the runtime arguments. Functions are shown as stringified representations and are tested separately.
 
-If the output of a formatter changes in the future, these tests may need updating.
+#### Function tests
 
-Example:
+There are two types of function test:
 
-```jsonc
-{
-  "label": "Skeleton affects datetime format",
-  "locale": "en-US",
-  "pattern": "{$givenDateTime :datetime skeleton=yMMMdE}",
-  "inputs": {
-    "givenDateTime": { "type": "datetime", "value": "2000-12-31T00:00:00.000Z" }
-  }
-  // "verify":  "Sun, 31 Dec 2000"
-}
-```
+- __Selector tests__ test the cases within a `match` statement. Testing of multiple selectors is included.
+- __Formatter tests__ test the standard registry's formatters (e.g. `:number`, `:datetime`). They cover the different options that can be passed to each formatter (e.g. `offset`, `skeleton`).
+
+As mentioned above, the behaviour of some of the default registry functions such as `:number` and `:datetime` is dependent on locale-specific data and may vary between implementations. There are special functions designed for test use only, which include `:test:select` and `:test:format` for replacing selectors and formatters respectively in the syntax tests. More information on these test functions can be found in the [test README](https://github.com/unicode-org/message-format-wg/blob/6414b6c7d9faed6c1b4645b92b3548a8ea0ad332/test/README.md#test-functions).
 
 #### Data model tests (optional)
 
@@ -255,47 +155,23 @@ If a particular implementation of MF2 exposes a standardized representation of [
 
 ## Alternatives Considered
 
+### YAML test syntax
+
+YAML has some advantages over JSON:
+
+- It is extremely readable.
+- It supports multiline strings.
+- It supports comments.
+
+However, the flexibility of the syntax means that there is a risk of introducing ambiguity into the test cases. This makes it unsuitable.
+
+
 ### XML test syntax
 
-As mentioned above, there are several advantages to writing tests in XML:
+There are several advantages to writing tests in XML:
 
 - It allows preservation of whitespace in strings, which is crucial for MF2 test cases.
 - It allows literal newline characters in strings, which provides enhanced readability for multiline patterns.
 - It supports a schema format, which can be used to validate test files.
-- It is widely supported.
-
-XML is fairly verbose though. It is better suited to writing markup, which is not our use-case.
-
-### Gherkin test syntax and Cucumber runner
-
-Based on the readability concerns mentioned above, the Gherkin syntax was also considered.
-
-Example:
-
-```feature
-Feature: Multi-selector messages
-
-  Background:
-    Given the username is "Matt"
-    And the source is:
-      """
-      match {$photoCount :number} {$userGender :equals}
-      when 1 masculine {{$userName} added a new photo to his album.}
-      when 1 feminine  {{$userName} added a new photo to her album.}
-      when 1 *         {{$userName} added a new photo to their album.}
-      when * masculine {{$userName} added {$photoCount} photos to his album.}
-      when * feminine  {{$userName} added {$photoCount} photos to her album.}
-      when * *         {{$userName} added {$photoCount} photos to their album.}
-      """
-
-  Scenario: One item - male
-    When the message is resolved with params:
-      | key        | value     |
-      | photoCount |         1 |
-      | userGender | masculine |
-    Then the string output is "Matt added a new photo to his album."
-```
-
-The [Cucumber framework](https://cucumber.io/) was considered because of its integration with the Gherkin syntax. Cucumber's approach of using platform-specific step definitions for Gherkin scenarios aligns with our goal of having a data-only representation of the test content. It may, however, be difficult to support Cucumber in certain technology stacks and workflows.
 
-It would be possible to transpile Gherkin to JSON without using Cucumber, which would provide similar benefits to the YAML transpilation mentioned above. This can be discussed further.
+XML is fairly verbose though. It is better suited to writing markup.

From 3d7a67fb4a465653e42769a341fe5e01c7a92ac4 Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Tue, 11 Jun 2024 13:07:44 +0100
Subject: [PATCH 2/6] Changes to 'test content' section

---
 exploration/data-driven-tests.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index b77b3ebe3c..c9907088a4 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -136,7 +136,7 @@ This default property implies that
 
 #### Syntax tests
 
-These tests evaluate the pattern based on the runtime arguments. Functions are shown as stringified representations and are tested separately.
+These tests evaluate the pattern `src` using the given runtime `params`. Assertions are made on any resulting errors and on the output, which can be formatted as either a single string or parts. Syntax tests are the core of the test suite.
 
 #### Function tests
 
@@ -151,7 +151,7 @@ As mentioned above, the behaviour of some of the default registry functions such
 
 There is no standard data model within the specification, which means that we cannot create mandatory data model tests.
 
-If a particular implementation of MF2 exposes a standardized representation of [the data model](../spec/data-model/message.json), perhaps through a `mf2.toCanonicalJson();` function or similar, then we could create tests that assert against this.
+If a particular implementation of MF2 exposes a standardized representation of [the data model](../spec/data-model/message.json), perhaps through a `mf2.toCanonicalJson();` function or similar, then we could create tests that assert against this in future.
 
 ## Alternatives Considered
 

From 057d8594795683e79093c3ab0707477b0164ef03 Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Wed, 12 Jun 2024 17:31:47 +0100
Subject: [PATCH 3/6] Update exploration/data-driven-tests.md

Co-authored-by: Eemeli Aro <eemeli@gmail.com>
---
 exploration/data-driven-tests.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index c9907088a4..5af8ebfba3 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -24,7 +24,7 @@ This design proposal captures the planned approach for the suite:
 
 - It captures _what_ kind of tests are written by identifying the aspects of the MessageFormat 2 (MF2) specification that must be tested and the categories of test that do this.
 
-- It also captures _how_ tests are written by describing the single platform-agnostic format that can be utilized by any MF2 test runner.
+- It also captures _how_ tests are written by describing the single platform-agnostic format that can be used by any MF2 test runner.
 
 ## Background
 

From 877b04f8ffb44a1a4a39a2d9101eb0eae3d10c83 Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Wed, 12 Jun 2024 17:33:26 +0100
Subject: [PATCH 4/6] Apply suggestions from code review

Co-authored-by: Eemeli Aro <eemeli@gmail.com>
---
 exploration/data-driven-tests.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index 5af8ebfba3..f9c628928c 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -94,7 +94,7 @@ Although a standard data model is included in this repository, there is no requi
 
 ### Test format
 
-Tests should be written in __JSON__. This format aligns with the requirements above around versatility, as well as providing a favorable editing experience. It offers:
+Tests should be written in JSON. This format aligns with the requirements above around versatility, as well as providing a favorable editing experience. It offers:
 
 - Precise control over whitespace - tests are needed around whitespace handling.
 - Concise readable syntax.

From 99922bc92acf83f59dfc44718c031cb4162d4117 Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Wed, 12 Jun 2024 20:08:51 +0100
Subject: [PATCH 5/6] Delete sentence fragment

---
 exploration/data-driven-tests.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index f9c628928c..51bac509bb 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -130,8 +130,6 @@ Default properties can be used for expected outputs as well as inputs. For examp
 "defaultTestProperties": { "exp": "Hello", "expErrors": false }
 ```
 
-This default property implies that
-
 ### Test content
 
 #### Syntax tests

From 602bca2cfb7ce994dac2fdee2faa16e17e75aaab Mon Sep 17 00:00:00 2001
From: Matt Radbourne <1254508+mradbourne@users.noreply.github.com>
Date: Wed, 12 Jun 2024 20:18:03 +0100
Subject: [PATCH 6/6] Remove emphasis (consistent with PR suggestion)

---
 exploration/data-driven-tests.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exploration/data-driven-tests.md b/exploration/data-driven-tests.md
index 51bac509bb..13f2ee84f2 100644
--- a/exploration/data-driven-tests.md
+++ b/exploration/data-driven-tests.md
@@ -109,11 +109,11 @@ Other considerations around using JSON:
 
 ### Test schema
 
-__JSON Schema__ should be used to capture the structure of test files. `"$comment"` properties can be used within the schema for any additional documentation required.
+JSON Schema should be used to capture the structure of test files. `"$comment"` properties can be used within the schema for any additional documentation required.
 
 The proposed schema is included under [test/schemas/v0/](https://github.com/unicode-org/message-format-wg/tree/b4fd5a666a02950c57f0a454f65bf16a0bf03bf4/test/schemas/v0). Its version can be incremented to v1 when the proposal is accepted.
 
-It is important that the schema is versioned. The version number should be captured within the schema files themselves because these files may be copied and used out of the context of this repository. By using a __version directory__ and __$id property__ for the schema, we can bump a schema version by changing one directory name and updating the `$id` property in the schema file(s) to match.
+It is important that the schema is versioned. The version number should be captured within the schema files themselves because these files may be copied and used out of the context of this repository. By using a version directory and an `$id` property for the schema, we can bump a schema version by changing one directory name and updating the `$id` property in the schema file(s) to match.
 
 Although the use of [semantic versioning](https://semver.org/) has been discussed, it is likely to be overkill for our purposes.