Commit 94098ca

Merge branch 'main' into define-resolved-values
2 parents 17a9830 + ffb69ac commit 94098ca

67 files changed

+8567
-1064
lines changed
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
+name: Validate test data
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - test/**
+  pull_request:
+    branches: '**'
+    paths:
+      - test/**
+
+jobs:
+  run_all:
+    name: Validate tests using schema
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+      - name: Install CLI tool for JSON Schema validation
+        run: npm install --global ajv-cli
+      - name: Validate tests using the latest schema version
+        run: >
+          ajv validate --spec=draft2020
+          -s $(ls -1v schemas/*/*schema.json | tail -1)
+          -d 'tests/**/*.json'
+        working-directory: ./test
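
Note on the last step: `ls -1v ... | tail -1` sorts the schema paths in version order and keeps the final one, so the newest schema directory is used for validation. The sketch below imitates that selection in JavaScript. It is only an illustration: `pickLatestSchema` and the example paths are hypothetical, and numeric collation is assumed to be a close stand-in for `ls -v`, which may order some edge cases differently.

```javascript
// Hypothetical helper that imitates `ls -1v <paths> | tail -1`:
// sort with digit runs compared as numbers, then take the last entry.
function pickLatestSchema(paths) {
  const collator = new Intl.Collator("en", { numeric: true });
  const sorted = [...paths].sort(collator.compare);
  return sorted[sorted.length - 1];
}

// Illustrative paths only; the real directory names are not shown in this diff.
const examplePaths = [
  "schemas/v0/tests.schema.json",
  "schemas/v10/tests.schema.json",
  "schemas/v2/tests.schema.json",
];

console.log(pickLatestSchema(examplePaths));
```

A plain lexicographic sort would rank `v2` after `v10`; comparing digit runs numerically avoids that, which is the reason the workflow passes `-v` to `ls`.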

LICENSE

Lines changed: 2 additions & 0 deletions
@@ -37,3 +37,5 @@ Except as contained in this notice, the name of a copyright holder shall
3737
not be used in advertising or otherwise to promote the sale, use or other
3838
dealings in these Data Files or Software without prior written
3939
authorization of the copyright holder.
40+
41+
SPDX-License-Identifier: Unicode-3.0

README.md

Lines changed: 22 additions & 6 deletions
@@ -54,7 +54,7 @@ user interfaces that can appear in any language and support the needs of diverse
5454

5555
The current specification starts [here](spec/README.md) and may have changed since the publication
5656
of the Tech Preview version.
57-
The Tech Preview specification is [here](tr35-messageformat.md) (link to follow).
57+
The Tech Preview specification is [here](https://www.unicode.org/reports/tr35/tr35-72/tr35-messageFormat.html).
5858

5959
The current draft syntax for defining messages can be found in [spec/syntax.md](./spec/syntax.md).
6060
The syntax is formally described in [ABNF](spec/message.abnf).
@@ -102,16 +102,32 @@ The Working Group continues to address feedback
102102
and develop portions of the specification not completed for the LDML45 Tech Preview release.
103103
The `main` branch of this repository contains changes implemented since the technical preview.
104104

105-
Implementers should be aware of the following normative changes during the tech preview period:
106-
- _(list to be updated during tech preview)_
105+
Implementers should be aware of the following normative changes during the tech preview period.
106+
See the [commit history](https://github.com/unicode-org/message-format-wg/commits)
107+
after 2024-04-13 for a list of all commits (including non-normative changes).
108+
- [#771](https://github.com/unicode-org/message-format-wg/issues/771) Remove inappropriate normative statement from errors.md
109+
- [#767](https://github.com/unicode-org/message-format-wg/issues/767) Add a test schema and
110+
[#778](https://github.com/unicode-org/message-format-wg/issues/778) validate tests against it
111+
- [#775](https://github.com/unicode-org/message-format-wg/issues/775) Add a definition for `variable`
112+
- [#774](https://github.com/unicode-org/message-format-wg/issues/774) Refactor error types, adding a _Message Function Error_ type (and subtypes)
113+
- [#769](https://github.com/unicode-org/message-format-wg/issues/769) Add `:test:function`,
114+
`:test:select` and `:test:format` functions for implementation testing
115+
- [#743](https://github.com/unicode-org/message-format-wg/issues/743) Collapse all escape sequence rules into one (affects the ABNF)
116+
- _more to be added as they are merged_
107117

108118
## Implementations
109119

110-
(The working group expects that ICU75 will include both Java and C/C++ implementations of the tech preview specification)
111-
112-
- Java: [`com.ibm.icu.message2`](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/index.html?com/ibm/icu/message2/package-summary.html), part of ICU 72 released in October 2022, is a _tech preview_ implementation of the MessageFormat 2 syntax, together with a formatting API. See the [ICU User Guide](https://unicode-org.github.io/icu/userguide/format_parse/messages/mf2.html) for examples and a quickstart guide.
120+
- Java: [`com.ibm.icu.message2`](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/index.html?com/ibm/icu/message2/package-summary.html), part of ICU 75, is a _tech preview_ implementation of the MessageFormat 2 syntax, together with a formatting API. See the [ICU User Guide](https://unicode-org.github.io/icu/userguide/format_parse/messages/mf2.html) for examples and a quickstart guide.
121+
- C/C++: [`icu::message2::MessageFormatter`](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1message2_1_1MessageFormatter.html), part of ICU 75, is a _tech preview_ implementation of MessageFormat 2.
113122
- JavaScript: [`messageformat`](https://github.com/messageformat/messageformat/tree/master/packages/mf2-messageformat) 4.0 implements the MessageFormat 2 syntax, together with a polyfill of the runtime API proposed for ECMA-402.
114123

124+
The working group is also aware of these implementations in progress or released, but has not evaluated them:
125+
- [i18next](https://www.npmjs.com/package/i18next-mf2) i18nFormat plugin to use mf2 format with i18next, version 0.1.1
126+
127+
> [!NOTE]
128+
> Tell us about your MessageFormat 2 implementation!
129+
> Submit a [PR on this page](https://github.com/unicode-org/message-format-wg/edit/main/README.md), file an issue, or send email to have your implementation appear here.
130+
115131
## Sharing Feedback
116132

117133
Technical Preview Feedback: [file an issue here](https://github.com/unicode-org/message-format-wg/issues/new?labels=Preview-Feedback&projects=&template=tech-preview-feedback.md&title=%5BFEEDBACK%5D+)

docs/tech-preview-blog-post.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
# Blog Post for Technical Preview
2+
3+
Today, Unicode announced the Technical Preview of MessageFormat 2,
4+
a new standard for creating and managing user interface strings.
5+
These messages can dynamically include data values formatted
6+
(using information in the Common Locale Data Repository [CLDR])
7+
according to the needs of the language and culture of the end user.
8+
Such messages can be adjusted to meet the linguistic needs of each
9+
language and are designed to be translated easily and efficiently.
10+
11+
Previously, software developers had to choose between many different
12+
APIs and templating languages to build user interface strings.
13+
These solutions did not always provide for the features of different
14+
human languages. Support was limited to specific platforms
15+
and these formats were not widely supported by translation tools,
16+
making translation and adaptation to specific cultures costly
17+
and time consuming.
18+
Most significantly, message formatting was limited to a small
19+
number of built-in formats.
20+
21+
One of the challenges in adapting software to work for
22+
users with different languages and cultures is the need for **_dynamic messages_**.
23+
Whenever a user interface needs to present data as part of a larger message,
24+
that data needs to be formatted.
25+
In many languages, including English, the message itself needs to be altered
26+
to make it grammatically correct.
27+
28+
For example, a message in English might read:
29+
30+
> Your item had **1,023** views on **April 8, 2024**.
31+
32+
The equivalent message in French might read:
33+
34+
> Votre article a eu **1 023** vues le **8 avril 2024**.
35+
36+
Or Japanese:
37+
38+
> あなたのアイテムは **2024 年 4 月 8 日**に **1,023** 回閲覧されました。
39+
40+
But even in English, there are grammatical variations required:
41+
42+
> Your item had _no views_...
43+
>
44+
> Your item had 1 _view_...
45+
>
46+
> Your item had 1,043 _views_...
47+
48+
Once messages have been created, they need to be translated into the various
49+
languages and adapted for the various cultures around the world.
50+
Previously, there was no widely adopted standard,
51+
and existing formats provided only rudimentary support for managing
52+
the variations needed by other languages.
53+
Thus, it could be difficult for translators to do their work effectively.
54+
55+
For example, the same message shown above needs a different set of variations
56+
in order to support Polish:
57+
58+
> Twój przedmiot nie _ma_ żadnych _wyświetleń_.
59+
>
60+
> Twój przedmiot _miał_ 1 _wyświetlenie_.
61+
>
62+
> Twój przedmiot _miał_ 2 _wyświetlenia_.
63+
>
64+
> Twój przedmiot _ma_ 5 _wyświetleń_.
65+
66+
67+
MessageFormat 2 makes it easy to write messages like this
68+
without developers needing to know about such language variation.
69+
In fact, developers don't need to learn about any of the language
70+
and formatting variations needed by languages other than their own
71+
nor write code that manipulates formatting.
72+
73+
MessageFormat 2 messages can be simple strings:
74+
```
75+
Hello, world!
76+
```
77+
78+
A message can also include _placeholders_ that are replaced by user-provided values:
79+
```
80+
Hello {$user}!
81+
```
82+
83+
The user-provided values can be transformed or formatted using functions:
84+
```
85+
Today is {$date :date}
86+
Today is {$date :datetime weekday=long}.
87+
```
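
As a rough illustration of what a formatting function such as `:datetime weekday=long` does with its operand, the sketch below uses the standard `Intl.DateTimeFormat` API. This is an assumption made for illustration only: the blog post does not say how any implementation maps `:datetime` options, and `formatWeekday` is a hypothetical helper.

```javascript
// Assumption: Intl.DateTimeFormat stands in for the `:datetime` function.
// `formatWeekday` is a hypothetical helper, not a MessageFormat 2 API.
function formatWeekday(value, locale) {
  return new Intl.DateTimeFormat(locale, {
    weekday: "long",
    timeZone: "UTC", // fixed time zone so the result does not depend on the host
  }).format(value);
}

// 8 April 2024, the date used in the examples above.
const exampleDate = new Date(Date.UTC(2024, 3, 8));

console.log(`Today is ${formatWeekday(exampleDate, "en-US")}.`);
console.log(formatWeekday(exampleDate, "fr-FR"));
```

The message text stays the same for every locale; only the locale passed to the formatter changes what is rendered.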
88+
89+
Messages can use a function (called a _selector_) to choose between
90+
different versions of a message.
91+
These allow messages to be tailored to the grammatical (or other) requirements of
92+
a given language:
93+
```
94+
.match {$count :integer}
95+
0 {{You have no views.}}
96+
one {{You have {$count} view.}}
97+
* {{You have {$count} views.}}
98+
```
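
The selection in the `.match` example can be sketched as follows, assuming that an exact numeric key such as `0` is tried first, then the plural category of the number (`one`, `few`, and so on), and finally the `*` fallback. `selectVariant` and the `variants` map are hypothetical names for illustration and are not part of any MessageFormat 2 library.

```javascript
// Hypothetical sketch of selector resolution for a numeric operand.
// Assumed order: exact key, then plural category, then the "*" fallback.
function selectVariant(count, variants, locale = "en") {
  const exactKey = String(count);
  if (variants.has(exactKey)) return variants.get(exactKey);

  const category = new Intl.PluralRules(locale).select(count);
  if (variants.has(category)) return variants.get(category);

  return variants.get("*");
}

// The three variants from the `.match` example, as pattern functions.
const variants = new Map([
  ["0", () => "You have no views."],
  ["one", (count) => `You have ${count} view.`],
  ["*", (count) => `You have ${count} views.`],
]);

function render(count) {
  return selectVariant(count, variants)(count);
}

console.log(render(0));
console.log(render(1));
console.log(render(1043));
```

A translator working in a language with more plural categories, such as Polish, would add keys like `few` and `many` to the variant list without any change to the calling code.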
99+
100+
Unlike the previous version of MessageFormat, MessageFormat 2 is designed for
101+
extension by implementers and even end users.
102+
This means that new functionality can be added to messages without modifying
103+
either existing messages or, in some cases, even the core library containing the
104+
MessageFormat 2 code.
105+
106+
MessageFormat 2 provides a rich and extensible set of functionality
107+
to permit the creation of natural-sounding, grammatically correct
108+
messages, while enabling rapid, accurate translation
109+
and extension using new and improved internationalization functionality
110+
in any computing system.
111+
112+
The Technical Preview is available for comment.
113+
The stable version of this specification is expected to be part of the
114+
Fall 2024 release of CLDR (v46).
115+
Implementations are available in ICU4J (Java) and ICU4C (C/C++)
116+
as well as JavaScript.
117+
Feedback about implementation experience,
118+
syntax,
119+
functionality,
120+
or other parts of the specification is welcome!
121+
See the end of this article for details on participation and how to comment on this work.
122+
123+
MessageFormat 2 consists of multiple parts:
124+
a syntax, including a formal grammar, for writing messages;
125+
a data model for representing messages (including those ported from other APIs);
126+
a registry of required functions;
127+
a function description mechanism for use by implementations and tools;
128+
and a test suite.

docs/tools/linkify.js

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+// Work in progress: tooling to linkify the HTML produced from
+// the MessageFormat 2 markdown.
+// this has been tested on the tr35-messageformat.html file
+// but not implemented in LDML45
+function linkify() {
+    const terms = findTerms();
+    const missing = new Set();
+    const links = document.querySelectorAll("em");
+    links.forEach((item) => {
+        const target = generateId(item.textContent);
+        if (terms.has(target)) {
+            const el = item.lastElementChild ?? item;
+            el.innerHTML = `<a href="#${target}">${item.textContent}</a>`;
+        } else {
+            missing.add(target);
+        }
+    });
+    // report missing terms
+    // (leave out sort if you want it in file order)
+    Array.from(missing).sort().forEach((item)=> {
+        console.log(item);
+    });
+}
+
+function findTerms() {
+    const terms = new Set();
+    document.querySelectorAll("dfn").forEach((item) => {
+        // console.log(index + ": " + item.textContent);
+        const term = generateId(item.textContent);
+        // guard against duplicates
+        if (terms.has(term)) {
+            console.log("Duplicate term: " + term);
+        }
+        terms.add(term);
+        item.setAttribute("id", term);
+    });
+    return terms;
+}
+
+function generateId(term) {
+    const id = term.toLowerCase().replaceAll(" ", "-");
+    if (id.endsWith("rategies")) {
+        // found in the bidi isolation strategies
+        return id.slice(0, -3) + "y";
+    } else if (id.endsWith("s") && id !== "status") {
+        // regular English plurals
+        return id.slice(0, -1);
+    }
+    return id;
+}
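
To see how `generateId` normalizes terms, the function can be run on its own outside the browser, since it does not touch the DOM. The block below is a standalone copy for experimentation in Node; the sample inputs are illustrative and are not taken from the specification text.

```javascript
// Standalone copy of generateId from the diff, runnable without a DOM.
function generateId(term) {
  const id = term.toLowerCase().replaceAll(" ", "-");
  if (id.endsWith("rategies")) {
    // "strategies" -> "strategy"
    return id.slice(0, -3) + "y";
  } else if (id.endsWith("s") && id !== "status") {
    // strip a trailing "s" (treated as a regular English plural)
    return id.slice(0, -1);
  }
  return id;
}

// Illustrative inputs only.
const samples = ["Bidi Isolation Strategies", "Variables", "status", "well-formedness"];
for (const sample of samples) {
  console.log(`${sample} -> ${generateId(sample)}`);
}
```

Any term ending in "s" other than "status" loses its final letter, so a non-plural such as "well-formedness" becomes "well-formednes". Because both the `dfn` ids and the `em` link targets pass through the same function, the links would still resolve; only the generated id looks odd.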

docs/why_mf_next.md

Lines changed: 6 additions & 9 deletions
@@ -7,18 +7,15 @@ why MessageFormat is important and why MessageFormat 2.0 is needed.
77

88
## Intro
99

10-
The `MessageFormat` API and syntax have been around for a long time.
11-
12-
Intro
10+
The `MessageFormat` API and syntax have been around for a long time:
1311

1412
- `MessageFormat` is the Unicode API for software localization
15-
- It is 20 years old, well designed, proven solution
16-
Its design was optimized for the software development model
17-
of twenty years ago.
18-
Implementers, developers, and translators struggle with its shortcomings.
13+
- It is 20 years old and is a well-designed, proven solution
1914

20-
The current wave of software development uses dynamic languages, modern UI
21-
frameworks and new forms of user interactions (voice, VR etc.).
15+
However, its design was optimized for the software development model of twenty
16+
years ago. Implementers, developers, and translators struggle with its
17+
shortcomings. The current wave of software development uses dynamic languages,
18+
modern UI frameworks, and new forms of user interactions (voice, VR etc.).
2219

2320
Considering these new challenges, combined with the lessons learned from using
2421
`MessageFormat`, we aim to design the next iteration of `MessageFormat`
