Commit eb4ad6d

Factor out "get language availabilities"
This will be used by the language detector spec.
1 parent 0eac14b commit eb4ad6d

1 file changed

+71
-66
lines changed

index.bs

Lines changed: 71 additions & 66 deletions
@@ -362,7 +362,7 @@ A <dfn>language availabilities</dfn> is a [=struct=] with the following [=struct
 * <dfn for="language availabilities">context languages</dfn>
 * <dfn for="language availabilities">output languages</dfn>

-All of these [=struct/items=] are [=maps=] from {{AIAvailability}} values to [=sets=] of strings representing [=Unicode canonicalized locale identifiers=], initially empty maps. Their [=map/keys=] will always be one of "{{AIAvailability/downloading}}", "{{AIAvailability/downloadable}}", or "{{AIAvailability/available}}" (i.e., they will never be "{{AIAvailability/unavailable}}"). [[!ECMA-402]]
+All of these [=struct/items=] are [=maps=] from {{AIAvailability}} values to [=sets=] of strings representing [=Unicode canonicalized locale identifiers=]. Their [=map/keys=] will always be one of "{{AIAvailability/downloading}}", "{{AIAvailability/downloadable}}", or "{{AIAvailability/available}}" (i.e., they will never be "{{AIAvailability/unavailable}}"). [[!ECMA-402]]

 <div algorithm>
 The <dfn>summarizer language availabilities</dfn> are given by the following steps. They return a [=language availabilities=] or null.
@@ -371,82 +371,45 @@ All of these [=struct/items=] are [=maps=] from {{AIAvailability}} values to [=s

 1. If there is some error attempting to determine whether the user agent supports summarizing text, which the user agent believes to be transient (such that re-querying the [=summarizer language availabilities=] could stop producing such an error), then return null.

-1. Let |availabilities| be a [=language availabilities=].
+1. Return a [=language availabilities=] with:

-1. [=Fill language availabilities=] given |availabilities|'s [=language availabilities/input languages=] and the purpose of summarizing text written in that language.
-
-1. [=Fill language availabilities=] given |availabilities|'s [=language availabilities/context languages=] and the purpose of summarizing text using web-developer provided context information written in that language.
-
-1. [=Fill language availabilities=] given |availabilities|'s [=language availabilities/output languages=] and the purpose of producing text summaries in that language.
-
-1. Return |availabilities|.
-</div>
-
-<div algorithm>
-To <dfn>fill language availabilities</dfn> given a [=map=] |languagesMap| and a description of the purpose for which we're checking language availability:
-
-1. Set |languagesMap|["{{AIAvailability/available}}"], |languagesMap|["{{AIAvailability/downloading}}"], and |languagesMap|["{{AIAvailability/downloadable}}"] to empty [=sets=].
-
-1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent supports |purpose|, without performing any downloading operations:
-
-1. [=set/Append=] |languageTag| to |languagesMap|["{{AIAvailability/available}}"].
-
-1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent is currently downloading material (e.g., an AI model or fine-tuning) to support |purpose|:
-
-1. [=set/Append=] |languageTag| to |languagesMap|["{{AIAvailability/downloading}}"].
-
-1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent believes it can support |purpose|, but only after performing a not-currently-ongoing download (e.g., of an AI model or fine-tuning):
-
-1. [=set/Append=] |languageTag| to |languagesMap|["{{AIAvailability/downloadable}}"].
-
-1. [=Assert=]: |languagesMap|["{{AIAvailability/available}}"], |languagesMap|["{{AIAvailability/downloading}}"], and |languagesMap|["{{AIAvailability/downloadable}}"] are disjoint.
-
-1. If the [=set/union=] of |languagesMap|["{{AIAvailability/available}}"], |languagesMap|["{{AIAvailability/downloading}}"], and |languagesMap|["{{AIAvailability/downloadable}}"] does not meet the [=language tag set completeness rules=], then:
+<dl class="props">
+: [=language availabilities/input languages=]
+:: the result of [=getting language availabilities=] given the purpose of summarizing text written in that language

-1. Let |missingLanguageTags| be the [=set=] of missing language tags necessary to meet the [=language tag set completeness rules=].
+: [=language availabilities/context languages=]
+:: the result of [=getting language availabilities=] given the purpose of summarizing text using web-developer provided context information written in that language

-1. [=set/For each=] |languageTag| of |missingLanguageTags|:
-
-1. <span id="language-tag-completeness-implementation-defined"></span> [=set/Append=] |languageTag| to one of the three sets. Which of the sets to append to is [=implementation-defined=], and should be guided by considerations similar to that of [$LookupMatchingLocaleByBestFit$] in terms of keeping "best fallback languages" together.
+: [=language availabilities/output languages=]
+:: the result of [=getting language availabilities=] given the purpose of producing text summaries in that language
+</dl>
 </div>

-<div algorithm>
-The <dfn>language tag set completeness rules</dfn> state that for every [=set/item=] |languageTag|, if |languageTag| has more than one subtag, then the set must also contain a less narrow language tag with the same language subtag and a strict subset of the same following subtags (i.e., omitting one or more).
-
-<p class="note">This definition is intended to align with that of [=[[AvailableLocales]]=] in <cite>ECMAScript Internationalization API Specification</cite>. [[ECMA-402]]
-
-<div class="example" id="example-subtags-intro">
-This means that if an implementation supports summarization of "`de-DE`" input text, it will also count as supporting "`de`" input text.
-
-The converse direction is supported not by the [=language tag set completeness rules=], but instead by the use of [$LookupMatchingLocaleByBestFit$], which ensures that if an implementation supports summarizing "`de`" input text, it also counts as supporting summarization of "`de-CH`", "`de-Latn-CH`", etc.
-</div>
-
-<div class="example" id="example-subtags-chinese">
-A common setup seen in today's software is to support two types of written Chinese: "traditional Chinese" and "simplified Chinese". Let's suppose that the user agent supports summarizing text written in traditional Chinese with no downloads, and simplified Chinese after a download.
+<div class="example" id="example-subtags-chinese">
+A common setup seen in today's software is to support two types of written Chinese: "traditional Chinese" and "simplified Chinese". Let's suppose that the user agent supports summarizing text written in traditional Chinese with no downloads, and simplified Chinese after a download.

-One way this could be implemented would be for [=summarizer language availabilities=] to return that "`zh-Hant`" is in the [=language availabilities/input languages=]["{{AIAvailability/available}}"] set, and "`zh`" and "`zh-Hans`" are in the [=language availabilities/input languages=]["{{AIAvailability/downloadable}}"] set. This return value conforms to the requirements of the [=language tag set completeness rules=], in ensuring that "`zh`" is present. Per <a class="allow-2119" href="#language-tag-completeness-implementation-defined">the "should"-level guidance</a>, the implementation has determined that "`zh`" belongs in the set of downloadable input languages, with "`zh-Hans`", instead of in the set of available input languages, with "`zh-Hant`".
+One way this could be implemented would be for [=summarizer language availabilities=] to return that "`zh-Hant`" is in the [=language availabilities/input languages=]["{{AIAvailability/available}}"] set, and "`zh`" and "`zh-Hans`" are in the [=language availabilities/input languages=]["{{AIAvailability/downloadable}}"] set. This return value conforms to the requirements of the [=language tag set completeness rules=], in ensuring that "`zh`" is present. Per <a class="allow-2119" href="#language-tag-completeness-implementation-defined">the "should"-level guidance</a>, the implementation has determined that "`zh`" belongs in the set of downloadable input languages, with "`zh-Hans`", instead of in the set of available input languages, with "`zh-Hant`".

-Combined with the use of [$LookupMatchingLocaleByBestFit$], this means {{AISummarizerFactory/availability()}} will give the following answers:
+Combined with the use of [$LookupMatchingLocaleByBestFit$], this means {{AISummarizerFactory/availability()}} will give the following answers:

-<xmp class="language-js">
-function a(languageTag) {
-return ai.summarizer.availability({
-expectedInputLanguages: [languageTag]
-});
-}
+<xmp class="language-js">
+function a(languageTag) {
+return ai.summarizer.availability({
+expectedInputLanguages: [languageTag]
+});
+}

-await a("zh") === "downloadable";
-await a("zh-Hant") === "available";
-await a("zh-Hans") === "downloadable";
+await a("zh") === "downloadable";
+await a("zh-Hant") === "available";
+await a("zh-Hans") === "downloadable";

-await a("zh-TW") === "available"; // zh-TW will best-fit to zh-Hant
-await a("zh-HK") === "available"; // zh-HK will best-fit to zh-Hant
-await a("zh-CN") === "downloadable"; // zh-CN will best-fit to zh-Hans
+await a("zh-TW") === "available"; // zh-TW will best-fit to zh-Hant
+await a("zh-HK") === "available"; // zh-HK will best-fit to zh-Hant
+await a("zh-CN") === "downloadable"; // zh-CN will best-fit to zh-Hans

-await a("zh-BR") === "downloadable"; // zh-BR will best-fit to zh
-await a("zh-Kana") === "downloadable"; // zh-Kana will best-fit to zh
-</xmp>
-</div>
+await a("zh-BR") === "downloadable"; // zh-BR will best-fit to zh
+await a("zh-Kana") === "downloadable"; // zh-Kana will best-fit to zh
+</xmp>
 </div>

 <h3 id="the-aisummarizer-class">The {{AISummarizer}} class</h3>
@@ -1240,6 +1203,48 @@ An <dfn export>error information</dfn> is a [=struct=] used to communicate error
 1. Return [$CanonicalizeUnicodeLocaleId$](|potentialLanguageTag|).
 </div>

+<div algorithm>
+To <dfn export>get language availabilities</dfn> given a description |purpose| of the purpose for which we're checking language availability:
+
+1. Let |availabilities| be «[ "{{AIAvailability/available}}" → an empty [=set=], "{{AIAvailability/downloading}}" → an empty [=set=], "{{AIAvailability/downloadable}}" → an empty [=set=] ]».
+
+1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent supports |purpose|, without performing any downloading operations:
+
+1. [=set/Append=] |languageTag| to |availabilities|["{{AIAvailability/available}}"].
+
+1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent is currently downloading material (e.g., an AI model or fine-tuning) to support |purpose|:
+
+1. [=set/Append=] |languageTag| to |availabilities|["{{AIAvailability/downloading}}"].
+
+1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent believes it can support |purpose|, but only after performing a not-currently-ongoing download (e.g., of an AI model or fine-tuning):
+
+1. [=set/Append=] |languageTag| to |availabilities|["{{AIAvailability/downloadable}}"].
+
+1. [=Assert=]: |availabilities|["{{AIAvailability/available}}"], |availabilities|["{{AIAvailability/downloading}}"], and |availabilities|["{{AIAvailability/downloadable}}"] are disjoint.
+
+1. If the [=set/union=] of |availabilities|["{{AIAvailability/available}}"], |availabilities|["{{AIAvailability/downloading}}"], and |availabilities|["{{AIAvailability/downloadable}}"] does not meet the [=language tag set completeness rules=], then:
+
+1. Let |missingLanguageTags| be the [=set=] of missing language tags necessary to meet the [=language tag set completeness rules=].
+
+1. [=set/For each=] |languageTag| of |missingLanguageTags|:
+
+1. <span id="language-tag-completeness-implementation-defined"></span> [=set/Append=] |languageTag| to one of the three sets. Which of the sets to append to is [=implementation-defined=], and should be guided by considerations similar to that of [$LookupMatchingLocaleByBestFit$] in terms of keeping "best fallback languages" together.
+
+1. Return |availabilities|.
+</div>
+
+<div algorithm>
+The <dfn>language tag set completeness rules</dfn> state that for every [=set/item=] |languageTag|, if |languageTag| has more than one subtag, then the set must also contain a less narrow language tag with the same language subtag and a strict subset of the same following subtags (i.e., omitting one or more).
+
+<p class="note">This definition is intended to align with that of [=[[AvailableLocales]]=] in <cite>ECMAScript Internationalization API Specification</cite>. [[ECMA-402]]
+
+<div class="example" id="example-subtags-intro">
+This means that if an implementation supports summarization of "`de-DE`" input text, it will also count as supporting "`de`" input text.
+
+The converse direction is supported not by the [=language tag set completeness rules=], but instead by the use of [$LookupMatchingLocaleByBestFit$], which ensures that if an implementation supports summarizing "`de`" input text, it also counts as supporting summarization of "`de-CH`", "`de-Latn-CH`", etc.
+</div>
+</div>
+
 <h3 id="supporting-availability">Availability</h3>

 <div algorithm>
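
For readers of this diff, here is a rough JavaScript sketch of what the factored-out "get language availabilities" steps compute. It is only an illustration under stated assumptions, not spec text or a user agent API: the names `KEYS`, `missingLanguageTags`, `getLanguageAvailabilities`, `supported`, and `chooseKeyForMissing` are all hypothetical, the input lists stand in for whatever the user agent knows about its own models, and tags are assumed to be canonicalized already. The completeness check relies on one reading of the rules: since every multi-subtag entry needs a less narrow entry, the chain always bottoms out at the bare language subtag, so the sketch only checks for that.

```javascript
// Keys mirror the three AIAvailability values the spec allows as map keys.
const KEYS = ["available", "downloading", "downloadable"];

// Returns the bare language subtags that are absent from `tags` (a Set).
// Assumption: having the bare language subtag present is enough to satisfy
// the completeness rules for every tag that shares that language.
function missingLanguageTags(tags) {
  const missing = new Set();
  for (const tag of tags) {
    const [language, ...rest] = tag.split("-");
    if (rest.length > 0 && !tags.has(language)) {
      missing.add(language);
    }
  }
  return missing;
}

// `supported` (hypothetical) maps each key to an array of canonicalized tags.
// `chooseKeyForMissing` (hypothetical) stands in for the implementation-defined
// choice of which of the three sets receives a missing tag.
function getLanguageAvailabilities(supported, chooseKeyForMissing) {
  const availabilities = new Map(
    KEYS.map((key) => [key, new Set(supported[key] ?? [])])
  );

  // Mirror the spec's assertion that the three sets are disjoint,
  // building their union along the way.
  const union = new Set();
  for (const set of availabilities.values()) {
    for (const tag of set) {
      if (union.has(tag)) {
        throw new Error(`"${tag}" appears in more than one set`);
      }
      union.add(tag);
    }
  }

  // Add each missing tag to whichever set the caller's policy picks.
  for (const tag of missingLanguageTags(union)) {
    availabilities.get(chooseKeyForMissing(tag)).add(tag);
  }
  return availabilities;
}

// The Chinese example from the diff: zh-Hant needs no download,
// zh-Hans is downloadable, and the policy groups "zh" with zh-Hans.
const result = getLanguageAvailabilities(
  { available: ["zh-Hant"], downloadable: ["zh-Hans"] },
  () => "downloadable"
);
console.log([...result.get("downloadable")]);
```

Passing the placement policy in as a callback keeps the implementation-defined step visible as a separate decision, which is the part the spec deliberately leaves to the user agent.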
