
Section 9.1 AvailableLocales shouldn't require base language if script is present #947

Open
sffc opened this issue Dec 10, 2024 · 2 comments
Labels
c: meta (Component: intl-wide issues), s: discuss (Status: TG2 must discuss to move forward)

Comments

@sffc
Contributor

sffc commented Dec 10, 2024

It seems like it should be allowed for AvailableLocales to support zh-Hant but not zh, since zh implies zh-Hans. However, the spec currently states:

Additionally, for each element with more than one subtag, it must also include a less narrow language tag with the same language subtag and a strict subset of the same following subtags (i.e., omitting one or more) to serve as a potential fallback from ResolveLocale.
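A minimal TypeScript sketch of how the quoted rule could be checked against a list of available locales. It assumes canonicalized tags containing only language, script, region, and variant subtags (no extensions or private-use sequences), and it reads "strict subset" as an unordered subset. `missingFallbacks` and its helpers are hypothetical names, not spec operations. Read literally, the rule is recursive: any fallback that itself has more than one subtag needs its own fallback, and because each step removes at least one subtag, every chain ends at the bare language subtag. So a list with zh-Hant and zh-Hant-TW but no zh fails.

```typescript
// Minimal sketch of the quoted fallback rule. Assumes canonicalized tags that
// contain only language, script, region, and variant subtags.
function splitTag(tag: string): { language: string; rest: string[] } {
  const [language, ...rest] = tag.toLowerCase().split("-");
  return { language, rest };
}

// "Strict subset of the same following subtags": fewer subtags, all shared.
function isStrictSubset(smaller: string[], larger: string[]): boolean {
  return smaller.length < larger.length && smaller.every((s) => larger.includes(s));
}

// Returns every tag with more than one subtag that has no less narrow fallback.
function missingFallbacks(available: string[]): string[] {
  const parsed = available.map((tag) => ({ tag, ...splitTag(tag) }));
  return parsed
    .filter((p) => p.rest.length > 0)
    .filter((p) => !parsed.some((q) => q.language === p.language && isStrictSubset(q.rest, p.rest)))
    .map((p) => p.tag);
}

missingFallbacks(["zh-Hant", "zh-Hant-TW"]); // ["zh-Hant"]
missingFallbacks(["zh", "zh-Hant", "zh-Hant-TW"]); // []
```

Here zh-Hant-TW is satisfied by zh-Hant, but zh-Hant itself still needs zh. That is the requirement this issue proposes to relax.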

@sffc added the labels c: meta (Component: intl-wide issues) and s: discuss (Status: TG2 must discuss to move forward) Dec 10, 2024
@sffc moved this to Priority Issues in ECMA-402 Meeting Topics Dec 10, 2024
@sffc moved this from Priority Issues to Previously Discussed in ECMA-402 Meeting Topics Dec 19, 2024
@gibson042
Contributor

since zh implies zh-Hans

Is there any spec text supporting this claim? I would consider it perfectly reasonable for an implementation that has data for "zh-Hant" but not "zh-Hans" to use the former in service of requested locale "zh", and any application that specifically requires "zh-Hans" should request it explicitly.

On the other hand, it would seem bizarre and in violation of the spirit (if not also the letter) of BCP 47 to support narrow data in the absence of covering broad data. Some excerpts:

  • «A language tag is composed from a sequence of one or more "subtags", each of which refines or narrows the range of language identified by the overall tag»
  • «In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located. Single letter or digit subtags (including both the letter 'x', which introduces private-use sequences, and the subtags that introduce extensions) are removed at the same time as their closest trailing subtag.»
  • «For example, a user who reads both Simplified and Traditional Chinese, but who prefers Simplified, might use the range "zh" for filtering (matching all items that user can read) but "zh-Hans" for lookup (making sure that user gets the preferred form if it's available, but the fallback to "zh" will still work)»
  • «Whether a subtag adds distinguishing value can depend on the context of the request… If the user cannot be sure which scheme is being used (or if more than one might be applied to a given request), the user SHOULD specify the most specific (largest number of subtags) range first and then supply shorter prefixes later in the list to ensure that filtering returns a complete set of tags.»
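The lookup scheme in the second and third excerpts can be sketched as follows. This is a simplified reading of the RFC 4647 Lookup scheme (part of BCP 47): comparison is case-insensitive, and the "*" wildcard and default-value handling are omitted. `truncations` and `lookup` are illustrative names, not a library API.

```typescript
// Progressively truncate a language range from the end. When removing a
// subtag leaves a singleton ("x", "u", ...) at the end, drop it as well.
function truncations(range: string): string[] {
  const subtags = range.toLowerCase().split("-");
  const result: string[] = [];
  while (subtags.length > 0) {
    result.push(subtags.join("-"));
    subtags.pop();
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags.pop();
    }
  }
  return result;
}

// Return the first available tag matched by any requested range, in priority order.
function lookup(available: string[], requested: string[]): string | undefined {
  const byLower = new Map(available.map((tag): [string, string] => [tag.toLowerCase(), tag]));
  for (const range of requested) {
    for (const candidate of truncations(range)) {
      const match = byLower.get(candidate);
      if (match !== undefined) return match;
    }
  }
  return undefined;
}

lookup(["zh", "zh-Hant"], ["zh-Hans"]); // "zh": the fallback to zh works
lookup(["zh-Hant"], ["zh-Hans"]); // undefined: zh-Hant is never a truncation of zh-Hans
```

Under pure truncation, a "zh-Hans" request can only reach data through "zh-Hans" or "zh". That is why the excerpts assume broad tags exist alongside narrow ones.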

I don't think that's invalidated by Unicode likelySubtags logic, which improves "best case" results but should not preempt such "worst case" scenarios.
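One way to picture the best-case/worst-case distinction is a hypothetical resolver. It first maximizes the request with likely subtags, so zh is tried as zh-Hans-CN. It then falls back through truncation. Only after that does it accept any available locale with the same language subtag, which is the zh-Hant behavior suggested above. `LIKELY_SUBTAGS` is a one-entry stand-in for CLDR likelySubtags data, and `resolve` is an illustrative name. This is not the ECMA-402 ResolveLocale algorithm.

```typescript
// Hypothetical resolver: likely-subtags maximization improves the best case,
// while a same-language last resort covers the worst case.
// LIKELY_SUBTAGS is a one-entry stand-in for CLDR likelySubtags data.
const LIKELY_SUBTAGS = new Map<string, string>([["zh", "zh-Hans-CN"]]);

// Progressive truncation; a trailing singleton is dropped with the subtag after it.
function truncations(tag: string): string[] {
  const subtags = tag.toLowerCase().split("-");
  const result: string[] = [];
  while (subtags.length > 0) {
    result.push(subtags.join("-"));
    subtags.pop();
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags.pop();
    }
  }
  return result;
}

function resolve(available: string[], requested: string): string | undefined {
  const byLower = new Map(available.map((tag): [string, string] => [tag.toLowerCase(), tag]));
  const lower = requested.toLowerCase();
  // Best case: "zh" is tried as "zh-hans-cn", then "zh-hans", then "zh".
  const maximized = LIKELY_SUBTAGS.get(lower) ?? lower;
  for (const candidate of truncations(maximized)) {
    const match = byLower.get(candidate);
    if (match !== undefined) return match;
  }
  // Worst case: no truncation matched, so accept any tag with the same language subtag.
  const language = lower.split("-")[0];
  return available.find((tag) => tag.toLowerCase().split("-")[0] === language);
}

resolve(["zh", "zh-Hans"], "zh"); // "zh-Hans"
resolve(["zh-Hant"], "zh"); // "zh-Hant"
```

In this sketch, maximization changes which data wins when script-appropriate data exists. It does not remove the last-resort path when only zh-Hant is available.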

Projects
Status: Previously Discussed
Development

No branches or pull requests

2 participants