User stories for collation #15

@hsivonen

From reading CLDR collation sources, I observe that:

  • Some non-default collations seem to exist in CLDR only because of an aversion to ever unshipping anything. The big5han and gb2312han collations don't seem to address use cases; they are there only because ICU started with Chinese collations that were analogous in construction to the Japanese collation. The compat collation for Arabic is explicitly for compatibility with earlier software behavior and is not framed as culturally traditional. The traditional collations for Finnish and Swedish are framed as culturally traditional, but as a native Finnish user, I have a really hard time believing that there is significant user demand for them, so even these have an air of being there out of unshipping aversion.
  • Some non-default collations are usage-dependent. phonebk for German is like this. It's a bit unclear to me if dict for Sinhala is for dictionary usage or if a user might want to use dictionary-style sort generally. AFAICT, the unihan collations are mainly relevant for building a browsable index for a dictionary.
  • Search collations are not only usage-dependent but outright inappropriate for sorting usage.
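To make the distinction between usage-dependent tailorings and search concrete, here is a minimal sketch of how the two are requested through ECMA-402's `Intl.Collator`. It assumes a runtime that ships full CLDR collation data; the sample names are made up for illustration, and no particular sort order is claimed because that depends on the runtime's data.

```typescript
// Sketch only: results depend on the collation data shipped with the runtime.

// A usage-dependent tailoring such as German phonebk is requested with the
// -u-co- Unicode extension in the locale tag.
const phonebook = new Intl.Collator("de-u-co-phonebk");

// Search is requested through the `usage` option rather than a collation name.
const search = new Intl.Collator("de", { usage: "search" });

// resolvedOptions() reports what the implementation actually selected.
// If the requested tailoring is unavailable, `collation` falls back to "default".
const phonebookCollation: string = phonebook.resolvedOptions().collation;
const searchUsage: string = search.resolvedOptions().usage;
console.log(phonebookCollation, searchUsage);

// Illustrative sample data (hypothetical). The resulting order is
// data-dependent, so it is printed rather than asserted.
const names: string[] = ["Müller", "Mueller", "Muller"];
const sorted: string[] = names.slice().sort(phonebook.compare);
console.log(sorted);
```

Because a page can read back `resolvedOptions().collation`, any user-settable collation preference that reaches the locale tag is observable by the page.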

This leaves relatively few non-default collations that may be plausible as user-settable preferences:

  • zhuyin for sorting Traditional Chinese according to the Bopomofo notation of Mandarin pronunciation, as opposed to sorting by stroke count. (AFAICT, this is relevant to TW but not really to HK or MO, since sorting by Mandarin doesn't make sense for Cantonese and Bopomofo is TW-specific in practice.)
  • Possibly stroke sorting in the Simplified Chinese context when the spoken form isn't Mandarin. (I'm theorizing here and am not at all sure if there's significant user demand.)
  • Possibly phonetic sorting for Lingala. (I don't know anything about this.)
  • Possibly traditional sorting for Spanish, Vietnamese, Bangla, and Kannada. (I imagine there might actually be user demand for traditional sorting for Spanish. I don't know anything about the situation for Vietnamese, Bangla, or Kannada.)

It would be useful to assess the user demand for these, especially weighed against privacy issues. It may well be that, privacy issues considered, it doesn't make sense to expose non-default collation variants to the Web in general, but that it does make sense to expose only zhuyin, as a bifurcation of zh-TW.
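One possible outcome of such an assessment can be sketched as an allowlist. This is a hypothetical illustration, not a proposal for any particular API: the `exposedLocale` function and the `EXPOSED` constant are invented names, and the policy (expose only zhuyin, and only for zh with region TW) is just the example outcome mentioned above.

```typescript
// Hypothetical policy: the only (language, region, collation) combination
// whose non-default collation preference is exposed to Web content.
const EXPOSED = { language: "zh", region: "TW", collation: "zhuyin" };

// Returns the locale tag a page would see, given the user's locale tag and
// an optional preferred collation. Hypothetical helper for illustration.
function exposedLocale(tag: string, preferredCollation?: string): string {
  const locale = new Intl.Locale(tag);
  const allowed =
    locale.language === EXPOSED.language &&
    locale.region === EXPOSED.region &&
    preferredCollation === EXPOSED.collation;
  // Rebuild from baseName so that no other collation keyword leaks through.
  // Simplification: baseName also drops every other extension keyword.
  return allowed
    ? new Intl.Locale(locale.baseName, { collation: EXPOSED.collation }).toString()
    : locale.baseName;
}

console.log(exposedLocale("zh-TW", "zhuyin")); // "zh-TW-u-co-zhuyin"
console.log(exposedLocale("zh-HK", "zhuyin")); // "zh-HK"
console.log(exposedLocale("de-u-co-phonebk", "phonebk")); // "de"
```

Under this sketch, zh-TW users split into two observable groups and every other locale exposes no collation preference at all, which bounds the fingerprinting surface to a single bit for one locale.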
