Aggregating crk dictionary sources #131

@aarppe

This contains some general notes and observations concerning the aggregation of the three (or more) Plains Cree dictionary resources.

A. For creating LEXC source, the dictionaries minimally need the following:

  1. Entry head that has been standardized according to SRO.
  2. Stem that has been standardized according to SRO.
  3. Lexical (inflectional) category following the classification scheme in CW.
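A minimal sketch of checking the three requirements above before an entry is passed on to LEXC generation. The record layout (a dict with `head`, `stem` and `category` keys) and the function name are assumptions made for illustration, not the format used by any of the dictionaries; the sample entry is a toy record.

```python
# Hypothetical field names; the real dictionary exports may use different ones.
REQUIRED_FOR_LEXC = ("head", "stem", "category")


def missing_lexc_fields(entry):
    """Return the required fields that are absent or empty in an entry."""
    return [field for field in REQUIRED_FOR_LEXC if not entry.get(field)]


# Toy record for illustration only: the stem is missing.
incomplete = {"head": "atim", "stem": "", "category": "NA"}
print(missing_lexc_fields(incomplete))
```

Entries for which the returned list is non-empty would need manual completion before they can contribute to the LEXC source.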

B. For mapping the dictionary sources against each other, we need minimally the following:

  1. Entry head that has been standardized according to SRO.
  2. Lexical category following the classification scheme in CW.
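One way the mapping in B could be sketched is to group entries from all sources under a shared key of standardized head plus lexical category. The dict-based record layout, the function name and the sample entries are assumptions for illustration; the entries are toy records, not taken from the dictionaries.

```python
from collections import defaultdict


def align_entries(sources):
    """Group entries from several sources by (head, lexical category).

    `sources` maps a source name (e.g. "CW") to a list of entry dicts,
    each assumed to carry an SRO-standardized "head" and a "category".
    """
    aligned = defaultdict(dict)
    for source_name, entries in sources.items():
        for entry in entries:
            key = (entry["head"], entry["category"])
            aligned[key].setdefault(source_name, []).append(entry)
    return dict(aligned)


# Toy records for illustration only.
sources = {
    "CW": [{"head": "atim", "category": "NA"}],
    "MD": [
        {"head": "atim", "category": "NA"},
        {"head": "nipiy", "category": "NI"},
    ],
}
aligned = align_entries(sources)
```

Keys that appear under only one source name are candidates for entries unique to that dictionary, or for heads that still need standardizing.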

A-B. Additional headaches in regularizing the sources:

  1. CW uses the accented-y (ý) in the entry heads (and stems), while MD and AECD do not. For comparison purposes, however, the accented-y can be treated as a regular-y. Nevertheless, we would want to revise the pertinent fields in MD and AECD (head and stem) to have the accented-y where appropriate.
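Treating the accented-y as a regular-y for comparison could look like the sketch below. The function name is hypothetical, and the sample strings are only illustrative. The input is first normalized to NFC so that a `y` followed by a combining acute accent is handled the same way as the precomposed character; the other SRO diacritics are left untouched.

```python
import unicodedata


def comparison_key(head):
    """Fold accented-y to plain y, for comparison purposes only."""
    # NFC composes "y" + U+0301 (combining acute) into the single character "ý".
    composed = unicodedata.normalize("NFC", head)
    return composed.replace("ý", "y").replace("Ý", "Y")


print(comparison_key("ýôtin") == comparison_key("yôtin"))
```

The stored head itself is not modified; only the key used for matching is folded, so the accented-y remains available in the CW fields.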

C. Additional headaches concerning the English definitions that we would probably want to regularize:

  1. The sources use different conventions for indicating subject (CW: s/he vs. AECD: S/he vs. MD: He) and object (CW: s.o., s.t. vs. AECD: him/her, it vs. MD: him, it).
  2. The sources use different conventions in separating senses (CW: semicolon, AECD: semicolon, MD: numbering).
  3. The definitions contain Cree words and passages that should be marked as such.
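A rough sketch of how points 1 and 2 might be handled, assuming the CW conventions are chosen as the target (the notes above do not settle this). The function names are hypothetical, the MD numbering is assumed to look like `1. ... 2. ...`, and the sample definitions are invented. Object pronouns are deliberately left alone here, since rewriting `him` or `it` to `s.o.` or `s.t.` would likely need more context than a simple pattern can provide.

```python
import re


def split_senses(definition, source):
    """Split a definition into senses using the source's own convention."""
    if source == "MD":
        # Assumed MD style: senses introduced by "1.", "2.", ...
        parts = re.split(r"\s*\d+\.\s*", definition)
    else:
        # CW and AECD separate senses with semicolons.
        parts = definition.split(";")
    return [part.strip() for part in parts if part.strip()]


def normalize_subject(sense):
    """Rewrite a leading subject marker (S/he, He) to the CW-style s/he."""
    return re.sub(r"^(?:S/he|s/he|He)\b", "s/he", sense)


# Invented definitions for illustration only.
md_senses = split_senses("1. He sees him. 2. He looks at it.", "MD")
print([normalize_subject(sense) for sense in md_senses])
```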

D. Varying sets of fields under entries that we may want to fill in:

  1. Besides stem and lexical category, CW has a number of fields that AECD and MD do not, e.g. the morphological decomposition.
  2. AECD has variants and alternatives listed with a different convention than CW, while MD has practically none.

Labels: aggregation (Changes to the aggregation algorithm), meta (Issues for tracking issues)
