Processor input (JSON)

inukshuk edited this page Jun 9, 2012 · 8 revisions

Naming Conventions

In order to come up with definitive naming conventions for CSL Processor input, here is a (work in progress) collection of inconsistencies found in the current JSON format. Since the majority of attributes use a minus to separate terms, this list includes strictly camel-cased names.

Citation Data

  • citationItems
  • noteIndex (properties)
  • citationID
  • sortedItems
  • sortkeys (not sort-keys)
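As a rough illustration of what a hyphenated convention would mean for the camel-cased names above, the following sketch converts them mechanically. The helper name `toHyphenated` is hypothetical and not part of any CSL processor; it only splits at lower-to-upper case boundaries, so a name like sortkeys, which has no such boundary, would still need an explicit mapping.

```javascript
// Hypothetical helper (not part of citeproc-js or citeproc-ruby):
// turns a camel-cased attribute name into a hyphen-separated one.
function toHyphenated(name) {
  return name
    // Insert a hyphen wherever a lowercase letter or digit is followed
    // by an uppercase letter, e.g. "noteIndex" -> "note-Index".
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')
    .toLowerCase();
}

console.log(toHyphenated('citationItems'));
console.log(toHyphenated('noteIndex'));
console.log(toHyphenated('sortkeys')); // unchanged: no case boundary to split on
```

Underscores are deliberately left alone here, since the Items section below discusses whether they carry a distinct meaning.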

Items

The valid items seem to distinguish between minus and underscore: the minus separates subtypes, while the underscore connects terms. For example, it is 'motion_picture' but 'article-journal'. This approach is consistent in itself, but since most other attributes use a minus as a connector, I find it confusing; is the distinction really necessary, or could we use a minus everywhere? An alternative would be to use '/' for sub-typing.

  • shortTitle
  • journalAbbreviations
  • archive_location / archive-place

Aside from Bruce: keep in mind the origin of the data schema, which is effectively a direct mapping of CSL terms. The intention behind the terms is indeed close to the assumption above: hyphens were meant to indicate subtyping (article-magazine), but also relations (container-title). By contrast, multiple words get separated by an underscore (motion_picture). I think that still makes sense for the original scope (CSL proper), but may need to be reconsidered for the data input format.

Names

non-dropping-particle is really long; wouldn't it be easier to have particle, dropping-particle, and demote-particle?

  • isInstitution

Bibliography Input

To filter items to be included in the bibliography, a list of conditions can be passed in an array. As far as I can tell, the order of the individual elements in the array is of no concern; each element has exactly two properties: 'field' and 'value'. Instead of an array of hashes, it seems to me, a single hash would suffice. For example:

var myarg = {
  "select" : [
    {
      "field" : "type",
      "value" : "book"
    },
    {
      "field" : "categories",
      "value" : "1990s"
    }
  ]
}

This could then be written as:

var myarg = {
  "select" : {
    "type" : "book",
    "categories" : "1990s"
  }
}
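To make the proposed simplification concrete, here is a minimal sketch that converts the array form into the single-hash form. The function name `conditionsToHash` is hypothetical. One caveat the sketch makes visible: the array form can carry two conditions on the same field, while in the hash form the later one silently overwrites the earlier one.

```javascript
// Hypothetical helper: collapses [{field, value}, ...] into {field: value, ...}.
// If a field occurs more than once, the last condition wins.
function conditionsToHash(conditions) {
  var hash = {};
  conditions.forEach(function (condition) {
    hash[condition.field] = condition.value;
  });
  return hash;
}

var select = [
  { field: 'type', value: 'book' },
  { field: 'categories', value: '1990s' }
];

console.log(JSON.stringify(conditionsToHash(select)));
```

Whether repeated fields ever occur in practice would decide if the hash form loses anything.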

Also, in the Ruby implementation we're mapping select, include, exclude, quash to all, any, none, and skip, respectively, as the first three directly correspond to list filters in Ruby. Since the citeproc-js manual describes the meaning of select, include, and exclude using exactly these terms (all, any, none), I wonder if they would not be the more intuitive choice in the first place.
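The correspondence to list filters can be sketched in JavaScript as well, where all, any, and none map onto `Array.prototype.every` and `Array.prototype.some`. This is only an illustration under assumptions: `matches` and `filterItems` are hypothetical names, items are plain objects, a condition matches when the field equals the value (or contains it, if the field is an array), and allowing several filters in one call is a choice of this sketch rather than documented processor behaviour. The fourth filter (quash/skip) is left out because its semantics are not described here.

```javascript
// Hypothetical: does a single {field, value} condition hold for an item?
function matches(item, condition) {
  var actual = item[condition.field];
  if (Array.isArray(actual)) {
    return actual.indexOf(condition.value) !== -1;
  }
  return actual === condition.value;
}

// Hypothetical: keep the items that satisfy the given filters.
//   all  -> every condition must match (select)
//   any  -> at least one condition must match (include)
//   none -> no condition may match (exclude)
function filterItems(items, filters) {
  var all = filters.all || [];
  var any = filters.any || [];
  var none = filters.none || [];
  return items.filter(function (item) {
    var test = function (condition) { return matches(item, condition); };
    return all.every(test) &&
      (any.length === 0 || any.some(test)) &&
      !none.some(test);
  });
}

var items = [
  { id: 'a', type: 'book', categories: ['1990s'] },
  { id: 'b', type: 'article-journal', categories: ['1990s'] },
  { id: 'c', type: 'book', categories: ['2000s'] }
];
var conditions = [
  { field: 'type', value: 'book' },
  { field: 'categories', value: '1990s' }
];

console.log(filterItems(items, { all: conditions }).map(function (i) { return i.id; }));
console.log(filterItems(items, { any: conditions }).map(function (i) { return i.id; }));
console.log(filterItems(items, { none: conditions }).map(function (i) { return i.id; }));
```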

Bibliography Output

  • bibliography_errors (the contents too, but perhaps these can remain implementation dependent?)

These are just suggestions:

  • line-spacing instead of linespacing
  • entry-spacing instead of entryspacing
  • indent or hanging-indent instead of hangingindent
  • offset or max-offset instead of maxoffset
  • preamble or before instead of bibstart
  • postamble or after instead of bibend (I know postamble is probably not proper English outside of Computer Science, but GNU Make uses it, for example)
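If names along these lines were adopted, existing output could be translated with a small compatibility shim. The sketch below is hypothetical throughout: `SUGGESTED_NAMES` and `renameKeys` are made-up names, the mapping arbitrarily picks one alternative wherever two are suggested above, and the sample values are placeholders rather than real processor output.

```javascript
// Hypothetical mapping from the current names to one of the suggested ones.
var SUGGESTED_NAMES = {
  linespacing: 'line-spacing',
  entryspacing: 'entry-spacing',
  hangingindent: 'hanging-indent',
  maxoffset: 'max-offset',
  bibstart: 'preamble',
  bibend: 'postamble'
};

// Returns a copy of params with known keys renamed; unknown keys pass through.
function renameKeys(params) {
  var renamed = {};
  Object.keys(params).forEach(function (key) {
    var known = Object.prototype.hasOwnProperty.call(SUGGESTED_NAMES, key);
    renamed[known ? SUGGESTED_NAMES[key] : key] = params[key];
  });
  return renamed;
}

// Placeholder values, for illustration only.
var params = { linespacing: 1, entryspacing: 0, hangingindent: 2, maxoffset: 0 };
console.log(JSON.stringify(renameKeys(params)));
```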

Citation Output

  • citation_errors (same as above)

Dates

Date variables, as currently defined, can be either single dates or date ranges; this leads to unnecessarily complex implementations. Wouldn't it be cleaner to distinguish between dates and ranges in the first place? In the same way, open ranges could be defined more explicitly: right now an open range is defined by adding date parts containing zeroes (which are otherwise invalid date values) to the date parts array.
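The branching this forces on implementations can be sketched as follows. The function `classifyDate` and the `kind` property are hypothetical, and the sketch assumes a date variable of the form `{ "date-parts": [[...], [...]] }` in which an open range is marked by a second entry consisting only of zeroes; the exact shape of that marker is an assumption based on the description above.

```javascript
// Assumption: an "open" marker is a non-empty date-parts entry made up of zeroes.
function isOpenMarker(parts) {
  return parts.length > 0 && parts.every(function (part) {
    return Number(part) === 0;
  });
}

// Hypothetical: make the three cases explicit instead of inferring them
// from the length and contents of the date-parts array every time.
function classifyDate(variable) {
  var parts = variable['date-parts'];
  if (parts.length === 1) {
    return { kind: 'date', date: parts[0] };
  }
  if (isOpenMarker(parts[1])) {
    return { kind: 'open-range', start: parts[0] };
  }
  return { kind: 'range', start: parts[0], end: parts[1] };
}

console.log(classifyDate({ 'date-parts': [[2012, 6, 9]] }).kind);
console.log(classifyDate({ 'date-parts': [[2004, 3], [2006]] }).kind);
console.log(classifyDate({ 'date-parts': [[1990], [0, 0, 0]] }).kind);
```

An input format that distinguished these cases directly would make this kind of inspection unnecessary.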

Instead of specifying individual date and date range types, an easy solution would be to pick a subset of EDTF as date input. This way, date variables would be written as strings, which simplifies the processor input but requires the processor to parse the EDTF strings.
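To give a sense of the parsing cost, here is a minimal sketch for a very small, assumed subset: plain dates of the form YYYY, YYYY-MM, or YYYY-MM-DD, and closed intervals written as two such dates separated by a slash. The function names are hypothetical, the result is converted back to the current date-parts shape purely for comparison, and open or uncertain dates are left out because their notation would depend on the subset chosen. Month and day values are not range-checked.

```javascript
// Assumed subset: YYYY, YYYY-MM, YYYY-MM-DD.
var DATE_PATTERN = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/;

// Hypothetical: parse one plain date into an array of numeric parts.
function parseDate(text) {
  var match = DATE_PATTERN.exec(text);
  if (!match) {
    throw new Error('Unsupported date: ' + text);
  }
  // Drop capture groups that did not participate (missing month and/or day).
  return match.slice(1)
    .filter(function (part) { return part !== undefined; })
    .map(Number);
}

// Hypothetical: parse a date or a closed "start/end" interval.
function parseDateString(text) {
  var halves = text.split('/');
  if (halves.length > 2) {
    throw new Error('Unsupported interval: ' + text);
  }
  return { 'date-parts': halves.map(parseDate) };
}

console.log(JSON.stringify(parseDateString('2012-06-09')));
console.log(JSON.stringify(parseDateString('2004-03/2006')));
```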
