XML Format
Much of the structure of the XML format is specified by the RELAX NG schema (data/xml/schema.{rnc,rng}) and can be validated automatically. This document describes the structure less formally and also describes aspects of the format that aren't specified by the schema.
The root element is <volume id="X99">, where X is replaced by the one-letter code for the venue and 99 is replaced by the last two digits of the year.
The <volume> element has child elements <paper id="9999">, where 9999 is replaced by the four-digit paper identifier. For some venues (LREC), there is also an href attribute for the external URL of the paper.
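The identifier conventions above can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the sample volume, its ids, and the `example.org` URL are made up for illustration, `check_ids` is a hypothetical helper, and the assumption that the venue letter is uppercase is not stated in this document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample volume; the ids and the href are invented for illustration.
SAMPLE = """<volume id="P18">
  <paper id="1001"><title>First Paper</title></paper>
  <paper id="1002" href="https://example.org/paper.pdf"><title>Second Paper</title></paper>
</volume>"""

# Assumption: the one-letter venue code is an uppercase ASCII letter.
VOLUME_ID = re.compile(r"[A-Z]\d{2}")  # venue letter + last two digits of the year
PAPER_ID = re.compile(r"\d{4}")        # four-digit paper identifier


def check_ids(xml_text):
    """Return a list of problems found in the volume and paper ids."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "volume":
        problems.append(f"root element is <{root.tag}>, expected <volume>")
    if not VOLUME_ID.fullmatch(root.get("id", "")):
        problems.append(f"bad volume id: {root.get('id')!r}")
    for paper in root.findall("paper"):
        if not PAPER_ID.fullmatch(paper.get("id", "")):
            problems.append(f"bad paper id: {paper.get('id')!r}")
    return problems


print(check_ids(SAMPLE))  # an empty list means no problems were found
```

A check like this complements schema validation rather than replacing it, since it looks only at the id attributes.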
Each <paper> element has several child elements:
- <title>: The title (see below for more details)
- <author>: The authors (see below for more details)
- <editor>: The editors (see below for more details)
- and others.
Text fields (<title>, <author>, etc.) are written in Unicode (UTF-8). The following elements are currently allowed for formatting:
- <tex-math>: math formulas, coded using TeX (equivalent to TeX $...$). For example: An <tex-math>O(n^3)</tex-math> Algorithm for Parsing Context-Free Grammars.
- <url>: a URL, displayed in typewriter font and hyperlinked
- <i>: italics
- <b>: boldface
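To illustrate how a consumer of the XML might handle these formatting elements, here is a small sketch that renders a text field as a LaTeX-flavoured string. The function name `to_latex` and the particular mapping (for example `<i>` to `\textit{}`) are assumptions chosen for illustration, not a description of how the Anthology itself renders titles. The sketch also does not escape LaTeX special characters.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping from formatting elements to LaTeX wrappers (an assumption).
WRAP = {
    "tex-math": ("$", "$"),
    "url": (r"\url{", "}"),
    "i": (r"\textit{", "}"),
    "b": (r"\textbf{", "}"),
}


def to_latex(elem):
    """Recursively render an element's mixed content as a LaTeX-style string."""
    out = [elem.text or ""]
    for child in elem:
        before, after = WRAP.get(child.tag, ("", ""))
        out.append(before + to_latex(child) + after)
        out.append(child.tail or "")  # text that follows the child element
    return "".join(out)


title = ET.fromstring(
    "<title>An <tex-math>O(n^3)</tex-math> Algorithm for Parsing "
    "Context-Free Grammars</title>"
)
print(to_latex(title))
# prints: An $O(n^3)$ Algorithm for Parsing Context-Free Grammars
```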
Below are additional guidelines for specific fields.
The title should be written in title-case. The Anthology doesn't currently have rules for what "title-case" means exactly, but individual meetings/journals might. Characters whose case should be preserved even when a bibliography style uppercases or lowercases the title should be placed inside a <fixed-case> element (this serves the same purpose as curly braces in BibTeX). For example:
<title>The <fixed-case>ACL</fixed-case> <fixed-case>A</fixed-case>nthology: Current State and Future Directions</title>
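Since <fixed-case> plays the role of curly braces in BibTeX, the conversion can be sketched as a short recursive walk over the title. The helper name `to_bibtex_title` is hypothetical, and the sketch handles only <fixed-case>; other child elements are passed through as plain text.

```python
import xml.etree.ElementTree as ET


def to_bibtex_title(elem):
    """Render a <title>, wrapping <fixed-case> content in BibTeX braces."""
    out = [elem.text or ""]
    for child in elem:
        inner = to_bibtex_title(child)
        out.append("{" + inner + "}" if child.tag == "fixed-case" else inner)
        out.append(child.tail or "")  # text between this child and the next
    return "".join(out)


title = ET.fromstring(
    "<title>The <fixed-case>ACL</fixed-case> <fixed-case>A</fixed-case>nthology: "
    "Current State and Future Directions</title>"
)
print(to_bibtex_title(title))
# prints: The {ACL} {A}nthology: Current State and Future Directions
```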
Each author/editor name must have exactly one <last> element and at most one <first> element.
- The <last> element contains the name(s) by which papers are cited and their bibliography entries are sorted alphabetically. If an author has only a single name, that name should go into the <last> element. A "lineage" like Jr. or III should go into the <last> element.
- The <first> element contains all other names, including middle names/initials.
Ideally, the name should appear in the XML the same way that it does on the original paper. For example, if the original paper has only a first initial and last name, like A. Joshi, the XML should also have only a first initial and last name: <first>A.</first> <last>Joshi</last>. If you know the full first name, please use the complete attribute to record it: <first complete="Aravind">A.</first> <last>Joshi</last>. Similarly for middle and last initials.
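The name rules above (exactly one <last>, at most one <first>, optional complete attribute) can be sketched as a small parser. The helper `parse_name` and the dictionary keys it returns are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def parse_name(elem):
    """Parse an <author> or <editor> element, enforcing the first/last rules."""
    lasts = elem.findall("last")
    firsts = elem.findall("first")
    if len(lasts) != 1:
        raise ValueError(f"expected exactly one <last>, found {len(lasts)}")
    if len(firsts) > 1:
        raise ValueError(f"expected at most one <first>, found {len(firsts)}")
    first = firsts[0] if firsts else None
    return {
        "first": first.text if first is not None else None,
        # The complete attribute records the full name when only an initial is printed.
        "first_complete": first.get("complete") if first is not None else None,
        "last": lasts[0].text,
    }


author = ET.fromstring(
    '<author><first complete="Aravind">A.</first> <last>Joshi</last></author>'
)
print(parse_name(author))
# prints: {'first': 'A.', 'first_complete': 'Aravind', 'last': 'Joshi'}
```

An author with a single name has no <first> element, so both "first" and "first_complete" come back as None.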
Paper PDFs are linked in three ways.
- <url>URL</url>: URL of Anthology-hosted PDF.
- <paper href="URL">...</paper>: URL of externally-hosted, non-ACL-sponsored PDF (currently used mainly for LREC)
- <href>URL</href>: URL of externally-hosted, ACL-sponsored PDF (currently used mainly for TACL)
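The three linking mechanisms can be collected from a <paper> element as follows. This is a sketch only: `pdf_links` and the labels it returns are hypothetical, the `example.org` URLs are placeholders, and the sketch does not assume any precedence between the three mechanisms because this document does not define one.

```python
import xml.etree.ElementTree as ET


def pdf_links(paper):
    """Return (kind, url) pairs for every PDF link found on a <paper>."""
    links = []
    url = paper.findtext("url")      # <url> child: Anthology-hosted PDF
    if url:
        links.append(("anthology", url))
    attr = paper.get("href")         # href attribute: external, non-ACL-sponsored
    if attr:
        links.append(("external-non-acl", attr))
    href = paper.findtext("href")    # <href> child: external, ACL-sponsored
    if href:
        links.append(("external-acl", href))
    return links


# Placeholder URLs, invented for illustration.
hosted = ET.fromstring('<paper id="1001"><url>https://example.org/a.pdf</url></paper>')
external = ET.fromstring('<paper id="1002" href="https://example.org/b.pdf"/>')
sponsored = ET.fromstring('<paper id="1003"><href>https://example.org/c.pdf</href></paper>')

for paper in (hosted, external, sponsored):
    print(paper.get("id"), pdf_links(paper))
```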
Other files can be linked as well:
- <software>filename</software>
- <dataset>filename</dataset>
- <attachment type="...">filename</attachment>, where the type is 'note', 'presentation', 'poster', 'attachment', or missing
- <mrf src="latexml">filename.xhtml</mrf> (machine readable format? Mr. F?)
- <video href="URL" tag="video"/>
- <revision id="2">Q15-1022v2</revision>
- <erratum id="1">Q15-1022e1</erratum>
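As a rough illustration of the attachment rule (the type is one of four values or absent), the sketch below lists a paper's attachments and rejects unknown types. The helper `attachments` and the attachment filenames are invented for this example. Treating an unknown type as an error is an assumption, not something this document specifies.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"note", "presentation", "poster", "attachment"}


def attachments(paper):
    """Return (type, filename) pairs; type is None when the attribute is missing."""
    found = []
    for att in paper.findall("attachment"):
        kind = att.get("type")
        if kind is not None and kind not in ALLOWED_TYPES:
            raise ValueError(f"unexpected attachment type: {kind!r}")
        found.append((kind, att.text))
    return found


# Hypothetical paper; the attachment filenames are made up.
paper = ET.fromstring("""<paper id="1022">
  <attachment type="poster">Q15-1022.Poster.pdf</attachment>
  <attachment>Q15-1022.Notes.pdf</attachment>
  <revision id="2">Q15-1022v2</revision>
</paper>""")
print(attachments(paper))
# prints: [('poster', 'Q15-1022.Poster.pdf'), (None, 'Q15-1022.Notes.pdf')]
```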