
Processing Author Page Requests

weissenh edited this page Nov 10, 2025 · 5 revisions

Each author's papers are listed on a so-called author page at https://aclanthology.org/people/slug, e.g. https://aclanthology.org/people/matt-post

Users can report problems with an author page by clicking its "Fix author" button, which opens an issue.

Usually these requests are:

  • (a) to merge two author pages into one (e.g. one with and without middle name)
  • (b) to split/disambiguate an author page that comprises several authors of the same name
  • (c) to change the name displayed in the heading to their preferred variant (e.g. after adopting a different name upon marriage)

Background Information

As of fall 2025, authors are represented as described on the Name Variants wiki page: papers are matched to authors by string matching, and variants of the same name (e.g. with an additional middle name) are recorded under a human-readable ID such as matt-post. A switch to a system that relies mainly on ORCIDs for matching is planned (cf. Author Page Plan).

Since users want their page to list only their own papers, and want exactly one page per author (covering papers under all of their name variants), an explicit representation of authors is needed.

Users open issues if an author page does not match their expectations. The GitHub issue template for this, 02-name-correction.yml (in the repository's issue template folder), provides a form in which users describe their problem in a fixed format.

Users also sometimes open "author page" issues to report other kinds of errors (e.g. metadata corrections), which should be handled via a different process.
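To make the name-variant model concrete, here is a minimal Python sketch of an explicit author representation. It assumes a simplified, hypothetical entry structure (names as plain "First Last" strings, field names borrowed from the fields listed under step 4 below) rather than the real name_variants.yaml schema or any Anthology code. It shows why ambiguous names need explicit IDs: a name string used by more than one ID cannot be resolved by string matching alone. It also checks that an id is a lowercase hyphenated slug and that an orcid is a bare, well-formed iD, using the ISO 7064 MOD 11-2 check digit that ORCID iDs carry.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical model of one author entry; not the real name_variants.yaml schema.
@dataclass
class AuthorEntry:
    id: str                       # e.g. "jane-doe" or "jane-doe-mit"
    canonical: str                # becomes the author page heading
    variants: list[str] = field(default_factory=list)
    orcid: str | None = None      # bare iD such as "0000-0002-1825-0097"
    institution: str | None = None


SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
ORCID = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_ok(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID.fullmatch(orcid):  # rejects full URLs as well
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    return digits[-1] == ("X" if check == 10 else str(check))


def validate(entry: AuthorEntry) -> list[str]:
    """Return human-readable problems with an entry (empty list if none)."""
    problems = []
    if not SLUG.fullmatch(entry.id):
        problems.append(f"id {entry.id!r} is not a lowercase hyphenated slug")
    if entry.orcid is not None and not orcid_ok(entry.orcid):
        problems.append(f"orcid {entry.orcid!r} is not a bare, valid ORCID iD")
    return problems


def name_index(entries: list[AuthorEntry]) -> dict[str, set[str]]:
    """Map every name string (canonical or variant) to the IDs that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry in entries:
        for name in [entry.canonical, *entry.variants]:
            index[name].add(entry.id)
    return dict(index)


def ambiguous_names(entries: list[AuthorEntry]) -> dict[str, set[str]]:
    """Names shared by several IDs; papers with these names need explicit ids."""
    return {name: ids for name, ids in name_index(entries).items() if len(ids) > 1}
```

For example, with a catch-all entry AuthorEntry("jane-doe", "Jane Doe") and a singled-out entry AuthorEntry("jane-doe-mit", "Jane Doe"), ambiguous_names returns {"Jane Doe": {"jane-doe", "jane-doe-mit"}}, which is exactly the situation in which every paper with that name has to carry an explicit id.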

See also the FAQ at https://aclanthology.org/faq/ and the more detailed corrections guide at https://aclanthology.org/info/corrections/

Steps

  1. Assign yourself to the issue if you haven't already
  2. Initial checks
    • Do a basic check that the request is legitimate: close obvious spam and anything obviously malicious.
    • Check whether the problem really concerns the author page or something else (e.g. a metadata correction).
      • Some users use the form to report paper-level problems, which are resolved by submitting a correction via the "Fix data" button on the relevant paper's page. If so, ask the user to submit such a correction and close the issue.
      • Example:

      Hi, since the PDF does not match the author metadata, you can use the "Fix data" button at <link-to-paper-page> to correct it. Once we have approved and uploaded the correction, the correct name should appear! For more information, see https://aclanthology.org/info/corrections/

    • Check whether the form contains all the relevant data. If not, ping the user and let them know what is missing or unclear.
      • Example:

      Hi, could you please provide your preferred name / ORCID / highest-degree institution (i.e. where you obtained or are about to obtain your PhD, which may differ from your current affiliation) / a list of all papers that are yours, as ACL Anthology links (a Google Scholar profile or a list of the papers that are not yours is not sufficient)? You can edit the original post to do so.

  3. Branch: Create a branch named firstname-lastname-affiliation (often mirroring the slug of the resulting author page).
  4. Name variants file: Add or update the relevant entries in name_variants.yaml (cf. Name Variants wiki page). These can include:
    • canonical name: becomes the page heading. Users may state a preferred name; if they do not, check which name they have used most recently.
    • id: in the form slug or slug-institution, e.g. jane-doe or jane-doe-mit
    • orcid: just the number, not the full URL
    • institution: typically where obtained PhD (this is not necessarily the same as the current affiliation!)
    • comment: optional field (its content may possibly be added automatically)
      • No need to add anything if the name is not ambiguous
      • Can be the same as the institution or a short version of it. Avoid anything that would incentivize authors to request changes whenever they change affiliation.
      • Right now "May refer to several people" is used for a generic catch-all author page, but this will change with the new author setup (cf. Author Page Plan)
    • if applicable (e.g. when merging author pages): list the name variants
    • if applicable (rare): authors with the same canonical name automatically show up under "People with similar names"; for people whose names are similar but not identical, decide whether they should be listed under similar
  5. Change IDs on paper level
    • As long as matching is still based on name variants, you need to add ids on the paper level for both the general catch-all and the singled-out author.
    • You can partially automate this using
      • the script add_author_id_by_year.py (found in a PR), run as python add_author_id_by_year.py abhinav-gupta-mila Abhinav Gupta 2020 2019 (if all papers (co-)authored by Abhinav Gupta in 2020 and 2019 should belong to abhinav-gupta-mila)
      • bin/add_author_id.py
    • If you do it via (regex) search and replace, make sure to
      • only edit files under data/xml, not those under tests;
      • not overwrite an existing orcid attribute (if present in the XML): while iterating through the matches, double-check that each newly assigned id is consistent with the data in the name variants file.
  6. Check changes
    • look at how many files/lines/XML tags were edited
    • once the PR is opened (see next step):
      • the automatic tests run by GitHub should complete successfully
      • if the PR comes from a branch inside the acl-anthology repository (rather than a fork), a preview is generated, so you can check what the author page(s) will look like: do they contain the expected number of papers, the relevant comment on the institution, mentions of people with similar names, etc.?
  7. Create a pull request
    • assign yourself
    • link to the issue
    • add to the appropriate Projects and Milestones
  8. Add a reviewer once ready and note any necessary information:
    • if needed: share any striking observations or problems
    • optionally, link relevant preview pages or summarize your changes in 1-2 sentences
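The paper-level edits of steps 5 and 6 can be illustrated with a minimal Python sketch of what an id-assigning script does. It is not bin/add_author_id.py or add_author_id_by_year.py, and the function names are hypothetical. It assumes that each volume carries its year in meta/year and that each author element has first and last children. It only looks at files under data/xml, sets the id attribute while keeping any existing orcid, skips authors that already carry a different id, and holds back orcid mismatches so they can be compared with the name variants entry. Because ElementTree re-serializes the whole file, prefer the repository scripts for actual edits.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path


def candidate_files(repo: Path) -> list[Path]:
    """Only the real data files; fixtures under tests/ must stay untouched."""
    return sorted((repo / "data" / "xml").glob("*.xml"))


def assign_author_id(
    xml_text: str,
    first: str,
    last: str,
    author_id: str,
    years: set[str] | None = None,
    expected_orcid: str | None = None,
) -> tuple[str, list[tuple[str, str]], list[tuple[str, str, str]]]:
    """Set id=author_id on matching <author> elements (hypothetical helper).

    Returns the new XML text, the (volume, paper) pairs that were changed,
    and (volume, paper, reason) triples that were skipped for manual review.
    """
    root = ET.fromstring(xml_text)
    changed, skipped = [], []
    for volume in root.iter("volume"):
        # Assumption: the year lives in <meta><year> of each volume.
        if years is not None and volume.findtext("meta/year") not in years:
            continue
        for paper in volume.iter("paper"):
            for author in paper.iter("author"):
                if (author.findtext("first"), author.findtext("last")) != (first, last):
                    continue
                where = (volume.get("id"), paper.get("id"))
                current, orcid = author.get("id"), author.get("orcid")
                if current == author_id:
                    continue  # already assigned, nothing to do
                if current is not None:
                    skipped.append((*where, f"already has id {current!r}"))
                elif orcid and expected_orcid and orcid != expected_orcid:
                    skipped.append((*where, f"orcid {orcid} differs from entry"))
                else:
                    author.set("id", author_id)  # other attributes (e.g. orcid) are kept
                    changed.append(where)
    return ET.tostring(root, encoding="unicode"), changed, skipped
```

The skipped list is the part worth reading carefully: it collects exactly the cases in which a blind search and replace would have overwritten existing data.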

Notes on general procedure

  • Bulk vs. one-by-one PRs: unlike metadata corrections and revisions, which are processed in batches, author page requests are usually processed one by one on an irregular basis.
  • Single out one person; there is no need to fully disambiguate a name. Only single out the person who asked to be singled out: if several other people remain on the catch-all page, it is up to them to ask.

Reviewing PRs on Author Page Requests

  • check that the number of edited tags matches what is expected (e.g. the number of papers that should move to the author page)
  • open a few random papers, as well as the first and the last one, to make sure that the right papers ended up on the right page(s).
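Both review checks can be partly scripted. The following is a hypothetical Python sketch, not an existing Anthology tool: it counts the author tags carrying a given id on added lines of a unified diff (e.g. the output of git diff), and picks the first, the last and a few random papers to open. The regex-based parsing and the function names are assumptions.

```python
from __future__ import annotations

import random
import re

# Hypothetical pattern: an <author ...> start tag carrying id="...".
AUTHOR_ID = re.compile(r'<author\b[^>]*\bid="([^"]+)"')


def count_added_ids(diff_text: str, author_id: str) -> int:
    """Count author tags with this id on added lines of a unified diff."""
    count = 0
    for line in diff_text.splitlines():
        # "+++" is a file header, not an added line.
        if line.startswith("+") and not line.startswith("+++"):
            count += sum(m.group(1) == author_id for m in AUTHOR_ID.finditer(line))
    return count


def papers_to_check(paper_ids: list[str], k: int = 3, seed: int | None = None) -> list[str]:
    """Return the first and last paper plus up to k random ones in between."""
    if len(paper_ids) <= 2:
        return list(paper_ids)
    middle = paper_ids[1:-1]
    picked = random.Random(seed).sample(middle, min(k, len(middle)))
    picked.sort(key=middle.index)  # keep the original order for readability
    return [paper_ids[0], *picked, paper_ids[-1]]
```

Passing a fixed seed makes the random sample reproducible, which helps if the reviewer wants to mention in the PR which papers were checked.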

The PR preview makes verification much easier. Note that previews are only generated if the PR does not come from a fork and the automatic checks pass.
