
Procedures

Nuno Macedo edited this page Jun 30, 2017 · 40 revisions

Synchronization Procedures

This section specifies the synchronization procedures that allow a CRIS service to keep a user’s profile consistent with the one at ORCID.

Overview

The procedures can synchronize the profiles in two different modes: one imports information from the remote ORCID profile into the local CRIS profile, and the other exports information from the local CRIS profile to ORCID. The modes can be used independently, depending on the needs of each service.

Import

This mode aims to harvest new research outputs from ORCID, namely new publications and new external identifiers of known publications. The general principle is that every external identifier in an ORCID profile should be harvested. The synchronization procedure supporting this mode is semi-automatic: it is based on a notification system that lets the user select which outputs or UIDs he wishes to add to his PTCRIS profile. The IMPORT procedure does not change the ORCID user profile; it only manages the notifications of the input CRIS profile. This semi-automatic approach provides the user with valuable information while still allowing him to control which updates are effectively applied to the profile. A notification-based approach was chosen because the ORCID user profile may contain erroneous information (for example, erroneous meta-data). It avoids propagating such errors to the CRIS profile and gives the user the opportunity to clean up his ORCID profile beforehand (for example, by deleting incorrect works or by creating new versions with corrected meta-data).
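To make the notification-based behaviour concrete, here is a minimal Python sketch, not the PTCRISync implementation. It assumes that ORCID work groups have already been merged into sets of external identifiers, that identifiers are modelled as hypothetical `(type, value)` pairs, and that local productions are a plain dictionary from a local id to their identifiers. The function only reads its inputs and returns notifications, mirroring the fact that IMPORT never changes the ORCID profile; the names `Notification` and `import_notifications` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional

# Hypothetical model: an external identifier is a (type, value) pair,
# e.g. ("doi", "10.1000/xyz"); a set of them is a frozenset of such pairs.


@dataclass(frozen=True)
class Notification:
    kind: str                            # "new-production" or "new-uids"
    eids: frozenset                      # identifiers the user may choose to add
    local_id: Optional[Hashable] = None  # target production for "new-uids"


def import_notifications(local: Dict[Hashable, frozenset],
                         remote_groups: List[frozenset]) -> List[Notification]:
    """Compare merged ORCID groups against local productions.

    Neither input is modified: the user decides later which
    notifications to accept.
    """
    notes = []
    for group_eids in remote_groups:
        # Local productions sharing at least one identifier with the group.
        matches = [pid for pid, eids in local.items() if eids & group_eids]
        if not matches:
            notes.append(Notification("new-production", group_eids))
        for pid in matches:
            missing = group_eids - local[pid]
            if missing:
                notes.append(Notification("new-uids", missing, pid))
    return notes


local = {"p1": frozenset({("doi", "10.1/a")})}
remote = [
    frozenset({("doi", "10.1/a"), ("eid", "2-s2.0-1")}),  # known, one new UID
    frozenset({("doi", "10.1/c")}),                      # unknown production
]
for n in import_notifications(local, remote):
    print(n.kind, n.local_id, sorted(n.eids))
# new-uids p1 [('eid', '2-s2.0-1')]
# new-production None [('doi', '10.1/c')]
```

Since every PTCRISync procedure relies on the quality criteria, a complete procedure would presumably also filter the remote groups with them; that step is omitted here.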

Export

This mode is targeted at CRIS services that intend to be ORCID sources and export their productions to ORCID, ensuring that other CRIS services can harvest them. The general principle is that every production selected for export in the CRIS profile should be inserted as a new work in the ORCID profile and then automatically kept up-to-date. The EXPORT procedure does not change the input CRIS user profile p; it only manages the works in the ORCID profile whose source is the CRIS service. The ORCID profile is updated through the ORCID API, using the ORCID iD stored in p.
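The export principle can be read as planning add, update and delete operations over the works sourced by the CRIS. The following is a hedged Python sketch under simplifying assumptions: `Entry` and `export_operations` are hypothetical names, a sourced ORCID work is assumed to represent a local production when they share an external identifier, and the returned plan would still have to be turned into actual ORCID API calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified work/production: a title plus identifiers."""
    title: str
    eids: frozenset


Op = Tuple[str, Hashable, Optional[Entry]]


def export_operations(selected: Dict[Hashable, Entry],
                      sourced: Dict[int, Entry]) -> List[Op]:
    """Plan changes to the ORCID profile.

    `selected` maps local ids to productions marked for export; `sourced`
    maps put-codes to ORCID works whose source is this CRIS. The local
    profile is only read, never modified.
    """
    ops: List[Op] = []
    used = set()
    for pid, prod in selected.items():
        # Assumption: a sourced work represents a production if they share an identifier.
        match = next((pc for pc, work in sourced.items()
                      if pc not in used and work.eids & prod.eids), None)
        if match is None:
            ops.append(("add", pid, prod))
        else:
            used.add(match)
            if sourced[match] != prod:
                ops.append(("update", match, prod))
    # Sourced works no longer backed by a selected production are removed.
    ops.extend(("delete", pc, None) for pc in sourced if pc not in used)
    return ops


selected = {
    "p1": Entry("Paper A (revised)", frozenset({("doi", "10.1/a")})),
    "p2": Entry("Paper B", frozenset({("doi", "10.1/b")})),
}
sourced = {
    101: Entry("Paper A", frozenset({("doi", "10.1/a")})),
    102: Entry("Old paper", frozenset({("doi", "10.1/z")})),
}
for op, key, _ in export_operations(selected, sourced):
    print(op, key)
# update 101
# add p2
# delete 102
```

Running the planner on profiles that are already consistent yields an empty plan, which is in the spirit of the stability property discussed under Correctness.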

Sync

These modes are supported by separate synchronization procedures: import works and import updates for the import mode, and export for the export mode. A CRIS service may choose to implement only one of the modes (e.g., RCAAP is only concerned with exporting outputs, while the SARIs are concerned with harvesting them) or both (e.g., the CV management system DeGóis). In the latter case, EXPORT must be executed before IMPORT, since running the EXPORT procedure may change the grouping of works.

Group merging

The main conceptual difference between ORCID and typical CRIS services is that ORCID automatically collects productions that are considered the same into a single group. Two productions are considered the same if they share an external identifier, and this relation is transitive. In the ORCID web interface, the user can select the preferred work of each group, which is the one publicly displayed in the profile.
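The transitive grouping rule amounts to computing connected components over works linked by shared identifiers. The following minimal Python sketch uses a union-find structure for that; it follows only the rule stated above (not necessarily ORCID's actual grouping implementation), and the `Work` class and the `(type, value)` encoding of identifiers are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Work:
    """Hypothetical, simplified ORCID work: its put-code and external identifiers."""
    put_code: int
    eids: frozenset


def group_works(works: List[Work]) -> List[List[Work]]:
    """Partition works so that any two sharing an identifier, directly or
    through a chain of other works, end up in the same group (union-find)."""
    parent = list(range(len(works)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: Dict[tuple, int] = {}  # identifier -> first work index that carried it
    for i, work in enumerate(works):
        for eid in work.eids:
            if eid in owner:
                parent[find(i)] = find(owner[eid])  # union the two groups
            else:
                owner[eid] = i

    groups: Dict[int, List[Work]] = {}
    for i, work in enumerate(works):
        groups.setdefault(find(i), []).append(work)
    return list(groups.values())


works = [
    Work(1, frozenset({("doi", "10.1/a")})),
    Work(2, frozenset({("doi", "10.1/a"), ("eid", "2-s2.0-1")})),
    Work(3, frozenset({("eid", "2-s2.0-1")})),  # linked to 1 only through 2
    Work(4, frozenset({("isbn", "978-0")})),
]
print([[w.put_code for w in g] for g in group_works(works)])
# [[1, 2, 3], [4]]
```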

To synchronize CRIS profiles with ORCID profiles, ORCID work groups must be merged into single productions. Due to the central role of external identifiers in ORCID and PTCRISync, a group is merged as follows:

  • collect every external identifier from every work that comprises the group
  • collect the remaining meta-data from the work of the group selected as preferred by the user
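The two merging steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's code: `Work`, `Production` and `merge_group` are hypothetical, only a title, type and year stand in for "the remaining meta-data", and the preferred work is assumed to be identified by its put-code (how it is obtained from the ORCID API is not shown).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Work:
    """Hypothetical, simplified ORCID work summary."""
    put_code: int
    title: str
    work_type: str
    year: Optional[int]
    eids: frozenset


@dataclass(frozen=True)
class Production:
    """Hypothetical local CRIS production resulting from a merge."""
    title: str
    work_type: str
    year: Optional[int]
    eids: frozenset


def merge_group(group: List[Work], preferred_put_code: int) -> Production:
    """Union the identifiers of every work; take the rest from the preferred one."""
    preferred = next(w for w in group if w.put_code == preferred_put_code)
    all_eids = frozenset().union(*(w.eids for w in group))
    return Production(preferred.title, preferred.work_type, preferred.year, all_eids)


group = [
    Work(1, "On Sync", "journal-article", 2016, frozenset({("doi", "10.1/a")})),
    Work(2, "On Synchronisation", "journal-article", 2016,
         frozenset({("doi", "10.1/a"), ("eid", "2-s2.0-1")})),
]
merged = merge_group(group, preferred_put_code=1)
print(merged.title, sorted(merged.eids))
# On Sync [('doi', '10.1/a'), ('eid', '2-s2.0-1')]
```

The merged production keeps the title of the preferred work but carries the identifiers of both works.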

Quality criteria

Every PTCRISync procedure relies on quality criteria over the productions to be synchronized, both for the remote ORCID works to be imported and for the local CRIS productions to be exported.

To improve the performance of the procedures, these criteria are defined solely over the work summaries returned by the ORCID API, and not over the full works (which would require additional API calls).

To pass the quality criteria, a work must have:

  • at least one external identifier assigned
  • the title
  • the work type
  • the publication year (unless the work is a data set or research technique)
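The criteria above translate directly into a predicate over a work summary. The Python sketch below is hedged: `WorkSummary`, `quality_failures` and `passes_quality` are hypothetical names, and the strings `"data-set"` and `"research-technique"` are assumed to be the ORCID work-type identifiers for data sets and research techniques. Returning the list of failed criteria, rather than a plain boolean, is a design choice that makes it easy to tell the user why a work was rejected.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: ORCID work-type identifiers exempt from the publication-year rule.
YEAR_EXEMPT_TYPES = {"data-set", "research-technique"}


@dataclass(frozen=True)
class WorkSummary:
    """Hypothetical subset of the fields in an ORCID work summary."""
    title: Optional[str]
    work_type: Optional[str]
    year: Optional[int]
    eids: frozenset


def quality_failures(s: WorkSummary) -> List[str]:
    """Return the criteria that `s` fails; an empty list means it passes."""
    failures = []
    if not s.eids:
        failures.append("external identifier")
    if not s.title:
        failures.append("title")
    if not s.work_type:
        failures.append("work type")
    if s.year is None and s.work_type not in YEAR_EXEMPT_TYPES:
        failures.append("publication year")
    return failures


def passes_quality(s: WorkSummary) -> bool:
    return not quality_failures(s)


ok = WorkSummary("A paper", "journal-article", 2017, frozenset({("doi", "10.1/a")}))
ds = WorkSummary("A data set", "data-set", None, frozenset({("doi", "10.1/b")}))
bad = WorkSummary("", "journal-article", None, frozenset())
print(quality_failures(ok), quality_failures(ds), quality_failures(bad))
# [] [] ['external identifier', 'title', 'publication year']
```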

Scheduling

(wip)

Each service is free to choose when to run the synchronization procedures, as long as inconsistencies between the profiles are eventually resolved within a reasonable delay. The export procedure should, however, be executed before the import procedures, to guarantee the consistency of the import results.

In general, the import procedures that produce notifications need only be run when the user is managing the list of synchronized works, since these notifications are volatile and need not be persisted. Otherwise, the lighter import counter procedure should be run.

The export procedure, in contrast, needs to be run to keep the ORCID profile synchronized with the list of local productions selected to be actively synchronized. Ideally, it should be run whenever this list is updated, whenever any work in that list is modified, or whenever the ORCID profile is updated (in particular, if works sourced by the local service were deleted).

One possible choice would be to run the procedures periodically in the specified order (export, then import) in batch mode, thus avoiding possible delays that can negatively affect the user interface. Premium ORCID members could also trigger the synchronization based on Webhooks Change Notifications from ORCID, by registering to be notified when a user profile changes.

Another sensible choice would be to run the import procedures at the beginning of a user session and the export procedure at the end. This ensures that the visible parts of the profiles are consistent while the user is logged out, and that the correct notifications are shown whenever he logs in again. We believe that invoking the synchronization procedures every time the user performs an edit within a session may be counterproductive, as new notifications might keep popping up and confuse the user. As in distributed systems, the goal of the synchronization framework is to ensure eventual consistency, not necessarily real-time strong consistency, among all services.

The scheduling employed in the reference implementation is the following:

  • The import works procedure is run whenever the user opens the synchronization page or updates the list of works being synchronized. The notifications produced by import works are volatile and need only be generated when the user is inspecting the synchronization pane. The same applies to the import invalids procedure. (The import counter can be run earlier, since it is used to warn the user on his homepage.)
  • The import updates procedure is run periodically in the background. This procedure does not require input from the user: it simply harvests meta-data from the ORCID profile and automatically updates the local productions.
  • The export works procedure must be run before any of the import procedures in order for them to produce consistent results. Thus, it is run periodically in the background (prior to import updates) and whenever the synchronization menu is accessed or updated (prior to import works and import invalids).
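The ordering constraints of this schedule can be captured as a small set of hooks. The Python sketch below is only an illustration of the ordering, assuming a hypothetical `SyncScheduler` class whose `procs` dictionary maps hypothetical procedure names to callables; it does not call any real PTCRISync or ORCID API, and the usage example simply records the order in which the stand-in procedures run.

```python
from typing import Callable, Dict, List


class SyncScheduler:
    """Hypothetical hooks mirroring the reference schedule; each entry of
    `procs` stands in for one synchronization procedure."""

    def __init__(self, procs: Dict[str, Callable[[], None]]):
        self.procs = procs

    def on_homepage(self) -> None:
        # Lightweight counter used to warn the user on the homepage.
        self.procs["import_counter"]()

    def on_sync_page(self) -> None:
        # Opening the page or editing the list of synchronized works.
        self.procs["export_works"]()  # export first: it may regroup works
        self.procs["import_works"]()
        self.procs["import_invalids"]()

    def periodic(self) -> None:
        # Background job, e.g. triggered by a timer.
        self.procs["export_works"]()
        self.procs["import_updates"]()


log: List[str] = []
names = ["export_works", "import_works", "import_invalids",
         "import_updates", "import_counter"]
scheduler = SyncScheduler({n: (lambda n=n: log.append(n)) for n in names})
scheduler.on_homepage()
scheduler.on_sync_page()
scheduler.periodic()
print(log)
# ['import_counter', 'export_works', 'import_works', 'import_invalids', 'export_works', 'import_updates']
```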

Correctness

(wip)

The consistency ensured by both modes is precisely stated in the companion formal specification (as a precise set of constraints that instantiates the general principles above), and the synchronization procedures were designed to satisfy several "well-behavedness" properties concerning this consistency. The most important of these is correctness: after running the synchronization procedures, the user profiles in ORCID and in the PTCRIS service are consistent according to the specification. Another important "well-behavedness" law is stability: if the synchronization procedures are run on already consistent profiles, the profiles remain the same (modulo differences in the internal identifiers).
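Both laws can be phrased as executable properties. The sketch below is a hypothetical Python test harness, not part of PTCRISync: `sync` and `consistent` stand for a synchronization procedure and the specification's consistency predicate, and `same` is the comparison used for stability, which a real harness would make insensitive to internal identifiers. The toy instance, also hypothetical, treats a pair of profiles as two sets of identifiers.

```python
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")


def well_behaved(sync: Callable[[S], S],
                 consistent: Callable[[S], bool],
                 states: Iterable[S],
                 same: Callable[[S, S], bool] = lambda a, b: a == b) -> bool:
    """Check correctness and stability of `sync` on the given sample states."""
    for state in states:
        result = sync(state)
        if not consistent(result):  # correctness
            return False
        if consistent(state) and not same(result, state):  # stability
            return False
    return True


# Toy model: (local identifiers selected for export, identifiers sourced at ORCID).
Pair = Tuple[frozenset, frozenset]


def toy_export(p: Pair) -> Pair:
    return (p[0], p[0])  # make the ORCID side mirror the local side


def toy_consistent(p: Pair) -> bool:
    return p[0] == p[1]


samples = [
    (frozenset({"a"}), frozenset()),
    (frozenset({"a"}), frozenset({"a"})),
    (frozenset(), frozenset({"b"})),
]
print(well_behaved(toy_export, toy_consistent, samples))  # True
print(well_behaved(lambda p: (p[0], frozenset()), toy_consistent, samples))  # False
```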

Because the synchronization procedures are stable, there is no need to explicitly check consistency to decide whether they should be run, since running them on already consistent profiles has no effect. In fact, the checking procedures have approximately the same complexity as the synchronizers, so running them beforehand would yield no significant performance gain. It could even degrade performance when the user profiles happen to be inconsistent, since both the check and the synchronization would then have to be run.
