This repository was archived by the owner on Dec 4, 2019. It is now read-only.

Further Information and Considerations for Content Providers

Mark Allen Matney, Jr edited this page Sep 12, 2017 · 20 revisions

High-level architecture

The software for content providers is structured in the following way:

  • source.py - a single Python script with dependencies, the majority of which are installed via pip
    • the script may optionally be scheduled to run automatically, for example via cron
  • source.ini - general configuration options for the script
  • source_logging.ini - logging configuration options

High-level description

source.py does the following, in order:

  • Read config (*.ini) files and process command-line arguments
  • Send HTTP GET requests to OAI-PMH data providers
  • Process responses from data providers
  • Write ResourceSync document files (e.g., SourceDescription, CapabilityList, ResourceList, ChangeList) to the local filesystem at the locations specified by command-line arguments
  • Write log file at the location specified by source_logging.ini, and log to STDOUT
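
As a rough illustration of the request and response steps above, the sketch below builds an OAI-PMH request URL and extracts record headers from a response. It is not taken from source.py: the function names, the choice of the ListIdentifiers verb, the example.org base URL, and the sample response are all assumptions made for illustration. Only the verb and parameter names (verb, metadataPrefix, from) and the XML namespace come from the OAI-PMH protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_identifiers_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the URL for an OAI-PMH ListIdentifiers GET request (hypothetical helper)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_headers(xml_text):
    """Return (identifier, datestamp, deleted) tuples from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    results = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        # Deleted records carry status="deleted" on the header element.
        deleted = header.get("status") == "deleted"
        results.append((identifier, datestamp, deleted))
    return results


# Made-up response body, used here instead of a live HTTP request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2017-09-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2017-09-10</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_identifiers_url("https://example.org/oai", from_date="2017-09-01"))
    for row in parse_headers(SAMPLE_RESPONSE):
        print(row)
```

In a harvester of this kind, the identifiers and datestamps would feed a ResourceList or ChangeList; which verbs source.py actually issues is not stated on this page.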

Security considerations

As the document roots of many popular web servers (e.g., Tomcat, httpd) are owned by users with elevated privileges, the simplest way to use this software is to run it with the privileges required to write ResourceSync document files directly to their final locations (i.e., under the server document root). The software does not accept incoming HTTP requests or any other kind of user input. Because it resides on the backend, it can only be invoked by the system administrator or by a scheduling utility such as cron, both of which we assume to be trustworthy.

If the script cannot be run with elevated privileges, it can be invoked with a target root directory outside the document root, as long as the output files are subsequently moved to the correct locations under the document root.
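
The second approach can be sketched as a small publish step that runs after source.py, under whatever account is allowed to write to the document root. This is a minimal sketch, not part of the software: the publish function and both directory arguments are hypothetical, and it assumes the staging directory mirrors the layout expected under the document root.

```python
import shutil
from pathlib import Path


def publish(staging_dir, document_root):
    """Move every file under staging_dir to the same relative path under document_root.

    Hypothetical helper: both arguments are illustrative, not options of source.py.
    """
    staging_dir = Path(staging_dir)
    document_root = Path(document_root)
    published = []
    # sorted() materializes the listing before any file is moved.
    for src in sorted(staging_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = document_root / src.relative_to(staging_dir)
        # Create intermediate directories under the document root as needed.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        published.append(dest)
    return published
```

Keeping the privileged step this small means only the move, rather than the whole harvest, needs write access to the document root.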

Dependencies

The software mostly depends on standard, widely used Python modules available on PyPI. The only exception is the primary dependency (py-resourcesync), developed primarily by software developers at LANL. py-resourcesync has numerous tests and is open source: https://github.com/resourcesync/py-resourcesync.

Target audience and expected input

The target audience (intended user) of the software is the administrator of a system that is to become a ResourceSync hosting server, or the cron scheduling utility. For example usage, see https://github.com/UCLALibrary/resourcesync-oai-pmh/wiki/Use-Case-Recipes; for detailed usage, please download the software, install dependencies, and run python3 source.py --help.

Destination and expected output

The expected output of the software is:

  • INFO-level logging to STDOUT
  • DEBUG-level logging to a file specified in the configuration
  • ResourceSync documents (files) created or updated
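
The two logging destinations listed above can be reproduced with Python's standard logging module. The sketch below sets this up programmatically and is only an approximation: the real software reads its logging setup from source_logging.ini, whose contents are not shown on this page, and the configure_logging function, the logger name, and the message format are assumptions.

```python
import logging
import sys


def configure_logging(log_path, name="source"):
    """Send INFO and above to STDOUT, and DEBUG and above to a file (hypothetical helper)."""
    logger = logging.getLogger(name)
    # The logger itself must pass DEBUG so the file handler can receive it.
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

With this arrangement, a call such as logger.debug(...) appears only in the file, while logger.info(...) appears both in the file and on STDOUT.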