This repository was archived by the owner on Dec 4, 2019. It is now read-only.

Source Server Specs

Mark Allen Matney, Jr edited this page Aug 23, 2017 · 16 revisions

OS

The code used to generate ResourceSync documents (rs_oaipmh_src.py) has only been tested on Linux-based operating systems; therefore, one of those is the recommended platform for running that code.

Specs, Qualitatively

Disk

The amount of persistent storage (disk) required depends on:

  • the number of collections you have,
  • the number of items per collection (affects the size of ResourceLists), and
  • the number of resources that are expected to be created/updated/deleted (affects the size of ChangeLists).

Memory

The amount of memory required depends on:

  • the minimum required by your web server, and
  • the requirements of the py-resourcesync library (for which no official metrics have been made publicly available at this time, unfortunately).

Specs, Quantitatively

To get a feel for some numbers, we ran tests on a couple of sample collections.

Disk

The ResourceList and ChangeList for our first test collection, each with 5000 entries, each came out to no more than 1.5 MiB (so, roughly 300 B per entry). You can estimate your disk usage in a couple of different ways:

A Rough Estimate

If you are able to estimate the average number of resources per collection and the average number of anticipated changes per collection, use this formula to calculate your institution's requirements:

<MIN_NUM_OF_BYTES> = (<NUM_RESOURCES_COLLECTION_AVG> + <NUM_ANTICIPATED_CHANGES_COLLECTION_AVG>) * <NUM_COLLECTIONS> * 300 B
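As a minimal sketch, the rough estimate can be computed with a few lines of Python. The function name and the sample figures (20 collections, 5000 resources and 1000 anticipated changes per collection on average) are hypothetical and only illustrate the arithmetic; the 300 B per-entry figure comes from the test above.

```python
# Rough per-entry size taken from the test collection described above.
BYTES_PER_ENTRY = 300


def rough_min_bytes(avg_resources, avg_changes, num_collections,
                    bytes_per_entry=BYTES_PER_ENTRY):
    """Rough estimate: (avg resources + avg changes) * collections * bytes per entry."""
    return (avg_resources + avg_changes) * num_collections * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical institution, for illustration only.
    estimate = rough_min_bytes(avg_resources=5000, avg_changes=1000,
                               num_collections=20)
    print(f"{estimate} B (about {estimate / 2**20:.1f} MiB)")
```

Treat the result as a lower bound and leave headroom, since the per-entry size is itself an approximation.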

A Closer Estimate

If you have more fine-grained information about your collections, use this formula:

<MIN_NUM_OF_BYTES> = (
  (<NUM_RESOURCES_COLLECTION_1> + <NUM_ANTICIPATED_CHANGES_COLLECTION_1>) +
  (<NUM_RESOURCES_COLLECTION_2> + <NUM_ANTICIPATED_CHANGES_COLLECTION_2>) +
  ... +
  (<NUM_RESOURCES_COLLECTION_N> + <NUM_ANTICIPATED_CHANGES_COLLECTION_N>)
) * 300 B

where:

  • <NUM_RESOURCES_COLLECTION_i> is the number of resources that collection i has when its ResourceList is generated for the first (and only) time, and
  • <NUM_ANTICIPATED_CHANGES_COLLECTION_i> is the number of total anticipated changes (create/update/delete) for this collection from the time the ResourceList is generated until the end of time.

You can expect the size of ResourceLists and ChangeLists to grow linearly with the number of entries in them.
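The per-collection formula above can be sketched the same way in Python. The function name and the three sample collections are hypothetical; each pair holds a collection's resource count and its total anticipated changes.

```python
def closer_min_bytes(collections, bytes_per_entry=300):
    """Closer estimate from per-collection counts.

    collections: iterable of (num_resources, num_anticipated_changes) pairs,
    one pair per collection.
    """
    total_entries = sum(resources + changes for resources, changes in collections)
    return total_entries * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical collections, for illustration only.
    sample = [
        (5000, 500),    # collection 1
        (12000, 3000),  # collection 2
        (800, 50),      # collection 3
    ]
    estimate = closer_min_bytes(sample)
    print(f"{estimate} B (about {estimate / 2**20:.1f} MiB)")
```

Because document size grows linearly with the number of entries, summing the entries first and multiplying once gives the same result as sizing each collection separately.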

Memory

We have also profiled an invocation of rs_oaipmh_src.py for ResourceList generation on a collection with 50,000 resources; memory usage peaked at just under 100 MiB. We expect memory usage to grow linearly with the number of resources, but this has not yet been verified.
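For planning purposes only, the single profiled data point can be extrapolated as in the sketch below. It assumes that peak memory scales linearly with the number of resources and ignores any fixed overhead; neither assumption has been verified, so read the output as a ballpark figure. The function name and the 200,000-resource example are hypothetical.

```python
# The one measured data point: about 100 MiB peak for 50,000 resources.
PROFILED_RESOURCES = 50_000
PROFILED_PEAK_MIB = 100


def estimated_peak_mib(num_resources):
    """Extrapolate peak memory in MiB, ASSUMING unverified linear scaling."""
    return num_resources * PROFILED_PEAK_MIB / PROFILED_RESOURCES


if __name__ == "__main__":
    # Hypothetical collection size, for illustration only.
    print(f"{estimated_peak_mib(200_000):.0f} MiB")
```

If memory is tight, profiling rs_oaipmh_src.py on your own largest collection will give a more trustworthy number than this extrapolation.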
