Source Server Specs
The code used to generate ResourceSync documents (rs_oaipmh_src.py) has only been tested on Linux-based operating systems, so we recommend running it on one of those.
The amount of persistent storage (disk) required depends on:
- the number of collections you have,
- the number of items per collection (affects the size of ResourceLists), and
- the number of resources that are expected to be created/updated/deleted (affects the size of ChangeLists).
The amount of memory required depends on:
- the minimum required by your web server, and
- the requirements of the py-resourcesync library (for which, unfortunately, no official metrics have been made publicly available at this time).
To get a feel for some numbers, we ran tests on a couple of sample collections.
Our first test collection's ResourceList and ChangeList, each with 5000 entries, both came out to no more than 1.5 MiB (roughly 300 B per entry). You can estimate your disk usage in one of two ways:
If you are able to estimate the average number of resources per collection and the average number of anticipated changes per collection, use this formula to calculate your institution's requirements:
<MIN_NUM_OF_BYTES> = (<NUM_RESOURCES_COLLECTION_AVG> + <NUM_ANTICIPATED_CHANGES_COLLECTION_AVG>) * <NUM_COLLECTIONS> * 300 B
If you have more fine-grained information about your collections, use this formula:
<MIN_NUM_OF_BYTES> =
(
(<NUM_RESOURCES_COLLECTION_1> + <NUM_ANTICIPATED_CHANGES_COLLECTION_1>) +
(<NUM_RESOURCES_COLLECTION_2> + <NUM_ANTICIPATED_CHANGES_COLLECTION_2>) +
... +
(<NUM_RESOURCES_COLLECTION_N> + <NUM_ANTICIPATED_CHANGES_COLLECTION_N>)
) * 300 B
where:
- <NUM_RESOURCES_COLLECTION_i> is the number of resources that collection i has when its ResourceList is generated for the first (and only) time, and
- <NUM_ANTICIPATED_CHANGES_COLLECTION_i> is the number of total anticipated changes (create/update/delete) for this collection from the time the ResourceList is generated until the end of time.
You can expect the size of ResourceLists and ChangeLists to grow linearly with the number of entries in them.
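As a rough illustration, both disk formulas above can be sketched in Python. The function names and the sample collection sizes below are hypothetical and chosen only for this example; the 300 B per entry figure is the approximate value from our test collection and may differ for your data.

```python
# Approximate per-entry size observed in the test collection (an estimate,
# not a guaranteed value for every collection).
BYTES_PER_ENTRY = 300


def estimate_min_bytes_from_averages(avg_resources, avg_changes, num_collections):
    """Coarse estimate using per-collection averages."""
    return (avg_resources + avg_changes) * num_collections * BYTES_PER_ENTRY


def estimate_min_bytes(collections):
    """Fine-grained estimate.

    `collections` is an iterable of (num_resources, num_anticipated_changes)
    pairs, one pair per collection.
    """
    total_entries = sum(resources + changes for resources, changes in collections)
    return total_entries * BYTES_PER_ENTRY


if __name__ == "__main__":
    # Hypothetical collection sizes, for illustration only.
    sample = [(5000, 1000), (20000, 500), (750, 250)]
    min_bytes = estimate_min_bytes(sample)
    print(f"{min_bytes} B ({min_bytes / 2**20:.2f} MiB)")
```

The result is a lower bound on the space taken by ResourceLists and ChangeLists only, so leave headroom for the web server and the operating system.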
We have also profiled an invocation of rs_oaipmh_src.py for ResourceList generation on a collection with 50000 resources, and the memory usage peaked at just under 100 MiB. We expect memory usage to grow linearly with the number of resources, but this has not yet been verified.
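If you want a ballpark figure for a larger collection, the single measurement above can be scaled proportionally. The sketch below assumes that memory usage is strictly proportional to the number of resources, which is unverified and ignores any fixed overhead (such as the interpreter's baseline), so treat the result as a guess rather than a guarantee. The function name is hypothetical.

```python
# The one data point we have: a 50000-resource collection peaked at
# just under 100 MiB during ResourceList generation.
MEASURED_RESOURCES = 50_000
MEASURED_PEAK_MIB = 100


def estimate_peak_memory_mib(num_resources):
    """Scale the measured peak proportionally (ASSUMPTION: unverified linear growth)."""
    return num_resources * MEASURED_PEAK_MIB / MEASURED_RESOURCES


if __name__ == "__main__":
    # Hypothetical collection size, for illustration only.
    print(f"{estimate_peak_memory_mib(200_000):.0f} MiB")
```

Profiling your own largest collection remains the most reliable way to size memory.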