Welcome to the USMAI Primo Link Migrator. This link updater was created to deal with links from services you're migrating away from (such as EDS and the Aleph Catalog) and convert them to Primo Permalinks when possible.
This program does the following with links:
- Extracts IDs that can be used to query Alma, then generates permalinks using SRU
- When this fails, marks the link as needing to be updated manually and puts it in a separate file
- For Ebsco links, when a permalink cannot be generated, tries to modify the link to point at a stable Ebsco resource instead of your old discovery service
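The first step above can be sketched roughly as follows. Everything here is illustrative: the regex, the SRU index name, the base URLs, and the institution code `01EXAMPLE_INST` are placeholders, not the program's actual values (the real patterns and SRU fields live in the extractors, described below).

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern for an old Aleph OPAC link; real patterns are defined per extractor.
ALEPH_ID_RE = re.compile(r"doc_number=(\d+)")

def build_sru_query(record_id: str, subdomain: str, campus_code: str) -> str:
    """Build an SRU lookup URL for a record ID (index name is illustrative)."""
    base = f"https://{subdomain}.alma.exlibrisgroup.com/view/sru/{campus_code}"
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": f"other_system_number={record_id}",  # SRU index chosen by the extractor
    }
    return f"{base}?{urlencode(params)}"

def build_permalink(mms_id: str, subdomain: str, campus_code: str, hash_: str) -> str:
    """Assemble a Primo permalink from an MMS ID (format is illustrative)."""
    return (f"https://{subdomain}.primo.exlibrisgroup.com/permalink/"
            f"{campus_code}/{hash_}/alma{mms_id}")

old_link = "https://catalog.example.edu/F?func=direct&doc_number=001234567"
match = ALEPH_ID_RE.search(old_link)
if match:
    print(build_sru_query(match.group(1), "example", "01EXAMPLE_INST"))
    print(build_permalink("991234567890", "example", "01EXAMPLE_INST", "dg0og1"))
```

When no extractor matches a link (or the SRU lookup returns nothing), the link falls through to the rejects file instead.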
This program is also designed to output easy-to-use files. Here are the input file types it can work on:
- Springshare: Takes a URLs file extracted directly from Springshare, and produces a file that you can give to your Springshare rep to bulk replace
- Webcrawl: Takes the results of a web crawl (if they're in a specific format) and generates an output file showing where on your site each outdated link appears and what it should be replaced with. A web crawler is not included here
- Blackboard: Takes a file of all URLs extracted from Blackboard, and generates a file that can be used to bulk replace links
- Simple URL file: Takes a simple file of URLs and creates a file with the corresponding new URLs. Useful for many purposes.

Note that every output file has a corresponding 'rejects' file containing links that could not be converted automatically for one reason or another.
- Create your configuration: Create a folder under `/configs` with a `config.json` file in it. See `config-example.json` for the format of this file.
- Fill in the Permalink Hash: Generate a permalink on Primo, and pull the hash from that link. It will look something like `dg0og1`.
- Fill in the campus subdomain: This is the sub-domain of your campus in Primo (x.primo.exlibrisgroup.com)
- Fill in your campus code: The code that looks like `O1USMAI_UMCP`
- Ignore the next 3 options (`generate_aleph_mms_without_sru`, `mms_campus_code`, and `mms_middle_num`). Leave the boolean set to `false`.
- Select an SRU request limit if you're testing with large files and don't want to wait a while for results. Link generation will stop after this number of requests (per file type). Set to `-1` for no request limit.
- Select the number of workers: This is the number of concurrent requests that will happen. Too high and Alma might reject your requests; too low and the run will take a while.
- Set up Ebsco Campus DB Names: If you're transferring from an Ebsco discovery layer, you'll need to provide the codes that identify the databases containing your catalogs. These should not be Ebsco-provided databases; rather, wherever your old ILS was loading its bibs/holdings/items into. There could be one or a few.
- Set up additional extractors: Extractors are responsible for identifying links that need to be converted to permalinks via regex, and also provide the regex to extract the ID itself. The existing extractors may cover your Ebsco or WorldCat needs, as long as you're not using a custom domain. Extractors also tell the program which SRU field in Alma should be used to query for the ID. Please see the existing extractors in `src/extractors.py`.
- Set up additional Ebsco replacements: Some URLs cannot be converted to permalinks, but can be modified so that they keep working after you lose access to your Ebsco discovery service. We have identified several modifications required to keep them working, located in `src/ebsco_link_processor.py`. You can add your own or override existing ones in the `additional_ebsco_replacements` array.
- Add the files you want to process into your folder in `/configs`. The Springshare file should be called `assets_list.csv`, the Blackboard file `blackboard.csv`, the webcrawl file `webcrawl.json`, and the simple URL file `simple_url.csv`. The program will run on whichever of these it finds.
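Putting the options above together, a `config.json` might look something like this. Treat it as a sketch only: apart from the three options the steps above name explicitly (`generate_aleph_mms_without_sru`, `mms_campus_code`, `mms_middle_num`), the key names here are guesses, and all values are placeholders. `config-example.json` in the repo is the authoritative format.

```json
{
  "permalink_hash": "dg0og1",
  "campus_subdomain": "usmai",
  "campus_code": "O1USMAI_UMCP",
  "generate_aleph_mms_without_sru": false,
  "mms_campus_code": "",
  "mms_middle_num": "",
  "sru_request_limit": -1,
  "num_workers": 5,
  "ebsco_campus_db_names": ["cat00001a"]
}
```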
- Activate a Python virtual environment
- Install the requirements with `pip3 install -r requirements.txt`
- Run the program with `python -m handle [name_of_your_config_folder]`
Congrats! The output will be saved to a correspondingly named folder in the `/output` directory, under a subfolder with the timestamp of the run.
Yeah, there are a lot of files. Just ignore the JSON files unless you want to do some debugging (they break things down further per failure category). For each input file, there will be two corresponding output Excel/CSV files. The 'success' files contain links ready to be updated. The 'rejects' files contain links that SHOULD be updated, but cannot be for some reason.