Welcome to the USMAI Primo Link Migrator. This link updater was created to deal with links from services you're migrating away from (such as EDS and the Aleph Catalog) and convert them to Primo Permalinks when possible.
This program does the following with links:
- Extracts IDs that can be used to query Alma, then generates permalinks using SRU
- When this fails, marks the link as needing to be updated manually and puts it in a separate file
- For Ebsco links, when a permalink cannot be generated, tries to modify the link to point at a stable Ebsco resource instead of your old discovery service
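The first step above can be sketched roughly as follows. Everything here is illustrative: the regex, the SRU index name, the base URLs, and the institution code `01EXAMPLE_INST` are placeholders, not the program's actual values (the real patterns and SRU fields live in the extractors, described below).

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern for an old Aleph OPAC link; real patterns are defined per extractor.
ALEPH_ID_RE = re.compile(r"doc_number=(\d+)")

def build_sru_query(record_id: str, subdomain: str, campus_code: str) -> str:
    """Build an SRU lookup URL for a record ID (index name is illustrative)."""
    base = f"https://{subdomain}.alma.exlibrisgroup.com/view/sru/{campus_code}"
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": f"other_system_number={record_id}",  # SRU index chosen by the extractor
    }
    return f"{base}?{urlencode(params)}"

def build_permalink(mms_id: str, subdomain: str, campus_code: str, hash_: str) -> str:
    """Assemble a Primo permalink from an MMS ID (format is illustrative)."""
    return (f"https://{subdomain}.primo.exlibrisgroup.com/permalink/"
            f"{campus_code}/{hash_}/alma{mms_id}")

old_link = "https://catalog.example.edu/F?func=direct&doc_number=001234567"
match = ALEPH_ID_RE.search(old_link)
if match:
    print(build_sru_query(match.group(1), "example", "01EXAMPLE_INST"))
    print(build_permalink("991234567890", "example", "01EXAMPLE_INST", "dg0og1"))
```

When no extractor matches a link (or the SRU lookup returns nothing), the link falls through to the rejects file instead.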
This program is also designed to output easy-to-use files. Here are the input file types it can work on:
- Springshare: Takes a URLs file extracted directly from Springshare, and produces a file that you can give to your Springshare rep to bulk replace
- Webcrawl: Takes the results of a web crawl (if they're in a specific format) and generates an output file showing where on your site each outdated link appears and what it should be replaced with. A web crawler is not included here
- Blackboard: Takes a file of all URLs extracted from Blackboard, and generates a file that can be used to bulk replace links
- Simple URL file: Takes a simple file of URLs and creates a file with the corresponding new URLs. Useful for many purposes.

Note that every output file has a corresponding 'rejects' file containing links that could not be converted automatically for one reason or another.
- Create your configuration: Create a folder under `/configs` with a `config.json` file in it. See `config-example.json` for the format of this file.
- Fill in the Permalink Hash: Generate a permalink on Primo, and pull the hash from that link. It will look something like `dg0og1`.
- Fill in the campus subdomain: This is the sub-domain of your campus in Primo (x.primo.exlibrisgroup.com)
- Fill in your campus code: The code that looks like `O1USMAI_UMCP`
- Ignore the next 3 options (`generate_aleph_mms_without_sru`, `mms_campus_code`, and `mms_middle_num`). Leave the boolean set to `false`.
- Select an SRU request limit if you're testing with large files and don't want to wait a while for results. Link generation will stop after this number of requests (per file type). Set to `-1` for no request limit.
- Select the number of workers: This is the number of concurrent requests that will happen. Too high and Alma might reject your requests; too low and the run will take a while.
- Set up Ebsco Campus DB Names: If you're transferring from an Ebsco discovery layer, you'll need to provide the codes that identify the databases containing your catalogs. These should not be Ebsco-provided databases; rather, wherever your old ILS was loading its bibs/holdings/items into. There could be one or a few.
- Set up additional extractors: Extractors are responsible for identifying links that need to be converted to permalinks via regex, and also provide the regex to extract the ID itself. The existing extractors may cover your Ebsco or WorldCat needs, as long as you're not using a custom domain. Extractors also tell the program which SRU field in Alma should be used to query for the ID. Please see the existing extractors in `src/extractors.py`.
- Set up additional Ebsco replacements: Some URLs cannot be converted to permalinks, but can be modified so that they keep working after you lose access to your Ebsco discovery service. We have identified several modifications required to keep them working, located in `src/ebsco_link_processor.py`. You can add your own or override existing ones in the `additional_ebsco_replacements` array.
- Add the files you want to process into your folder in `/configs`. The Springshare file should be called `assets_list.csv`, the Blackboard file `blackboard.csv`, the webcrawl file `webcrawl.json`, and the simple URL file `simple_url.csv`. The program will run on whichever of these it finds.
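Putting the options above together, a `config.json` might look something like this. Treat it as a sketch only: apart from the three options the steps above name explicitly (`generate_aleph_mms_without_sru`, `mms_campus_code`, `mms_middle_num`), the key names here are guesses, and all values are placeholders. `config-example.json` in the repo is the authoritative format.

```json
{
  "permalink_hash": "dg0og1",
  "campus_subdomain": "usmai",
  "campus_code": "O1USMAI_UMCP",
  "generate_aleph_mms_without_sru": false,
  "mms_campus_code": "",
  "mms_middle_num": "",
  "sru_request_limit": -1,
  "num_workers": 5,
  "ebsco_campus_db_names": ["cat00001a"]
}
```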
- Activate a Python virtual environment
- Install the requirements with `pip3 install -r requirements.txt`
- Run the program with `python -m handle [name_of_your_config_folder]`
Congrats! The output will be saved to a correspondingly named folder in the `/output` directory, under a subfolder with the timestamp of the run.
Yeah, there are a lot of files. Just ignore the JSON files unless you want to do some debugging (they break things down further per failure category). For each input file, there will be two corresponding output Excel/CSV files. The 'success' files contain links ready to be updated. The 'rejects' files contain links that SHOULD be updated, but cannot be for some reason.