bundle
yarn
./bin/devThe following command will start a local Solr instance at localhost:8983, with a pre-loaded core named blacklight-core.
docker compose upData for the solr and redis services are persisted using docker named volumes. You can see what volumes are currently present with:
docker volume lsIf you want to remove a volume (e.g. to start with a fresh database or solr core), you can run:
docker volume rm stanford-arclight_solr-data # to remove the solr dataYou can load fixture data locally:
rake seedThis command will loop through all the directories under spec/fixtures/ead, for example spec/fixtures/ead/ars and spec/fixtures/ead/uarc, and index all the .xml files present. The names of these subdirectories must correspond with a top-level key in the repositories.yml file. For example, uarc is a top-level key in repositories.yml, as well as the title of a subdirectory under spec/fixtures/ead. A mis-match will cause indexing issues.
The easiest way to load data other than the fixtures is to use the DownloadEadJob and/or the IndexEadJob. See below for instructions about how to use Sidekiq to run these jobs in development. Under most circumstances it's fine to use the default :async adapter to run these jobs without Sidekiq in development.
By default the DownloadEadJob will store EAD files in the directory set in ./config/settings.yml as Settings.data_dir. You can choose a different location by setting the DATA_DIR environment variable, passing data_dir: argument to the job method, or by setting a different location for data_dir in ./config/settings.local.yml
The DownloadEadJob will attempt to use the ASpace API to download EADs. You will need to configure the API URL with username and password in order to connect to ASpace. To do this you will need to add the following to config/settings.local.yml with the correct URL, port, and account information:
aspace:
default:
user: USERNAME
password: PASSWORD
url: "http://ARCHIVESPACE_URL:PORT"
chs:
user: USERNAME
password: PASSWORD
url: "http://ARCHIVESPACE_URL:PORT"
Important Note: ArcLight core includes a number of rake tasks for loading data into Solr, such rake arclight:index, rake arclight:index_dir, rake arclight:index_url, and rake arclight:index_url_batch. Using these rake tasks will use the default Traject indexing rules from ArcLight core only and WILL NOT apply any of the local Traject indexing rules. It's important to use either the local app's IndexEadJob or the Traject command (REPOSITORY_ID={REPO_ID} bundle exec traject -u {SOLR_URL} -i xml -c ./lib/traject/sul_config.rb {FILE_PATH}) to index data that will work correctly with stanford-arclight.
By default in development Rails will run the DownloadEadJob and IndexEadJob jobs with the :async adapter. If you prefer to run these jobs in the background you can use Sidekiq.
- In
config/environments/development.rb, add the line:config.active_job.queue_adapter = :sidekiq - Make sure Redis and Solr are running. The included Docker enviroment will start both Redis and Solr for you.
- Start Sidekiq:
bundle exec sidekiq- Run a job. For example, to download and index all the
ars(Archive of Recorded Sound) collections updated after March 1, 2024, run:
bin/rails runner 'DownloadEadJob.enqueue_one_by(aspace_repository_code: "ars", updated_after: "2024-03-01")'- You can monitor job progress in the Sidekiq admin UI, which is available at:
http://localhost:3000/sidekiq
There is a rake task for deleting a single collection and all of its components from the Solr index.
- Find the Solr document id for the collection (which is a form of the EAD ID)
- Run the rake task:
# Some shells (such as zsh) require that the brackets are escaped.
bundle exec rake stanford_arclight:delete_by_id\['ars0167'\]- Enter YES at the prompt to delete the collection and its components.
Finding aid PDFs can be automatically generated from EAD XML. The following are needed:
- Saxon
- Apache FOP
- Java
Paths to those tools must be configured in ./config/settings.yml.
Settings.pdf_generation.fop_pathto specify the path to the fop executableSettings.pdf_generation.saxon_pathto specify the path to the saxon jar
The path to the referenced fonts must be set in config/pdf_generation/fop-config.xml. They are not bundled in this repository. They can be found in ArchivesSpace.
PDFs can be automatically generated as part of DownloadEadJob by setting Settings.pdf_generation.create_on_ead_download.
The GeneratePdfJob can be used to generate PDFs not created automatically via DownloadEadJob.
For example, the following generates all missing PDFs but does not regenerate existing PDFs:
bin/rails runner 'GeneratePdfJob.enqueue_all'