This is a repository for the temporal worker docker image for EPP.
This project facilitates asynchronous importing of content identified from a docmap provider. We are using the docmaps to provide a feed of preprints that have been reviewed by a particular publisher. The data in the docmap provides the history and location of content, which we parse and retrieve.
We then push the parsed content into an EPP server endpoint.
Finally, the results of this retrieval are stored in an S3 bucket in well-structured paths (which can then be configured as a source for a Cantaloupe IIIF server).
The monitoring and scheduling of the import workflows are handled by a Temporal server (testing and dev).
Ensure you have docker and docker-compose (v2 tested). Also install the temporal CLI to start and control jobs.
- clone the repo
- run yarn
- run docker compose up, which starts temporal and the workers in "watch" mode
- run temporal operator namespace list, which lists namespaces; you should see the default namespace listed, and no errors.
The docker compose workflow above will restart the worker when your mounted filesystem changes.
To run an import workflow, run:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index" }'
This will kick off a full import for a docmap index from eLife's API.
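The command above can be wrapped in a small helper so that the index URL and workflow ID are easy to vary. This is a minimal sketch, not part of this repository: the function name import_docmaps and the DRY_RUN variable are hypothetical, while the flags and the input shape are copied from the command above. With DRY_RUN=1 it prints the arguments, one per line, instead of calling temporal.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of this repository): start an importDocmaps
# workflow for a given docmap index URL.
#   $1: docmap index URL
#   $2: workflow ID (defaults to import-docmap-test, as in the command above)
import_docmaps() {
  local index_url="$1"
  local workflow_id="${2:-import-docmap-test}"
  local input

  # Build the JSON input; this assumes the URL contains no double quotes.
  input="$(printf '{ "docMapIndexUrl": "%s" }' "$index_url")"

  # Reuse the function's positional parameters to hold the full command.
  set -- temporal workflow execute --type importDocmaps -t epp -w "$workflow_id" -i "$input"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print one argument per line instead of running the command.
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 import_docmaps "http://mock-datahub/enhanced-preprints/docmaps/v1/index" shows what would be run without contacting the Temporal server.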
To re-run the whole process, you will first need to remove the containers and volumes:
docker compose down --volumes
To prevent a large reimport of docmaps that would cause content to become unpublished, you can specify an optional numeric threshold for the number of docmap changes that are allowed:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "docMapThreshold": 2 }'
The xsltTransformPassthrough option is set through workflowArgs. It can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "xsltTransformPassthrough": true } }'
Sometimes we want to disable specific XSLT files, e.g. handle-etal-in-refs.xsl (the full list of options can be found in xsltLogs in Temporal), by listing them in xsltBlacklist. This can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "xsltBlacklist": "file1.xsl, file2.xsl" } }'
The preferPreprintContent option is set the same way and can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "preferPreprintContent": true } }'
The purgeBeforeImport option deletes all existing versions of a manuscript before importing, which is useful for completely refreshing content instead of just updating it. It can be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "purgeBeforeImport": true } }'
The encodaDefaultVersion option is also passed in workflowArgs:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "encodaDefaultVersion": "1.0.12" } }'
Sometimes, due to issues with the Temporal UI, we need to use the command line to send a signal. You need to specify the target workflow ID, and the name and input of the signal:
tctl workflow signal --workflow_id import-docmap-test --name approval -i true
To run an import workflow that only imports docmaps that are new or have changed since a previous run, start an importDocmaps workflow with a state file name in the s3StateFileUrl input and add a state file to MinIO:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "s3StateFileUrl": "state.json" }'
This will read in previously seen (and hashed) docmaps from the S3 bucket in config, skipping any it has seen before.
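The skip-if-seen idea can be illustrated with a few lines of shell. This is only a sketch of the general technique and not the worker's implementation: the function names are hypothetical, the hash is assumed to be SHA-256 (computed with sha256sum from GNU coreutils), and the state is kept as a local text file with one hash per line rather than as a file in the S3 bucket.

```shell
#!/usr/bin/env bash

# Hypothetical illustration only: the real worker keeps its state in S3 and
# its hashing scheme is not documented here.

# Print the SHA-256 hash of a docmap body read from stdin.
docmap_hash() {
  sha256sum | cut -d' ' -f1
}

# Succeed (exit 0) if the docmap is new or changed, i.e. its hash is not
# yet recorded in the state file. A missing state file counts as empty.
#   $1: path to the state file, $2: docmap body
docmap_needs_import() {
  local state_file="$1" body="$2" hash
  hash="$(printf '%s' "$body" | docmap_hash)"
  ! grep -qxF "$hash" "$state_file" 2>/dev/null
}

# Record the docmap's hash so that later runs skip it.
#   $1: path to the state file, $2: docmap body
record_docmap() {
  local state_file="$1" body="$2"
  printf '%s' "$body" | docmap_hash >> "$state_file"
}
```

Because the comparison is on a hash of the whole body, any change to a docmap makes it look new again, which matches the "new or have changed" behaviour described above.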
You can also kick off a full import for a docmap index from eLife's API that then repeats every hour (see the next command to change this), skipping docmaps that have no changes.
To change the sleep time, pass a duration to the --interval input, for example 1m for 1 minute or 5m for 5 minutes:
temporal schedule create --schedule-id import-docmaps -w import-docmaps -t epp --workflow-type importDocmaps -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "s3StateFileUrl": "import-docmaps.json" }' --overlap-policy Skip --interval '1m'
You can then view these runs on the dashboard.
SERVER_DIR="../your-directory-here" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localserver.yaml up
To start the application with a local version of the EPP API server, so you can test local changes to the API, define an environment variable SERVER_DIR with the location of your EPP API server project, e.g. SERVER_DIR="../enhanced-preprints-server", then run the above command to invoke the .localserver overrides. This will work with the first import workflow command.
To run with the local API but without the mocked services, omit -f docker-compose.override.yaml from the compose command.
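The choice between the two variants above comes down to which -f files are passed to docker compose. A minimal sketch of that selection follows; the function name compose_file_args and its mocks / no-mocks argument are hypothetical, and only the file names come from the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the docker compose -f arguments for running
# against a local EPP API server.
#   $1: "mocks" (default) to include docker-compose.override.yaml,
#       "no-mocks" to omit it and run without the mocked services
compose_file_args() {
  local mode="${1:-mocks}"
  local args="-f docker-compose.yaml"
  if [ "$mode" = "mocks" ]; then
    args="$args -f docker-compose.override.yaml"
  fi
  printf '%s\n' "$args -f docker-compose.localserver.yaml"
}
```

It could then be used as SERVER_DIR="../enhanced-preprints-server" docker compose $(compose_file_args no-mocks) up, leaving the substitution unquoted so that the shell splits it into separate arguments.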
SERVER_DIR="../enhanced-preprints-server" APP_DIR="../enhanced-preprints-client" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localserver.yaml -f docker-compose.localapp.yaml up
ENCODA_DIR="../enhanced-preprints-encoda" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localencoda.yaml up
NOTE: this will only read MECA files from the real S3, so you don't need to mock them out.
Define a .env file with these variables:
MECA_AWS_ACCESS_KEY_ID=your access key
MECA_AWS_SECRET_ACCESS_KEY=your secret key
MECA_AWS_ROLE_ARN=a role to assume to have permission to source S3 buckets # optional
Then run docker compose with the base, override and s3 configs, like below:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3.yaml up
To import a specific docmap such as 85111, use the importDocmap workflow:
temporal workflow execute --type importDocmap -w import-docmap-85111 -t epp -i '{ "url": "https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v2/by-publisher/elife/get-by-manuscript-id?manuscript_id=85111" }'
NOTE: this will only write extracted resources to the real S3, so you can verify that the process works.
Define a .env file with these variables:
AWS_ACCESS_KEY_ID=your access key
AWS_SECRET_ACCESS_KEY=your secret key
BUCKET_NAME=you will want to create an S3 bucket for your dev experiments
Then run docker compose with the base, override and s3-epp configs, like below:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3-epp.yaml up
You can combine the S3 source and destination configs to retrieve content from the S3 source, prepare the assets and upload them to S3:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3.yaml -f docker-compose.s3-epp.yaml up
Before starting, agree a time with the production team to perform a full reimport.
Make sure you're using the appropriate AWS profile:
export AWS_DEFAULT_PROFILE=elife
Create an example-state.json file containing an empty array:
echo "[]" > example-state.json
Upload the state file to S3:
aws s3 cp ./example-state.json s3://prod-elife-epp-data/automation/state/example-state.json
Check the number of docmaps to import:
curl --no-progress-meter https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v2/index | jq ".docmaps | length"
Populate input data with:
{
"docMapIndexUrl": "https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v2/index",
"end": 1000,
"s3StateFileUrl": "example-state.json"
}
Click "Start Workflow"!
Visit the workflow just created.
Wait until mergeDocmapState appears in the Event History of the workflow's page in Temporal.
Check the number of items recorded in the state file. If this is less than the number of docmaps, create a new importDocmaps workflow:
aws s3 cp s3://prod-elife-epp-data/automation/state/example-state.json - | jq ". | length"
Once all importDocmaps workflows have been successfully created, monitor the progress of the import.
Time for a strong coffee and a croissant! ☕ 🥐
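The arithmetic behind the "create a new importDocmaps workflow" step can be sketched as below. The function names are hypothetical, and the sketch assumes (based only on the procedure above) that "end": 1000 caps each run at 1000 docmaps and that each further run picks up docmaps not yet recorded in the state file.

```shell
#!/usr/bin/env bash

# Hypothetical helpers for planning a full reimport.

# Print the number of docmaps not yet recorded in the state file.
#   $1: total docmaps in the index
#   $2: items recorded in the state file
remaining_docmaps() {
  local total="$1" recorded="$2"
  if [ "$recorded" -ge "$total" ]; then
    echo 0
  else
    echo $((total - recorded))
  fi
}

# Print how many further importDocmaps runs are needed, assuming each run
# imports at most batch_size docmaps (rounded up).
#   $1: remaining docmaps
#   $2: batch size, e.g. the "end" value used in the input
runs_needed() {
  local remaining="$1" batch_size="$2"
  echo $(( (remaining + batch_size - 1) / batch_size ))
}
```

The two inputs are the counts produced by the curl and aws s3 cp commands above; for instance, with 2500 docmaps in the index and 1000 recorded, 1500 remain and two more runs would be needed under these assumptions.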
- importContent imports a version of an article as specified in the docmap file.
- importDocmap reads a docmap file and imports all versions of the article defined within that docmap file.
- importManuscriptData accepts the parsed docmap as input and imports all versions of the article defined.
- importDocmaps reads a docmap index and triggers an importDocmap workflow for each item in the index by default. If the docmap content is already known, a docmap's import may be skipped, as controlled by an optional S3 state file.
To find the name of the state file, in the temporal workflow input look for "s3StateFileUrl": "example-docmap-elife-index.json" in the configuration object.
To output contents of the state file in AWS cli:
aws s3 cp s3://prod-elife-epp-data/automation/state/example-docmap-elife-index.json -
To count the items in the state file use:
aws s3 cp s3://prod-elife-epp-data/automation/state/example-docmap-elife-index.json - | jq ". | length"
To monitor the count of items in the state file:
watch -n 1 'aws s3 cp s3://prod-elife-epp-data/automation/state/example-docmap-elife-index.json - | jq ". | length"'
To run the tests with docker (especially useful if they are not working locally), use the following command:
docker compose -f docker-compose.tests.yaml run tests