
Prepping data

Michelle Janowiecki edited this page Feb 8, 2022 · 8 revisions

This is a workflow for preparing metadata for new items for ingest into Drupal.

Get list of existing terms from Drupal

1. Get existing taxonomy terms from Drupal.

input: none
script: get/getTaxonomyIdentifiers.py
output: levy-api/existing-taxonomies

First, get the existing taxonomy terms from Drupal so that you don't create duplicates of terms that already exist. To do this, run getTaxonomyIdentifiers.py against your production site. The script creates a folder called existing-taxonomies in your levy-api directory and writes one CSV for each type of taxonomy in Drupal.

Currently, there are seven taxonomies in the Lester Levy Sheet Music Collection:

  • Composition Metadata (composition_metadata.csv)
  • Content List (c)
  • Creator Roles (creator_r.csv)
  • Duplicate Reason Codes (duplicat.csv)
  • Instrumentation Metadata (instrumentation_metadata.csv)
  • Publishers (publishers.csv)
  • Subjects (subjects.csv)
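After running the script, it can help to confirm that every taxonomy produced a CSV and that none of them is unexpectedly empty. The following is a minimal sketch of such a check, not part of the levy-api scripts: the function name count_terms is hypothetical, and it assumes each CSV has a single header row and that you run it from your levy-api directory.

```python
import csv
from pathlib import Path


def count_terms(folder):
    """Return {csv filename: number of non-empty data rows} for each CSV in folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to be present)
            counts[path.name] = sum(1 for row in reader if row)
    return counts


if __name__ == "__main__":
    # Assumes the current working directory is levy-api.
    for name, total in count_terms("existing-taxonomies").items():
        print(f"{name}: {total} terms")
```

A missing file or a count of zero would suggest rerunning getTaxonomyIdentifiers.py before moving on.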

2. Get existing levy_collection_names from Drupal.

input: none
script: get/getNode_levy_collection_names.py
output: allCollectionNames.csv

Next, get the existing levy_collection_names (the entity used for creator/contributor names) from Drupal so that you don't create duplicates of names that already exist. To do this, run getNode_levy_collection_names.py against your production site. The script creates a CSV called allCollectionNames.csv in your main levy-api directory, containing all existing levy_collection_names in Drupal.

Get list of terms from new data

3. Get list of taxonomy terms and levy_collection_names from spreadsheet of new data.

input: Spreadsheet of new data
script: explodeTaxonomiesAndNames.py
output: levy-api/aggregated-taxonomies & levy-api/aggregated-roles
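To illustrate what "exploding" a spreadsheet into term lists might involve, here is a rough sketch. It is not the code in explodeTaxonomiesAndNames.py. The function name explode_terms, the column names, and the "|" delimiter for multi-valued cells are all assumptions made for this example.

```python
def explode_terms(rows, columns, delimiter="|"):
    """Collect the unique, whitespace-trimmed values found in each column.

    rows is an iterable of dicts (for example, from csv.DictReader).
    The delimiter for multi-valued cells is an assumption, not taken
    from the real spreadsheet.
    """
    terms = {column: set() for column in columns}
    for row in rows:
        for column in columns:
            for value in (row.get(column) or "").split(delimiter):
                value = value.strip()
                if value:
                    terms[column].add(value)
    return {column: sorted(values) for column, values in terms.items()}


# Hypothetical sample rows standing in for the spreadsheet of new data.
sample_rows = [
    {"subjects": "Dance|War", "publishers": "Example Publisher"},
    {"subjects": "War | Love", "publishers": ""},
]
aggregated = explode_terms(sample_rows, ["subjects", "publishers"])
```

Each key in the result corresponds to one aggregated list of unique terms, which is roughly the role the files in aggregated-taxonomies and aggregated-roles play in the later comparison steps.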

Determine what terms need to be created in Drupal

4. Compare taxonomy terms from new items to existing terms in Drupal.

input: spreadsheets in levy-api/existing-taxonomies & levy-api/aggregated-taxonomies
script: findExistingTaxTermsAndTermsToCreate.py
output: levy-api/items-matched, taxonomyTermsDone.csv, taxonomyTermsToCreate.csv
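The comparison in this step amounts to splitting the new terms into those that already exist in Drupal and those that still need to be created. The sketch below shows one way that split could work. It is an illustration only: the function names are hypothetical, and the case-insensitive, whitespace-collapsing match is an assumption (the real script may match terms differently).

```python
def normalize(term):
    """Collapse runs of whitespace and ignore case (an assumed matching rule)."""
    return " ".join(term.split()).casefold()


def match_terms(new_terms, existing):
    """Split new_terms into (done, to_create).

    existing maps each term name already in Drupal to its identifier.
    done is a list of (new term, existing identifier) pairs;
    to_create is a list of terms with no match.
    """
    lookup = {normalize(name): identifier for name, identifier in existing.items()}
    done, to_create = [], []
    for term in new_terms:
        key = normalize(term)
        if key in lookup:
            done.append((term, lookup[key]))
        else:
            to_create.append(term)
    return done, to_create


# Hypothetical data: two existing terms and three terms from new items.
existing_terms = {"Dance": "101", "War": "102"}
done, to_create = match_terms(["dance", "Love", " War "], existing_terms)
```

In this sketch, done plays the role of taxonomyTermsDone.csv and to_create the role of taxonomyTermsToCreate.csv.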

5. Compare levy_collection_names from new items to existing names in Drupal.

input: allCollectionNames.csv & levy-api/aggregated-roles
script: findExistingCollNamesAndNamesToCreate.py
output: matched_CollectionNames.csv, levy_collection_namesDone.csv, levy_collection_namesToCreate.csv

