
Prepping data

Michelle Janowiecki edited this page Jan 14, 2022 · 8 revisions

Workflow for preparing the metadata of new items before ingest into Drupal.

Get list of existing terms from Drupal

  1. Get existing taxonomy terms from Drupal.

  2. Get existing levy_collection_names from Drupal.
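The page does not say how the existing terms are exported. As one possibility, the sketch below assumes they are read through Drupal's core JSON:API module, which lists taxonomy terms at `/jsonapi/taxonomy_term/{vocabulary}`. It only parses an already-decoded response page; the function names, the output column names, and the sample data are hypothetical, and levy_collection_names may live under a different resource type.

```python
def terms_from_jsonapi_page(page):
    """Extract one row per taxonomy term from a decoded JSON:API page."""
    rows = []
    for resource in page.get("data", []):
        attrs = resource.get("attributes", {})
        rows.append({
            "uuid": resource.get("id"),                # JSON:API resource id
            "tid": attrs.get("drupal_internal__tid"),  # numeric term id
            "name": attrs.get("name"),
        })
    return rows


def next_page_url(page):
    """Return the URL of the next page, or None when there is none."""
    return page.get("links", {}).get("next", {}).get("href")


# Hand-written sample shaped like a JSON:API listing (not real data).
sample_page = {
    "data": [
        {
            "type": "taxonomy_term--subjects",  # hypothetical vocabulary
            "id": "00000000-0000-0000-0000-000000000001",
            "attributes": {"drupal_internal__tid": 12, "name": "Dance music"},
        }
    ],
    "links": {"self": {"href": "https://example.org/jsonapi/taxonomy_term/subjects"}},
}

existing_terms = terms_from_jsonapi_page(sample_page)
```

In a real export, the caller would fetch each page, collect the rows, and follow `next_page_url` until it returns `None`.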

Get list of terms from new data

  1. Get list of taxonomy terms and levy_collection_names from spreadsheet of new data.
    • input:
      • spreadsheet of new data
    • script: explodeTaxonomiesAndNames.py
    • outputs:
      • levy-api/aggregated-taxonomies (new items aggregated by taxonomy name)
      • levy-api/aggregated-roles (new items aggregated by levy_collection_names and grouped by role)
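The aggregation step above can be pictured as follows. This is a minimal sketch, not the contents of explodeTaxonomiesAndNames.py: it assumes multi-valued cells are separated by `|`, and the column names `item_id` and `subjects` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_term(rows, column, id_column="item_id", delimiter="|"):
    """Map each distinct term in `column` to the ids of the items using it."""
    items_by_term = defaultdict(list)
    for row in rows:
        # Split the multi-valued cell into individual terms ("explode").
        for term in (row.get(column) or "").split(delimiter):
            term = term.strip()
            if term:
                items_by_term[term].append(row[id_column])
    return dict(items_by_term)


# Stand-in for the spreadsheet of new data (hypothetical columns and values).
sample = io.StringIO(
    "item_id,subjects\n"
    "001,Waltzes|Dance music\n"
    "002,Dance music\n"
)
rows = list(csv.DictReader(sample))
by_subject = aggregate_by_term(rows, "subjects")
```

Running the same function once per taxonomy column, or once per role column for names, would give one aggregated table per output spreadsheet.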

Determine what terms need to be created in Drupal

  1. Compare taxonomy terms from new items to existing terms in Drupal.

    • inputs:
      • spreadsheets in levy-api/existing-taxonomies
      • spreadsheets in levy-api/aggregated-taxonomies
    • script: findExistingTaxTermsAndTermsToCreate.py
    • outputs:
      • levy-api/items-matched (items aggregated by taxonomy terms with Drupal identifiers added, if found)
      • taxonomyTermsDone.csv (list of taxonomy terms that already exist in Drupal)
      • taxonomyTermsToCreate.csv (list of taxonomy terms that DO NOT exist in Drupal and need to be created)
  2. Compare levy_collection_names from new items to existing terms in Drupal.

    • inputs:
      • allCollectionNames.csv (spreadsheet containing all existing levy_collection_names in Drupal)
      • levy-api/aggregated-roles (spreadsheets of levy_collection_names grouped by role and aggregated by title)
    • script: findExistingCollNamesAndNamesToCreate.py
    • outputs:
      • matched_CollectionNames.csv (items aggregated by levy_collection_names with Drupal identifiers added, if found)
      • levy_collection_namesDone.csv (list of levy_collection_names that already exist in Drupal)
      • levy_collection_namesToCreate.csv (list of levy_collection_names that DO NOT exist in Drupal and need to be created)
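Both comparisons above follow the same pattern: look up each aggregated term among the existing Drupal terms, add the identifier when one is found, and sort the term into a "done" or a "to create" list. The sketch below illustrates that pattern only. The column names `name`, `items`, and `drupal_id` are hypothetical, and matching on whitespace- and case-normalized text is an assumption; the actual scripts may compare terms exactly.

```python
import csv
import io


def normalize(term):
    """Collapse runs of whitespace and ignore case for comparison."""
    return " ".join(term.split()).casefold()


def match_terms(aggregated_rows, existing_ids):
    """Add Drupal ids to rows where found; split names into done/to-create."""
    lookup = {normalize(name): ident for name, ident in existing_ids.items()}
    matched, done, to_create = [], [], []
    for row in aggregated_rows:
        ident = lookup.get(normalize(row["name"]), "")
        matched.append({**row, "drupal_id": ident})  # blank id if not found
        (done if ident else to_create).append(row["name"])
    return matched, done, to_create


# Hypothetical inputs: existing terms from Drupal and aggregated new terms.
existing_ids = {"Dance music": "12"}
aggregated = [
    {"name": "dance  music", "items": "001|002"},
    {"name": "Waltzes", "items": "001"},
]
matched, done, to_create = match_terms(aggregated, existing_ids)

# Write the matched rows as CSV (to memory here; a file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "items", "drupal_id"])
writer.writeheader()
writer.writerows(matched)
```

The `done` and `to_create` lists correspond to the "Done" and "ToCreate" spreadsheets, and `matched` to the items with identifiers added.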
