Nomination Normalizing Metadata

mlohnash edited this page Dec 16, 2020 · 3 revisions

## Checking/normalizing metadata during the Nomination Period

Each stream manager will check formatting and normalize metadata from the point of submission to the ingest deadline (30 days after the nomination period closes). This occurs for items submitted by spreadsheet, and for items submitted directly in the Repository.

This review focuses mainly on the required fields. Map metadata to our Repository templates for still image/text and AV import. Do not relabel or rearrange the fields. While working on the file, save it as .xlsx to retain formatting. Do not save it as .csv at this stage, because that discards any formatting or highlighting used during review.

Send all newspaper nominations to Theresa to normalize, evaluate sources, assess copyright, etc. She'll map these particular considerations as additional technical metadata or notes.

Do not do a lot of pre-processing. Bounce the submission back to the partner if there are a lot of fixes or questions that would disrupt the award process. Examples include dates that are consistently not formatted as EDTF, incomplete copyright statements, or improperly formatted person names in controlled vocabulary fields. If the errors are not too glaring or disruptive, we'll ask the partner in the award letter to fix, or clarify, the records in the Repository before sending items to CA-R. Keep in mind this is an opportunity to teach partners best description practices for a digitization project, and some partners, particularly new partners, may need more help with our metadata guidelines than others.

Each manager should respond to partners' questions about their own stream. If a partner submits more than one stream, or a mixed collection, managers should try to combine their questions into a single email thread to minimize the number of emails going to a partner and reduce confusion.

Update/add values in these fields before ingest:

  • Project Note: "California Revealed"
  • Media Type: Depends on format. Image, Text, Moving Image or Audio
  • Country of creation: See controlled vocabulary
  • Production Stream: AV, DG, NP, OS, or PT
    • AV is the default Production Stream for the AV Content Type
    • OS is the default Production Stream for the Still Image or Text Content Type
  • Grant Cycle: Current round. See controlled vocabulary
  • Price Bundle: Based on gauge/format. See controlled vocabulary
  • Special Handling: Based on condition. See controlled vocabulary
    • All items being sent to BSLW have "Tier 1" applied to account for annual price changes
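
The defaults in the list above could be applied to each spreadsheet row with something like the following. This is only a sketch: the column names ("Project Note", "Production Stream", "Content Type") and the Content Type values are assumptions for illustration and should be matched to the actual import template.

```python
# Assumed Content Type values and column names; adjust to the import template.
DEFAULT_STREAM_BY_CONTENT_TYPE = {
    "AV": "AV",
    "Still Image": "OS",
    "Text": "OS",
}


def apply_ingest_defaults(record: dict) -> dict:
    """Return a copy of one row with the Project Note and default stream filled in."""
    updated = dict(record)
    updated["Project Note"] = "California Revealed"

    # Only fill Production Stream when it is empty, so DG, NP, or PT values survive.
    if not updated.get("Production Stream", "").strip():
        content_type = updated.get("Content Type", "").strip()
        default = DEFAULT_STREAM_BY_CONTENT_TYPE.get(content_type)
        if default:
            updated["Production Stream"] = default
    return updated
```

Fields that depend on a controlled vocabulary (Grant Cycle, Price Bundle, Special Handling) are left out on purpose, since their values need to be looked up in the Repository.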

Check that all required fields are complete. Unknown is an acceptable placeholder. Required fields are bold below.

Normalization includes:

  • Name of institution provided matches the name of the institution in the Repository.
    • If there's a discrepancy, email partner to clarify their preference; default should be the institution name provided in the application.
    • Institution name must match in order for the records to properly upload to Islandora.
  • Call number or Temporary ID present
  • Titles are differentiated in some way -- by additional title, date or sequential number (e.g. Sacramento Home Movie #1, Sacramento Home Movie #2, etc.).
    • Titles display online as a list and it's easier for users to search unique titles.
    • Try to move dates or other description out of the Title field into other fields as needed, except for newspapers. All newspaper titles must be formatted with serial title and issue date, e.g., Richmond Record Herald 1941-12-07.
  • Names of Creators, Contributors, and Subject Entities are formatted Last Name, First Name and follow the Library of Congress Name Authority File (LCNAF) format. This also applies to group entities, such as families or corporations.
  • Dates are formatted YYYY-MM-DD. Use "Unknown" as a placeholder.
    • Created Date is the required date field. This follows the Library of Congress Extended Date/Time Format (EDTF).
    • Be sure to format all date columns as text so Excel does not auto change the format.
    • Publication Date field must be numeric characters only, as this is a controlled field. This field is required for serial publications.
  • Ensure that copyright statements are complete and generally comply with our Permission Guidelines.
    • "Copyrighted" statement should include the name of the copyright holder.
    • Copyright status unknown statement must include an institutional/evergreen email address. Follow up with partner if needed.
  • Gauge/Format conforms to our controlled vocabularies. Check the vocabulary lists (AV List; Print List) in the Repository.
  • Extent number of parts is formatted, e.g. 1 Page of 1; 2 Reels of 2; 3 Tapes of 3. Do not ingest print materials without number of pages or dimensions.
    • Determine if related parts can, or should, be recorded as a complex object, if appropriate, using Title as a clue. Confirm with partner, and if there's a lot to consolidate, give the work to them.
  • Dimensions are formatted as "in." for inches and "cm." for centimeters. Use fractions for inches and decimals for centimeters.
    • If unknown, leave blank and ask partner. Do not ingest print materials without number of pages or dimensions.
  • Duration is formatted as HH:MM:SS. If unknown, leave blank.
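
A few of the format checks above can be scripted before import. The sketch below is a minimal example, not an existing tool: the function names are hypothetical, it accepts only full YYYY-MM-DD dates or "Unknown" (EDTF permits other forms that are not covered here), and it reads the duration format as hours, minutes, and seconds.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
DURATION_RE = re.compile(r"\d{2}:[0-5]\d:[0-5]\d")


def is_valid_date(value: str) -> bool:
    """True for 'Unknown' or a real calendar date written as YYYY-MM-DD."""
    value = value.strip()
    if value == "Unknown":
        return True
    if not DATE_RE.fullmatch(value):
        return False
    try:
        # Rejects impossible dates such as 1941-02-30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_valid_duration(value: str) -> bool:
    """True for a blank cell (unknown duration) or a zero-padded HH:MM:SS value."""
    value = value.strip()
    return value == "" or bool(DURATION_RE.fullmatch(value))
```

Values that fail these checks are candidates for follow-up with the partner rather than silent correction.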

In general, please do the following as much as possible:

  • Reduce acronyms and abbreviations. Spell out words and institutions.
  • Batch replace ampersands "&" with "and"
  • Batch replace smart quotes (“ ”) with straight quotes ("). To find and replace in Excel:
    • find opening smart quotes: alt + [
    • find closing smart quotes: alt + shift + [
  • Verify that controlled vocabulary terms are standardized for subject topics and spatial coverage according to LCSH.
    • For extremely local terms that are not in LCSH, format them following LCSH conventions.
  • Use brackets to designate supplied/temporary description. Replace question marks with brackets.
  • Before importing into the Repository, ensure that spreadsheets do not have blank rows as this will create blank records (with no title or institution) in the Repository.
  • If using an export to re-ingest records (for items submitted via the Repository), ensure that all fields are present and accounted for in the import sheet -- empty or left out columns will be overwritten as blank.
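
Some of the cleanup steps above (ampersands, smart quotes, blank rows, missing columns) can also be done outside Excel. The following is a rough sketch with hypothetical helper names. It also straightens single smart quotes, which is an assumption beyond what the list asks for, and the ampersand replacement should be reviewed by eye for names that legitimately contain "&".

```python
import re

# Map curly double and single quotes to their straight equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


def clean_cell(text: str) -> str:
    """Straighten smart quotes and replace ampersands with 'and' in one cell."""
    text = text.translate(SMART_QUOTES)
    text = re.sub(r"\s*&\s*", " and ", text)
    return text.strip()


def drop_blank_rows(rows: list) -> list:
    """Remove rows whose cells are all empty, which would import as blank records."""
    return [row for row in rows if any(cell.strip() for cell in row)]


def missing_columns(template_header: list, sheet_header: list) -> list:
    """List template columns absent from the sheet (they would be overwritten as blank)."""
    present = set(sheet_header)
    return [name for name in template_header if name not in present]
```

For example, `clean_cell("Smith “Bud” & Sons")` returns `Smith "Bud" and Sons`.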

After normalizing and mapping, save the file as a "CSV UTF-8 (Comma Delimited) (.csv)". The file name should follow our standard file naming convention of MARC_GrantCycle_Stream_filetype_date.extension.

  • For example: car_2018-2019_PT_NominationsImport_2019-11-04.xlsx.
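
The naming convention can be assembled programmatically to avoid typos. This is a small sketch with a hypothetical function name. The stream codes come from the Production Stream list on this page, and the date is written as YYYY-MM-DD to match the example.

```python
from datetime import date

# Production Stream codes listed on this page.
VALID_STREAMS = {"AV", "DG", "NP", "OS", "PT"}


def nomination_filename(marc_code: str, grant_cycle: str, stream: str,
                        filetype: str, when: date, extension: str = "csv") -> str:
    """Build MARC_GrantCycle_Stream_filetype_date.extension."""
    if stream not in VALID_STREAMS:
        raise ValueError(f"Unknown production stream: {stream!r}")
    return f"{marc_code}_{grant_cycle}_{stream}_{filetype}_{when.isoformat()}.{extension}"


# Reproduces the example file name given above.
print(nomination_filename("car", "2018-2019", "PT", "NominationsImport",
                          date(2019, 11, 4), "xlsx"))
```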

Then upload the spreadsheet to the cycle specific "nominations" folder in the Project Tracking folder in SharePoint.
