
Other training data: humpback whales

Scott Veirs edited this page Mar 17, 2023 · 13 revisions

The "Haro Humpback" catalog & open annotations

Over the winter of 2021-2022, Emily Vierling worked with Val and Scott Veirs as a Beam Reach "extern" (mostly-remote internship during COVID) to describe humpback signals within the open data from Orcasound Lab hydrophones (Haro Strait, WA, USA). Leveraging her previous training with Helena Symonds and Paul Spong of OrcaLab, listening to humpbacks in Johnstone Strait (BC, Canada), Emily developed a new online Haro Humpback dictionary and annotated thousands of signals in Orcasound open data.

An open, collaborative humpback signal catalogue (catalog)

First presented by Emily at the DCLDE 2022 workshop in spring 2022, the catalogue (catalog) contains 12 signals that she found to be most common in recordings made primarily in the late fall (presumably of male humpback whales beginning to vocalize before leaving the Salish Sea for tropical wintertime habitat in Hawaii and/or Mexico). In addition to the 2022 version that Emily published via WordPress, the catalogue is shared via the signal-catalogue GitHub repo, where we hope new versions of the code can be maintained as a generic tool that the bioacoustic community can use to build online and offline signal catalogues.

The 12 signal types in version 1.0 of this humpback signal dictionary are:

  1. Whup
  2. Grunt
  3. Ascending Moan
  4. Descending Moan
  5. Moan
  6. Upsweep
  7. Trumpet
  8. Growl
  9. Creak
  10. Buzz
  11. Shriek
  12. Chirp

Labeled data overview & attribution

Emily's annotated data includes ~9,000 labels and is based on ~YY hours of audio data from 3 days during October 03-28, 2021. These labeled data are part of Orcasound's AWS open data registry and are freely available under Orcasound's Creative Commons license (CC BY-NC-SA). Please attribute any use of the dictionary and/or labeled data to: "Emily Vierling, 2022, Orcasound" with a link back to orcasound.net.

Audio and annotation metadata

Audio Data


Annotation Files


  • License/data sharing agreement: Creative Commons license (CC BY-NC-SA)
  • Annotator: Emily Vierling
  • Method (manual or semi-manual): manual
  • Detector (if applicable): N/A
  • Filelist: URI | URL via Quilt
  • Granularity (call, file, encounter): non-song vocalization
  • Resolution (species, ecotype, call type, etc.): species; possibly individual(s) in some cases, depending on sightings data
  • Columns (for each column provide description of content and possible values):
    • Selection: sequential numbering within the annotation file for each labeled signal
    • Begin Time (s): seconds into the recording when annotation bounding box begins
    • End Time (s): seconds into the recording when annotation bounding box ends
    • Low Freq (Hz): lower frequency bound of annotation bounding box
    • High Freq (Hz): upper frequency bound of annotation bounding box
    • Call Type: one of the 12 non-song vocalization categories for "Haro Humpbacks" and humpbacks observed by OrcaLab in Johnstone Strait
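
The column layout above can be read with a few lines of Python. The following is a minimal sketch, assuming the annotation files are tab-delimited text with exactly the column headers listed above; the delimiter, the sample rows, and the `read_annotations` helper are illustrative assumptions, not part of the published data set.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; real annotation files
# may use a different delimiter or carry additional columns.
SAMPLE = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\tCall Type\n"
    "1\t12.50\t13.10\t80.0\t450.0\tWhup\n"
    "2\t20.00\t21.75\t120.0\t900.0\tAscending Moan\n"
    "3\t33.20\t33.90\t75.0\t400.0\tWhup\n"
)


def read_annotations(handle):
    """Parse one annotation file into a list of dicts with typed values."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "selection": int(rec["Selection"]),
            "begin_s": float(rec["Begin Time (s)"]),
            "end_s": float(rec["End Time (s)"]),
            "low_hz": float(rec["Low Freq (Hz)"]),
            "high_hz": float(rec["High Freq (Hz)"]),
            "call_type": rec["Call Type"].strip(),
        })
    return rows


annotations = read_annotations(io.StringIO(SAMPLE))

# Number of labels per call type, and the duration of each bounding box.
counts = Counter(a["call_type"] for a in annotations)
durations = [a["end_s"] - a["begin_s"] for a in annotations]
```

To use it on a downloaded file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` wrapper.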

Pre-processed deep/machine learning data set

We are also sharing the training data (URI | URL via Quilt) that Val developed based on Emily's work. It includes fixed-window audio clips and associated spectrograms. Preliminary documentation of his efforts can be found in the signal-annotation GitHub repo.
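
One common way to turn variable-length bounding boxes into fixed-window clips is to center a window of constant length on each annotation and shift it when it would run past either end of the recording. The sketch below illustrates that idea only; it is not necessarily how Val built the shared data set, and the `fixed_window` helper and the 3-second default are assumptions chosen for the example.

```python
def fixed_window(begin_s, end_s, window_s=3.0, recording_s=None):
    """Return (start, stop) in seconds for a window centered on an annotation.

    window_s is an assumed clip length; recording_s is the optional total
    length of the recording, used to keep the window inside the file.
    """
    center = (begin_s + end_s) / 2.0
    # Clamp at the start of the recording.
    start = max(0.0, center - window_s / 2.0)
    # If the window would run past the end, shift it back.
    if recording_s is not None and start + window_s > recording_s:
        start = max(0.0, recording_s - window_s)
    return start, start + window_s


# An annotation in the middle of a recording, one near the start,
# and one near the end of a 60-second file.
mid_clip = fixed_window(12.5, 13.1)
early_clip = fixed_window(0.2, 0.8)
late_clip = fixed_window(59.0, 59.8, recording_s=60.0)
```

Annotations longer than the window are truncated symmetrically by this approach, so a real pipeline would need a policy for long signals (for example, multiple overlapping windows).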
