
Other training data: humpback whales

Scott Veirs edited this page Sep 23, 2022 · 13 revisions

The "Haro Humpback" catalog & open annotations

Over the winter of 2021-2022, Emily Vierling worked with Val and Scott Veirs as a Beam Reach "extern" (a mostly remote internship during COVID) to describe humpback signals within the open data from Orcasound Lab hydrophones (Haro Strait, WA, USA). Drawing on her previous training with Helena Symonds and Paul Spong of OrcaLab, listening to humpbacks in Johnstone Strait (BC, Canada), Emily developed a new online Haro Humpback dictionary and annotated thousands of signals in Orcasound open data.

An open, collaborative humpback signal catalogue (catalog)

Emily first presented the catalogue (catalog) at the DCLDE 2022 workshop in spring 2022. It contains the 12 signals that she found to be most common in recordings made primarily in the late fall (presumably of male humpback whales beginning to vocalize before leaving the Salish Sea for tropical wintertime habitat in Hawaii and/or Mexico). In addition to the 2022 version that Emily published via WordPress, the catalogue is shared via the signal-catalogue GitHub repo, where we hope new versions of the code can be maintained as a generic tool that the bioacoustic community can use to build online and offline signal catalogues.

The 12 signal types in version 1.0 of this humpback signal dictionary are:

  1. Whup
  2. Grunt
  3. Ascending Moan
  4. Descending Moan
  5. Moan
  6. Upsweep
  7. Trumpet
  8. Growl
  9. Creak
  10. Buzz
  11. Shriek
  12. Chirp
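
For anyone preparing these labels for a classifier, a minimal sketch of one way to map the 12 signal names to integer class indices is shown below. The names come from the list above; the zero-based numbering, the case-insensitive matching, and the names `SIGNAL_TYPES`, `LABEL_TO_INDEX`, and `encode_label` are illustrative assumptions, not part of the catalogue or its repo.

```python
# Signal names copied from the version 1.0 list above, in catalogue order.
SIGNAL_TYPES = [
    "Whup", "Grunt", "Ascending Moan", "Descending Moan", "Moan", "Upsweep",
    "Trumpet", "Growl", "Creak", "Buzz", "Shriek", "Chirp",
]

# Assumed convention (not from the catalogue): class index = list position,
# starting at 0, with lookups done on lower-cased names.
LABEL_TO_INDEX = {name.lower(): i for i, name in enumerate(SIGNAL_TYPES)}


def encode_label(label: str) -> int:
    """Return the class index for a signal name, ignoring case and extra spaces."""
    key = " ".join(label.split()).lower()
    if key not in LABEL_TO_INDEX:
        raise ValueError(f"Unknown signal type: {label!r}")
    return LABEL_TO_INDEX[key]


def decode_index(index: int) -> str:
    """Return the catalogue name for a class index."""
    return SIGNAL_TYPES[index]
```

Normalizing whitespace and case before lookup is meant to tolerate small inconsistencies in hand-typed annotations; unknown labels raise an error rather than being silently dropped.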

Labeled data overview & attribution

Emily's annotated data include ~9,000 labels and are based on ~YY hours of audio data from 20ZZ-2021. These labeled data are part of Orcasound's AWS open data registry and are freely available under Orcasound's Creative Commons license (CC BY-NC-SA). Please attribute any use of the dictionary and/or labeled data to: "Emily Vierling, 2022, Orcasound" with a link back to orcasound.net.

Audio and annotation metadata

Audio Data


  • License/data sharing agreement: Creative Commons license (CC BY-NC-SA)
  • Data owner / source: Orcasound
  • Location: Acoustic Sandbox (S3 bucket, part of the AWS open data registry)
  • No. files: 7
  • File size: 12-180 MB
  • Time range: 03 Oct 2021 - 28 Oct 2021
  • Dataset size: 993 MB
  • Description: Version 1 of Haro Humpback bioacoustic bouts for annotation by Emily Vierling (winter-spring 2022)
  • Coordinates:
  • Water depth: 8 meters
  • Format: FLAC
  • Codec: FLAC
  • Channels: 1
  • Sample Rate: 44.1 kHz

Annotation Files


  • License/data sharing agreement
  • Annotator: Emily Vierling
  • Method (manual or semi-manual): manual
  • Detector (if applicable): N/A
  • Filelist
  • Granularity (call, file, encounter)
  • Resolution (species, ecotype, call type, etc.): species, possibly individual(s) in some cases, depending on sightings data
  • Columns (for each column provide description of content and possible values)

Pre-processed deep/machine learning data set

We are also sharing the training data that Val developed based on Emily's work. It includes fixed-window audio clips and associated spectrograms. Preliminary documentation of his efforts can be found in the signal-annotation GitHub repo.
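
To illustrate what "fixed-window" clipping means in practice, here is a minimal sketch that computes the sample ranges of equal-length windows over a recording. It is not Val's actual pipeline: the function name `window_bounds`, the 2-second window, and the 1-second hop are hypothetical choices, and only the 44.1 kHz sample rate is taken from the audio metadata above.

```python
from typing import List, Tuple


def window_bounds(
    total_samples: int,
    sample_rate: int = 44100,   # matches the 44.1 kHz audio described above
    window_s: float = 2.0,      # assumed window length, for illustration only
    hop_s: float = 1.0,         # assumed hop between window starts
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of every full window; end is exclusive."""
    if window_s <= 0 or hop_s <= 0:
        raise ValueError("window_s and hop_s must be positive")
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win < 1 or hop < 1:
        raise ValueError("window and hop must each span at least one sample")
    # Only full-length windows are kept; a trailing partial window is dropped.
    return [(start, start + win) for start in range(0, total_samples - win + 1, hop)]


if __name__ == "__main__":
    # A 10-second mono recording at 44.1 kHz, cut into overlapping 2-second windows.
    bounds = window_bounds(total_samples=10 * 44100)
    print(len(bounds), bounds[0], bounds[-1])
```

Each `(start, end)` pair can then be used to slice the decoded audio array before computing a spectrogram. Dropping the final partial window keeps every clip the same length, at the cost of ignoring up to one window of audio at the end of each file.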
